Speed up re-joining a cluster

ricky.kwan.ix · January 16, 2020, 4:11pm

Hi,

When doing a rolling restart, the restarted node comes up, but it can take between 10 and 15 minutes for it to rejoin the cluster.

Is it normal for it to take this long to rejoin the cluster? How can I improve the time?

I am running enterprise (4.5.0.5, rolling upgrade to 4.8.0.2) with clusters ranging between 10 and 16 nodes.

kporter · January 16, 2020, 7:50pm

Does your deployment use Secondary Indexes?

ricky.kwan.ix · January 16, 2020, 9:18pm

Likely not. Is this something I can see with aql show indexes? If so, then the answer is no.

kporter · January 16, 2020, 9:46pm

Do you have any persisted namespaces with data-in-memory true?

If so, these namespaces can take longer to load since they have to load the data into RAM. We have made this a bit faster with “Cool Restart”, but it is still slower than a “Fast Restart” of a persistence-only namespace. Both “Cool” and “Fast” restarts require that the shared memory index is available (i.e., the node was shut down cleanly and the machine hasn’t been rebooted).

If you have rebooted the machines, then they will need a “Cold Restart”, which must rebuild the primary index from disk and, if data-in-memory is true, load the data into memory.
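
To make the three cases concrete, here is a minimal Python sketch of the decision as described above. The function name and its boolean parameters are hypothetical, made up for illustration; this is not Aerospike code, only a restatement of the rules in this post.

```python
def restart_type(clean_shutdown: bool, rebooted: bool, data_in_memory: bool) -> str:
    """Return which restart the post above describes for a persisted namespace.

    All parameter names are illustrative assumptions, not Aerospike settings.
    """
    # The shared memory index is only available after a clean shutdown
    # on a machine that has not been rebooted since.
    index_available = clean_shutdown and not rebooted
    if not index_available:
        # The primary index must be rebuilt from disk.
        return "cold"
    # With the index available, data-in-memory namespaces still have to
    # load their data into RAM ("cool"); persistence-only ones do not ("fast").
    return "cool" if data_in_memory else "fast"


if __name__ == "__main__":
    print(restart_type(clean_shutdown=True, rebooted=False, data_in_memory=False))
    print(restart_type(clean_shutdown=True, rebooted=False, data_in_memory=True))
    print(restart_type(clean_shutdown=True, rebooted=True, data_in_memory=True))
```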

If neither of these applies, could you share your configuration? You may want to reach out to your enterprise support contact to ensure a timely response.

lucien · January 16, 2020, 10:15pm

During the rolling restart, can you confirm that the node was shut down gracefully? The last line in the log prior to the server restart should have been:

finished clean shutdown - exiting

Otherwise, a cold start would occur due to the ungraceful shutdown.
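
A quick way to check this is to look at the last non-empty log line written before the restart. The Python sketch below assumes you have already extracted the log lines up to the restart; the function name is hypothetical, and only the marker string comes from this post.

```python
from typing import Iterable

# Marker text quoted in this post; log lines normally carry a timestamp
# prefix, so a substring match is used rather than an exact comparison.
CLEAN_SHUTDOWN_MARKER = "finished clean shutdown - exiting"


def was_clean_shutdown(lines_before_restart: Iterable[str]) -> bool:
    """Return True if the last non-empty line contains the clean-shutdown marker."""
    last = ""
    for line in lines_before_restart:
        if line.strip():
            last = line
    return CLEAN_SHUTDOWN_MARKER in last


if __name__ == "__main__":
    sample = [
        "some earlier log line",
        "Jan 16 2020 22:00:00 GMT: INFO (as): finished clean shutdown - exiting",
        "",
    ]
    print(was_clean_shutdown(sample))
```

The sample log line format above is made up for the demo; only the marker text matters to the check.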

ricky.kwan.ix · January 23, 2020, 7:29pm

Aerospike Support helped me. Fabric didn’t have the address config set, so it was listening on multiple interfaces. Once I configured it to listen only on the default interface, it now takes only 1 minute to rejoin the cluster.
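
For anyone landing here later, a sketch of what such a stanza might look like in aerospike.conf. The IP address is a placeholder and the port is the usual fabric default; neither value is taken from my actual configuration.

```
network {
    fabric {
        # Bind fabric to a single interface instead of every interface.
        # 10.0.0.5 is a placeholder address, not a value from this thread.
        address 10.0.0.5
        port 3001
    }
}
```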

Incidentally in the Configuration Reference, I don’t see address with context network and subcontext fabric. Should I?

meher · January 31, 2020, 3:08am

I think you should indeed… let us address that. Thanks for pointing it out.
