Can we change the time one node takes to join the cluster after a restart?

binhpn · June 1, 2022, 2:41pm

Hi, I’m new to Aerospike, so please help explain:

How does a node join the cluster after a restart?
How long does it normally take to form a new cluster, and which parameters does that depend on?
How can I monitor the cluster-join process?

Thanks in advance

kporter · June 1, 2022, 4:59pm

I assume you are using Aerospike Community Edition which doesn’t include the “Fast Restart” feature provided by the Enterprise Edition.

Basically, when a Community node restarts, it has to rebuild the in-memory primary indexes from storage. This process can take a significant amount of time depending on the amount of data. Aerospike EE maintains the primary indexes in shared memory and can re-attach the shared memory indexes when it comes back up. This feature significantly reduces the time needed to restart a node.

binhpn · June 2, 2022, 1:07am

Thanks @kporter for your quick response. Yes, I’m using Community Edition, so I must cold restart. But my question is about the time to form a new cluster (not counting rebalance and re-index time). As the log below shows, it took about 2 minutes to form a new cluster (from 14:24:28 to 14:26:28):

Jun 01 2022 14:24:08 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 3
Jun 01 2022 14:24:28 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:24:38 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:24:48 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:24:58 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:25:08 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:25:18 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:25:28 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:25:38 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:25:48 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:25:58 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:26:08 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:26:18 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:26:28 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 3

Sometimes it took more than 2 minutes (about 7 minutes), so I want to know how this process progresses, how to optimize it, and how to monitor it.
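
One rough way to monitor this from the log alone is to scan the ticker lines and measure how long the node reported CLUSTER-SIZE 0. The snippet below is only a sketch: the regular expression is written against the ticker format quoted in this post, and the function name `cluster_gaps` is made up for illustration, not part of any Aerospike tool.

```python
import re
from datetime import datetime

# Matches ticker lines in the format quoted in this thread, e.g.
# "Jun 01 2022 14:24:28 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0"
TICKER_RE = re.compile(
    r"(?P<ts>\w{3} \d{2} \d{4} \d{2}:\d{2}:\d{2}) GMT: .*?"
    r"NODE-ID (?P<node>\w+) CLUSTER-SIZE (?P<size>\d+)"
)


def cluster_gaps(lines):
    """Return (start, end, seconds) for every run of CLUSTER-SIZE 0.

    A run starts at the first ticker line reporting size 0 and ends at the
    next ticker line reporting a non-zero size.
    """
    gaps = []
    gap_start = None
    for line in lines:
        match = TICKER_RE.search(line)
        if match is None:
            continue  # not a cluster-size ticker line
        ts = datetime.strptime(match.group("ts"), "%b %d %Y %H:%M:%S")
        size = int(match.group("size"))
        if size == 0 and gap_start is None:
            gap_start = ts
        elif size > 0 and gap_start is not None:
            gaps.append((gap_start, ts, (ts - gap_start).total_seconds()))
            gap_start = None
    return gaps


# A few of the lines from the log above (first and last of the size-0 run).
sample = [
    "Jun 01 2022 14:24:08 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 3",
    "Jun 01 2022 14:24:28 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0",
    "Jun 01 2022 14:26:18 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0",
    "Jun 01 2022 14:26:28 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 3",
]

for start, end, seconds in cluster_gaps(sample):
    print(f"no cluster from {start:%H:%M:%S} to {end:%H:%M:%S} ({seconds:.0f} s)")
```

Since the ticker in this log prints every 10 seconds, the measured gap is only accurate to about that resolution.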

kporter · June 2, 2022, 5:35pm

That doesn’t seem normal. The time to discover a new node or the removal of a node is roughly heartbeat.timeout * heartbeat.interval ms (or 1.5s by default). Such events trigger the clustering module which will eventually trigger the exchange module. The clustering and exchange algorithms require communication with the entire cluster, so they are dependent on the network for performance, but, in a typical deployment, the total time wouldn’t exceed a few seconds.
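
To make the arithmetic concrete, here is a small sketch of that estimate. It assumes the defaults are an interval of 150 ms and a timeout of 10 missed heartbeats, which is what gives the 1.5 s figure; the function name and the second set of values are purely illustrative.

```python
def node_event_detection_ms(interval_ms=150, timeout=10):
    """Rough time to notice a node joining or leaving.

    interval_ms: heartbeat interval in milliseconds (assumed default 150)
    timeout:     number of missed heartbeat intervals before a node is
                 considered gone (assumed default 10)
    """
    return interval_ms * timeout


# Assumed defaults: 150 ms * 10 = 1500 ms, i.e. the 1.5 s mentioned above.
print(node_event_detection_ms())

# Hypothetical non-default settings, to show how the estimate scales.
print(node_event_detection_ms(interval_ms=250, timeout=20))
```

This only covers detection; the clustering and exchange rounds that follow add their own time on top of it.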

I’d suspect either an atypical heartbeat configuration change or some network problems (such as high latency). You could enable detail logging for the clustering and exchange modules which could identify network issues when it logs retransmits. You may also use the health-outliers to identify problematic nodes.

binhpn · June 3, 2022, 1:55am

Thanks @kporter, I tested all nodes today and each took only a few seconds to join the cluster. Maybe it was an intermittent network problem. I will follow your suggestions if it happens again!

