Can we change the time a node takes to join the cluster after a restart?

Hi, I’m new to Aerospike, so please help me understand:

  1. How a node joins the cluster after a restart.
  2. How long it normally takes to form a new cluster, and which parameters that depends on.
  3. How to monitor the cluster join process.

Thanks in advance

I assume you are using Aerospike Community Edition which doesn’t include the “Fast Restart” feature provided by the Enterprise Edition.

Basically, when a Community node restarts, it will have to rebuild the in-memory primary indexes from storage. This process can take a significant amount of time depending on the amount of data. Aerospike EE maintains the primary indexes in shared memory and can re-attach the shared memory indexes when it comes back up. This feature significantly speeds up the time to restart a node.

Thanks @kporter for your quick response. Yes, I’m using Community Edition, so a cold restart is unavoidable. But my question is about the time to form a new cluster (not counting rebalance and re-index time). As the log below shows, it took about 2 minutes to form the new cluster (from 14:24:28 to 14:26:28):

Jun 01 2022 14:24:08 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 3
Jun 01 2022 14:24:28 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:24:38 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:24:48 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:24:58 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:25:08 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:25:18 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:25:28 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:25:38 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:25:48 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:25:58 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:26:08 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:26:18 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 0
Jun 01 2022 14:26:28 GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE 3

Sometimes it took more than 2 minutes (about 7 minutes), so I would like to know how this process progresses, how to optimize it, and how to monitor it.

That doesn’t seem normal. The time to discover a new node or the removal of a node is roughly heartbeat.timeout * heartbeat.interval ms (or 1.5s by default). Such events trigger the clustering module which will eventually trigger the exchange module. The clustering and exchange algorithms require communication with the entire cluster, so they are dependent on the network for performance, but, in a typical deployment, the total time wouldn’t exceed a few seconds.
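To make the arithmetic concrete, here is a small sketch of the detection-time estimate. It assumes the default values behind the 1.5 s figure are an interval of 150 ms and a timeout of 10 missed heartbeats; the function name is only illustrative, and the node's own configuration should be checked for the values actually in effect.

```python
def detection_time_ms(interval_ms: int = 150, timeout: int = 10) -> int:
    """Rough time to notice a node arriving or leaving.

    interval_ms: heartbeat interval in milliseconds (assumed default 150).
    timeout:     number of missed heartbeats tolerated (assumed default 10).
    """
    return interval_ms * timeout


if __name__ == "__main__":
    expected = detection_time_ms()
    print(f"expected detection time: {expected / 1000:.1f} s")

    # The gap reported in the log above was about 2 minutes.
    observed_s = 120
    print(f"observed gap is {observed_s * 1000 / expected:.0f}x the estimate")
```

A gap that is many times larger than this estimate points away from the heartbeat settings themselves and toward something else, such as the network.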

I’d suspect either an atypical heartbeat configuration change or some network problems (such as high latency). You could enable detail logging for the clustering and exchange modules which could identify network issues when it logs retransmits. You may also use the health-outliers to identify problematic nodes.
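For basic monitoring, the ticker lines already in the log can be scanned to measure how long a node reported CLUSTER-SIZE 0. The following is a rough sketch, not an official tool: it assumes one log entry per line in the format shown earlier in this thread, and the function name is made up for illustration.

```python
import re
from datetime import datetime

# Matches ticker lines such as:
# Jun 01 2022 14:24:28 GMT: INFO (info): (ticker.c:164) NODE-ID ... CLUSTER-SIZE 0
TICKER = re.compile(
    r"(?P<ts>\w{3} \d{2} \d{4} \d{2}:\d{2}:\d{2}) GMT: .*CLUSTER-SIZE (?P<size>\d+)"
)


def orphan_periods(lines):
    """Yield (start, end, seconds) for each stretch where CLUSTER-SIZE was 0."""
    start = None
    for line in lines:
        match = TICKER.search(line)
        if not match:
            continue  # not a ticker line with a cluster size
        ts = datetime.strptime(match.group("ts"), "%b %d %Y %H:%M:%S")
        size = int(match.group("size"))
        if size == 0 and start is None:
            start = ts  # first tick outside the cluster
        elif size > 0 and start is not None:
            yield start, ts, (ts - start).total_seconds()
            start = None


if __name__ == "__main__":
    prefix = "GMT: INFO (info): (ticker.c:164) NODE-ID bb957f2e03e16fa CLUSTER-SIZE"
    sample = [
        f"Jun 01 2022 14:24:08 {prefix} 3",
        f"Jun 01 2022 14:24:28 {prefix} 0",
        f"Jun 01 2022 14:25:28 {prefix} 0",
        f"Jun 01 2022 14:26:28 {prefix} 3",
    ]
    for start, end, secs in orphan_periods(sample):
        print(f"{start:%H:%M:%S} -> {end:%H:%M:%S}: {secs:.0f}s")
```

Because the ticker is only written every 10 seconds, the measured duration is accurate to roughly that granularity.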


Thanks @kporter, I tested all nodes today and each took only a few seconds to join the cluster. Maybe it is an intermittent network problem. I will follow your suggestions if it happens again!