Questions about new nodes and migration

Hello, I got a 20 nodes cluster before, which has reached 50% disk usage, so I add 9 new empty nodes to the cluster. But then the problems occured. The cluster has been turned into migration status for 2 weeks, and are still migrating. Here is the info for the cluster:

And here is the result I use asadm tool to excute the “show statistics like migrate” command:

Seed:        [('11.140.4.2', 3300, None)]
Config_file: None
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Statistics (2019-06-11 08:25:58 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                        :   a36c01206.su18.tbsite.net:3300   a36c01207.su18.aliyun.com:3300   a36c01214.su18.tbsite.net:3300   a36c01215.su18.aliyun.com:3300   a36c01218.su18.tbsite.net:3300   a36c01219.su18.aliyun.com:3300   a36c01221.su18.tbsite.net:3300   a36c01223.su18.aliyun.com:3300   a36c01226.su18.tbsite.net:3300   a36c01227.su18.aliyun.com:3300   a36c01232.su18.aliyun.com:3300   a36c01233.su18.aliyun.com:3300   a36c01234.su18.tbsite.net:3300   a36c01237.su18.aliyun.com:3300   a36c01242.su18.aliyun.com:3300   a36c01245.su18.tbsite.net:3300   a36c01246.su18.tbsite.net:3300   a36c01248.su18.tbsite.net:3300   a36c01251.su18.tbsite.net:3300   a36c01257.su18.aliyun.com:3300   e33i03088.su18.aliyun.com:3300   e33i03089.su18.aliyun.com:3300   e33i03090.su18.tbsite.net:3300   e33i03091.su18.tbsite.net:3300   e33i07092.su18.aliyun.com:3300   e33j01119.su18.tbsite.net:3300   e33j01120.su18.tbsite.net:3300   e33j03122.su18.tbsite.net:3300   e33j03123.su18.tbsite.net:3300
migrate_allowed             :   true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true                             true
migrate_partitions_remaining:   728                              1941                             695                              2112                             1654                             1797                             1815                             1409                             468                              1648                             560                              1725                             630                              585                              1727                             1589                             1990                             2352                             1976                             2117                             1337                             395                              567                              1449                             1441                             1413                             1674                             1726                             1584

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~user_profile Namespace Statistics (2019-06-11 08:25:58 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                           :   a36c01206.su18.tbsite.net:3300   a36c01207.su18.aliyun.com:3300   a36c01214.su18.tbsite.net:3300   a36c01215.su18.aliyun.com:3300   a36c01218.su18.tbsite.net:3300   a36c01219.su18.aliyun.com:3300   a36c01221.su18.tbsite.net:3300   a36c01223.su18.aliyun.com:3300   a36c01226.su18.tbsite.net:3300   a36c01227.su18.aliyun.com:3300   a36c01232.su18.aliyun.com:3300   a36c01233.su18.aliyun.com:3300   a36c01234.su18.tbsite.net:3300   a36c01237.su18.aliyun.com:3300   a36c01242.su18.aliyun.com:3300   a36c01245.su18.tbsite.net:3300   a36c01246.su18.tbsite.net:3300   a36c01248.su18.tbsite.net:3300   a36c01251.su18.tbsite.net:3300   a36c01257.su18.aliyun.com:3300   e33i03088.su18.aliyun.com:3300   e33i03089.su18.aliyun.com:3300   e33i03090.su18.tbsite.net:3300   e33i03091.su18.tbsite.net:3300   e33i07092.su18.aliyun.com:3300   e33j01119.su18.tbsite.net:3300   e33j01120.su18.tbsite.net:3300   e33j03122.su18.tbsite.net:3300   e33j03123.su18.tbsite.net:3300
migrate-order                  :   5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5
migrate-retransmit-ms          :   5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000
migrate-sleep                  :   1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1
migrate_record_receives        :   16449543237                      21630512652                      23932602628                      19966563162                      18122903659                      23464187361                      21698993973                      24306710893                      15914036491                      21190433353                      24960772499                      19530400781                      7762753589                       10446397593                      20288785758                      18742123761                      17873510624                      20409481725                      21777349216                      24079589569                      2773332324                       4398684465                       3688153412                       1669491325                       1988151314                       2525268679                       2188041823                       1656102890                       1620204109
migrate_record_retransmits     :   432283                           1566932                          1092753                          1101109                          1368597                          1296223                          1337367                          826581                           276738                           562876                           964475                           1195467                          131912                           203737                           2846178                          1734876                          976368                           950160                           1378335                          942799                           20306                            78049                            11117                            5502                             7088                             15474                            2707                             15295                            17656

The aerospike server version is 3.8.3, all the config of the nodes are the same, the machines are also the same.
Actually I got another cluster in another DC, which is a mutual backup to this cluster. I add the same number new nodes to that cluster either, and that cluster has finished migration in one week, and became balanced status. But this cluster seens far away from balanced status.
So I want to know what happened to this cluster? How can I fix the problem?

Here is the basic info for the cluster

Here is the monitor for the “Objects to be transmited”

The first arrow points to the stats of backup cluster as I mentioned above. When I add 5 new nodes to the backup cluster. The second arrow is when I add another 4 new nodes to the cluster. After the two operations, “Objects to be transmited” both grew and then dropped very fast.
The third arrow points to the cluster that I ask questions for, when I add 9 new nodes at one time. The “Objects to be transmited” grew up and dropped very slow, quite different from the backup cluster. And actually the backup cluster has higher load than this cluster.
Is it because I add 9 new nodes at one time? But I don’t see a limitation about the number of new nodes in the documents.

What are migrate threads and migrate max num incoming set to? Do you have any entries in the log saying anything?

How are you computing “Objects to be transmitted”?

Also be sure to watch this short 24 min introduction to Aerospike video: https://www.youtube.com/watch?v=PA7PGWphW8M.

All the migrate setting are default.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Configuration (2019-06-12 02:06:38 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                    :   h52d03159.eu95.tbsite.net:3300   h52d12322.eu95.tbsite.net:3300   h53b10329.eu95.tbsite.net:3300   h53c06125.eu95.tbsite.net:3300   h53d07084.eu95.tbsite.net:3300   h53e01298.eu95.tbsite.net:3300   h53e01322.eu95.tbsite.net:3300   h53e01323.eu95.tbsite.net:3300   h53e01324.eu95.tbsite.net:3300   h53e01325.eu95.tbsite.net:3300   h53e01326.eu95.tbsite.net:3300   h53e04175.eu95.tbsite.net:3300   h53e04180.eu95.tbsite.net:3300   h53e04183.eu95.tbsite.net:3300   h53e04199.eu95.tbsite.net:3300   h53e04200.eu95.tbsite.net:3300   h53e04201.eu95.tbsite.net:3300   h53e04202.eu95.tbsite.net:3300   h53e04203.eu95.tbsite.net:3300   h53e07216.eu95.tbsite.net:3300   h53e07217.eu95.tbsite.net:3300   h53e07218.eu95.tbsite.net:3300   h53e10203.eu95.tbsite.net:3300   h53e10204.eu95.tbsite.net:3300   h53e13257.eu95.tbsite.net:3300   h53i02205.eu95.tbsite.net:3300   h53i05334.eu95.tbsite.net:3300   h78a11266.eu95.tbsite.net:3300   h78a11267.eu95.tbsite.net:3300   
migrate-max-num-incoming:   4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                4                                
migrate-threads         :   1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~user_profile Namespace Configuration (2019-06-12 02:06:38 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                 :   h52d03159.eu95.tbsite.net:3300   h52d12322.eu95.tbsite.net:3300   h53b10329.eu95.tbsite.net:3300   h53c06125.eu95.tbsite.net:3300   h53d07084.eu95.tbsite.net:3300   h53e01298.eu95.tbsite.net:3300   h53e01322.eu95.tbsite.net:3300   h53e01323.eu95.tbsite.net:3300   h53e01324.eu95.tbsite.net:3300   h53e01325.eu95.tbsite.net:3300   h53e01326.eu95.tbsite.net:3300   h53e04175.eu95.tbsite.net:3300   h53e04180.eu95.tbsite.net:3300   h53e04183.eu95.tbsite.net:3300   h53e04199.eu95.tbsite.net:3300   h53e04200.eu95.tbsite.net:3300   h53e04201.eu95.tbsite.net:3300   h53e04202.eu95.tbsite.net:3300   h53e04203.eu95.tbsite.net:3300   h53e07216.eu95.tbsite.net:3300   h53e07217.eu95.tbsite.net:3300   h53e07218.eu95.tbsite.net:3300   h53e10203.eu95.tbsite.net:3300   h53e10204.eu95.tbsite.net:3300   h53e13257.eu95.tbsite.net:3300   h53i02205.eu95.tbsite.net:3300   h53i05334.eu95.tbsite.net:3300   h78a11266.eu95.tbsite.net:3300   h78a11267.eu95.tbsite.net:3300   
migrate-order        :   5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                5                                
migrate-retransmit-ms:   5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             5000                             
migrate-sleep        :   1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1                                1

And I’m not sure what kind of log should I pay attention to.

I don’t know how the monitor computes the “Objects to be transmitted”, The monitor is buildt by my former colleague, using a open source component call “Grafana”.
I’ve read the basic priciples about aerospike, the documents say that one namespace is mapped into 4096 partitions. However this seens not correct according to the stats.
There is only one namespace in this cluster, the “show statistics like migrate” command shows that the sum of “migrate_partitions_remaining” of each node is much more than 4096. Is there anything I misunderstand?
The tx/rx number is also confusing. The 9 new nodes have an average tx of 130, an average rx of 1.4K. The new nodes are added at the same time. As I comprehended, the new nodes should have a big num of rx, but the tx should be 0. Why would the cluster transmits data from old nodes to new nodes, and then transmits data from new node again?
And to some old nodes, their disk usage have pass 50%, why would they have a higher rx num than tx num? I really can’t figure out how the auto-balancing works.

There are only 4096 partitions, but there can be more than 4096 migrations. Worst case for replication-factor=2 is 4096 * N_nodes if you consider the case of recovery from an N way network split.

The new protocol in 3.13+ behaves closer to what you describe. The older protocol advanced a returning/eventual master to master right away. The new algorithm tries to choose a node that will generate less duplicate resolution traffic (basically the node that was master continues as acting master until the returning/eventual master fills.

Adding one node would behave as you describe in the new protocol, but when you add more then 1 (which is totally ok in the new protocol but potentially unsafe in the older version you are running) the eventual master fills first and then it replicates to the new replicas (so it will have tx for where it is eventual master and rx where it is taking over a replica position).

In the Enterprise 4.x we have a feature called “uniform-balance” that evenly distributes partitions to all nodes. The default balance randomly distributes partitions and you can get unlucky.

I have observed for one more day, and I find the “Pending Migrates” number rolls over and over again. It changes in a range of 22K to 19K, but never reduce lower than 19K. The first screenshot took at last day, shows that the tx/rx number is 21.221K, here is the screenshot I just took, the tx/rx number increased.

I think adding 9 new nodes at same time may trigger some bugs, makes the cluster successively calculates the distribution of partitions, and in each period, the cluster gets a different distribution result, and adjust the migrate plan again and again, which makes the tx/rx number can’t reduces to 0.

Here is another cluster with 25 nodes originally, and then I add 4 new nodes to the cluster at the same time, the migration works perfectly, the 4 new nodes have a high rx number, the other old nodes have only tx number, none rx.

This is what I expect, there are only data move out from old nodes. I’m not sure about the details of “random distribution” as you mentioned, but this cluster works like it’s using consistent-hashing to calculate the partition distribution, thus data only transmit from old nodes to new nodes.

Back to the question, I will observe for another day. But I’m afraid that the tx/rx num would still stay higher than 19K. So is there any way to make the migration become stable? Or can it be stopped?

Oh, I’ve found a problem, the log shows that the number of nodes keep changing.

Jun 11 2019 00:37:10 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 00:37:22 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 01:11:14 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 01:11:31 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 05:53:46 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 05:53:55 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 06:07:27 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 06:07:31 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 06:21:20 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 06:21:26 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 06:51:20 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 06:51:31 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 08:49:14 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 08:49:20 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 08:49:30 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 08:49:46 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 12:23:47 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 12:24:15 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 13:42:30 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 13:42:44 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 14:19:06 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 14:19:24 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 19:17:02 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 19:17:12 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 19:46:25 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 19:46:30 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 19:55:58 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 19:56:11 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 22:19:57 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 22:20:26 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 23:04:19 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 23:04:25 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29
Jun 11 2019 23:04:47 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 28
Jun 11 2019 23:04:52 GMT+0800: INFO (clustering): (clustering.c:5807) applied cluster size 29

So this should be the cause of successively migrating. But the log doesn’t show which node is lost or which node is newly added. I would try to locate that problematic node.

It does use a deterministically random distribution. This is done by hashing each node_id with the partition_id, sorts hashs to determine the node sequence for a particular partition.

The problem node will likely report 1. You can also put the paxos logging context into detail mode which may tell which node_id is leaving (the newer versions will since 3.13, don’t remember if older versions also reported this).

Wait, clustering.c didn’t exist in 3.8.3, you must have upgraded to 3.13+. So instead of paxos, you will want to set clustering to detail to see which node_id is having problems.

Also what are your heartbeat settings and have they been configured the same on all node?

Yes, my mistakes not to check the cluster version. I’ve several clusters, some of them are version 3.8.3, but this one and its backup cluster is “Aerospike Community Edition build 3.16.0.1”.
The heartbeat setting is

mode mesh
interval 250
timeout 40

each node has same config.
I have set the clustering log level to detail, and wait for the lost node occurred.

Those setting are really high. This means it takes ~10 seconds for the cluster to respond to a disruption - so if a node were to suddenly drop from the cluster (maintenance or otherwise) there will be a ~10 second window where writes that would have gone to this node will fail. Typically the interval is set to 150 and timeout is 10 which gives a 1.5 second window - often in cloud environments timeout is increased to 20. I can only assume your predecessor was experiencing problems with the default heartbeat configuration and increased it - the problem your predecessor was experiencing may be related to the node you are currently tracking down.

edit: meant ~10 seconds, previously I wronte 20 seconds.

The problem is more severe. I thought the cluster lost a certain node, but actually it’s losing random nodes.
The log shows that each time the cluster lost a different node.

The adding new node operation may lead to a vicious circle. When adding new nodes, the load of the cluster increase, and some nodes become unavailable for a short time, which will trigger the “node depart” event, thus the cluster rearranges the migration plan. However, more migration tasks would lead to higher load, and then nodes might become unavailable more frequently, and then get into a vicious circle.
The essence of the problem should be that the cluster has originally reached a critical load, that’s why I wanted to add new nodes to it, and then triggered the problem.
I have adjusted the timeout param to an even bigger number, which makes the cluster will no more loses nodes. This could cause data write error, but breaking the vicious circle is more important.
After I adjusted the timeout param, the monitor shows that “Object to be transmitted” stop increasing, and become decreasing.

Just from the symptoms, I think you have an issue with your network or network interface card settings and you are trying to compensate it by adjusting Aerospike configuration to sub-optimal.

BTW, I’m still curious how “Objects to be transmitted” is being computed, I would like to know what I am actually looking at :slight_smile:.

The only way I can imagine for deriving a semi-accurate “Object to be transmitted” would be to use the “partition-info” command.

It simply counts the “Non-Replica” objects number.

According to the documents, this stat means objects beyond master and replica, only exists during migrations. Thus in a sense it could be equivalent of objects to be transmitted, or objects being transmitting.

I’m still trying to locate the problem. The system monitors of a single node show that when in the non-response period, the node has a peak disk read, its load-1-min rose over 65, whereas it only has 64 cpu cores, which means the node was fully working at that moment, and that could cause no response to the heartbeats.

In the recent 2 days, this cluster is still migrating, the monitors show that the machine has intermittent peak of disk read, which makes the load usage rises over 100%. I’m not sure why the default config of migration could cause such high disk read, but if it does, aerospike might need an optimization for migration.