Is there a problem if I add 5 nodes in one shot?


#1

I already have 9 nodes in a cluster and I would like to add 5 more. Is it a problem if I bring up all 5 nodes and have them join the cluster at once? Or should I join them one by one, letting migrations complete after each one before starting the next, until all 5 are in?

I understand that shutting down more than 1 node at once can cause data loss. What about adding all 5 nodes at once? In theory it shouldn't, but is there any caveat I should be aware of?

Regards, Mannoj Kumar


#2

What is your server version?


#3

aerospike-server-community 3.15.1.3-1


#4

Add all 5 nodes at once, assuming they don’t have any pre-existing/old Aerospike data on their SSDs.


#5

Those are 5 fresh nodes with no data. During that time, we will have traffic going to the existing 9 nodes. I hope that is fine.


#6

Yes, that is fine. BTW, you can have previous data on the nodes, but then you have to know all the various caveats, which are out of scope for your need. :slight_smile:


#7

The specific concern with prior data is whether you want it incorporated in the current set of data or not. Without a disk wipe the data will come in, and if that were desired it would still be preferable to have it all come in at once.


#8

Thanks Piyush. One last confirmation. Let’s say my current active nodes are asd01 … asd09 with 16 GB RAM each, and I’m adding asd10 … asd15 with 128 GB RAM each. Once the bigger nodes are in, the older asd01 … asd09 will eventually be removed. For this, I plan to give asd10’s aerospike.conf entries for all of {asd01 … asd09 … asd15} so that it joins the cluster and the data gets distributed to asd11 … asd15, while asd11 … asd15 have entries for asd10 … asd15 only. Then, once asd01 … asd09 have been shut down in a rolling fashion, I will change asd10’s aerospike.conf to list only the currently active nodes asd10 … asd15 and restart only that node. This way I don’t have to restart the other nodes.


#9

No, that is not the way to do it.

Note:

1 - The config file is read only when the node starts.

2 - The first good entry in the mesh-seed-address-port list is used to make the initial connection to the cluster. The node then obtains the IP addresses of all the nodes from the cluster itself, not from the mesh-seed-address-port entries.

3 - You provide multiple mesh-seed-address-port entries so that if the first node in the list is down, the joining node will try the next one until it makes one good connection.
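Notes 2 and 3 describe a simple rule: try the seeds in order, stop at the first one that answers, and take the membership from the cluster rather than from the config file. The Python sketch below models that rule only; `join_cluster` and the `try_connect` callback are hypothetical names, not Aerospike APIs, and port 3002 is assumed to be the default heartbeat port.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

Seed = Tuple[str, int]  # (IP address, heartbeat port)


def join_cluster(
    seeds: Iterable[Seed],
    try_connect: Callable[[Seed], Optional[Set[Seed]]],
) -> Set[Seed]:
    """Toy model of joining a mesh cluster through mesh-seed-address-port entries.

    Seeds are tried in config order. The first reachable seed hands back the
    cluster's full membership, which is what the joining node then uses.
    """
    for seed in seeds:
        members = try_connect(seed)  # None models an unreachable seed
        if members is not None:
            return members  # membership comes from the cluster, not the config
    raise ConnectionError("no mesh-seed-address-port entry was reachable")


if __name__ == "__main__":
    cluster = {(f"10.1.1.{i}", 3002) for i in range(1, 10)}  # a1..a9

    def fake_connect(seed: Seed) -> Optional[Set[Seed]]:
        # Pretend a1 (10.1.1.1) is down; every other node answers.
        return None if seed == ("10.1.1.1", 3002) else cluster

    seeds = [("10.1.1.1", 3002), ("10.1.1.2", 3002)]
    print(len(join_cluster(seeds, fake_connect)))  # 9
```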

Assume a1 is 10.1.1.1, a2 is 10.1.1.2… and so on.

Start a10 with a single mesh-seed-address-port entry in its config file pointing at a1, i.e. 10.1.1.1 3002 (assuming the default heartbeat port 3002; mesh-seed-address-port takes the heartbeat port, not the 3000 service port).

Verify that the node has joined the cluster. (It now has the IP addresses/ports of a1 through a9 from the cluster itself.)

Likewise,

Start a11 with a single mesh-seed-address-port entry in its config file pointing at a1, i.e. 10.1.1.1 3002.

Start a12 the same way, with a single mesh-seed-address-port entry of 10.1.1.1 3002.

…and the same for a13 and a14. (5 nodes added)

Once everything is up and running happily, edit the config file of a10 to change its mesh-seed-address-port entry to 10.1.1.11, and add more than one entry, perhaps all the rest (mesh-seed-address-port 10.1.1.14 …). This makes sure that if you restart a10 at a later date, it can find a correct mesh-seed-address-port entry. Make similar edits to the a11 through a14 config files.
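As an illustration of that end state, the Python sketch below renders the heartbeat stanza for each new node so that it seeds from the other new nodes. It assumes the addressing used in this post (aN is 10.1.1.N, so a10 through a14 are 10.1.1.10 through 10.1.1.14) and the default heartbeat port 3002; `heartbeat_stanza` is a hypothetical helper, and other heartbeat settings (interval, timeout, and so on) are deliberately left out.

```python
from typing import List

HEARTBEAT_PORT = 3002  # assumed default mesh heartbeat port
NEW_NODES = [f"10.1.1.{n}" for n in range(10, 15)]  # a10..a14


def heartbeat_stanza(node_ip: str, peers: List[str], port: int = HEARTBEAT_PORT) -> str:
    """Render a mesh heartbeat stanza that seeds from every peer except the node itself."""
    lines = ["heartbeat {", "    mode mesh", f"    port {port}"]
    lines += [f"    mesh-seed-address-port {ip} {port}" for ip in peers if ip != node_ip]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # The stanza belongs inside the network { ... } section of a10's aerospike.conf
    print(heartbeat_stanza("10.1.1.10", NEW_NODES))
```

Because the config file is only read at startup, writing these edits does not disturb the running nodes; they only matter the next time each node restarts.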

Shut down a1. I suggest waiting for migrations to complete before shutting down a2, though post 3.14+ you don’t have to. If you shut down one node at a time, you can monitor the health of the cluster as data rebalances.
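The one-at-a-time shutdown amounts to a simple loop: stop a node, wait until migrations drain, repeat. The sketch below assumes two hypothetical callbacks you would wire to your own tooling: `stop_node` (for example, stopping the service on that host) and `migrations_pending` (for example, a check built on the migration statistics reported by asadm or asinfo; the exact statistic names vary by server version, so check the documentation for yours).

```python
import time
from typing import Callable, Iterable, List


def rolling_shutdown(
    nodes: Iterable[str],
    stop_node: Callable[[str], None],
    migrations_pending: Callable[[], int],
    poll_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Stop nodes one at a time, waiting for migrations to finish after each stop."""
    stopped: List[str] = []
    for node in nodes:
        stop_node(node)
        stopped.append(node)
        # Block until the cluster reports no outstanding partition migrations
        while migrations_pending() > 0:
            sleep(poll_seconds)
    return stopped
```

Taking `sleep` as a parameter is only there so the loop can be exercised without real delays.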

No need to restart any nodes.

(What you must not do is first form a separate cluster of a10 through a14 and then make it join the a1 through a9 cluster.)


#10

Got it!! Thanks much. I might execute this over the weekend; I will bump this thread in case I screw something up :slight_smile:


#11

It might be worth investing your time in the AS102 course offered by Aerospike … check www.aerospike.com/schedule


#12

Post 3.14 you still need to wait for migrations when removing nodes from the cluster. You no longer need to wait when restarting nodes with persistence.

Not waiting wouldn’t be catastrophic; it would just cause a lot of unnecessary migrations.

There are problems with this method. As you remove old nodes, the amount of data each node must store increases, because data is spread roughly evenly across nodes regardless of each node’s capacity. At some point the old, lower-capacity nodes may be overwhelmed.
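A back-of-the-envelope check makes the concern concrete. The sketch assumes data is spread evenly across all nodes regardless of their size, and uses hypothetical figures: 90 GB stored cluster-wide (replicas included) and a usable ceiling of 9.6 GB per old 16 GB node (60% of its RAM, an illustrative headroom target). It ignores per-record and index overhead.

```python
from typing import List, Optional


def shares_while_removing(total_gb: float, old_nodes: int, new_nodes: int) -> List[float]:
    """Per-node share (GB) after 0, 1, ..., old_nodes - 1 old nodes have been removed.

    Assumes an even spread of data across all remaining nodes.
    """
    return [total_gb / (old_nodes - removed + new_nodes) for removed in range(old_nodes)]


def first_overflow(total_gb: float, old_nodes: int, new_nodes: int,
                   old_limit_gb: float) -> Optional[int]:
    """Number of old nodes removed when the remaining old nodes first exceed their limit."""
    for removed, share in enumerate(shares_while_removing(total_gb, old_nodes, new_nodes)):
        if share > old_limit_gb:
            return removed
    return None  # even the last remaining old node still fits


if __name__ == "__main__":
    # Hypothetical: 9 old 16 GB nodes, 5 new nodes, 90 GB stored in total
    print(first_overflow(90.0, 9, 5, old_limit_gb=9.6))  # 5
```

With these made-up numbers, the four remaining old nodes would go past the ceiling once the fifth old node is removed, even though the 128 GB nodes still have plenty of room.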

Instead, in 3.14+, I suggest that you:

  1. Dynamically set the rack-id of the original nodes to 1.
  2. Add all of the new nodes.
  3. Wait for migrations to finish.
  4. Stop all of the old nodes.
  5. I also recommend waiting for migrations to finish before decommissioning the old nodes. This allows you to add the old nodes back if a new node hard-fails (for example, a disk failure).
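Why the rack-id approach works can be seen with a toy model. With a replication factor of 2 and the nodes split into two racks, rack-aware placement keeps one copy of every partition in each rack, so once migrations finish, the new nodes (left on the default rack-id 0) hold a full copy and the old rack-1 nodes can all be stopped together. The `place` function below is a deliberately simplified stand-in, not Aerospike’s partition-assignment algorithm; only the 4096-partitions-per-namespace figure comes from Aerospike itself.

```python
import hashlib
from typing import Dict, List, Set, Tuple

N_PARTITIONS = 4096  # Aerospike divides each namespace into 4096 partitions


def _rank(partition: int, node: str) -> int:
    # Deterministic pseudo-random node order per partition (illustrative only)
    digest = hashlib.sha256(f"{partition}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(racks: Dict[str, int], replication_factor: int = 2) -> List[Tuple[str, ...]]:
    """Toy rack-aware placement: prefer replicas on racks not already used."""
    table = []
    for p in range(N_PARTITIONS):
        order = sorted(racks, key=lambda node: _rank(p, node))
        chosen: List[str] = []
        used_racks = set()
        for node in order:  # first pass: at most one replica per rack
            if len(chosen) == replication_factor:
                break
            if racks[node] not in used_racks:
                chosen.append(node)
                used_racks.add(racks[node])
        for node in order:  # second pass: fill up if there are fewer racks than replicas
            if len(chosen) == replication_factor:
                break
            if node not in chosen:
                chosen.append(node)
        table.append(tuple(chosen))
    return table


def lost_if_stopped(table: List[Tuple[str, ...]], stopped: Set[str]) -> int:
    """Count partitions whose every replica lives on a stopped node."""
    return sum(all(node in stopped for node in replicas) for replicas in table)


if __name__ == "__main__":
    old = {f"a{i}" for i in range(1, 10)}
    new = {f"a{i}" for i in range(10, 15)}
    with_racks = {n: (1 if n in old else 0) for n in old | new}
    no_racks = {n: 0 for n in old | new}
    print(lost_if_stopped(place(with_racks), old))  # 0
    print(lost_if_stopped(place(no_racks), old))    # non-zero without racks
```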

#13

Yup… that is an issue! Could another way be to add 9 higher-capacity nodes instead of 5? Then you can take out the 9 lower-capacity nodes one by one, and then take out 4 of the higher-capacity nodes.


#15

Piyush,

We just added 7 nodes to the existing 9 nodes and suffered data loss. Below is log information from the existing 9 nodes that had data. I could see truncate entries for all sets on all 9 nodes. When I compared with the safe backup I took before adding the 7 nodes: the backup had 40M records, but after adding the nodes and letting migrations finish, there were only 34M records overall.

Feb 19 2018 20:03:43 GMT: INFO (truncate): (truncate.c:470) {roger} truncate already will restart
Feb 19 2018 20:03:43 GMT: INFO (truncate): (truncate.c:440) {roger|usstats} truncating to 256738096417
Feb 19 2018 20:03:43 GMT: INFO (truncate): (truncate.c:470) {roger} truncate already will restart
Feb 19 2018 20:03:43 GMT: INFO (truncate): (truncate.c:440) {roger|usstatsunt1} truncating to 256738097390
Feb 19 2018 20:03:43 GMT: INFO (truncate): (truncate.c:470) {roger} truncate already will restart

#16

Those truncate logs are triggered when one of your clients requests a truncate. A truncate command isn’t applied to a single node; it is distributed to all nodes. The “already will restart” message appears because an ongoing truncate was interrupted multiple times and is in the process of aborting its scan to rescan with the new timestamp.
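The number after “truncating to” in the log lines is the truncate’s last-update-time (LUT) threshold. Assuming it is milliseconds since the Aerospike epoch of 2010-01-01 00:00:00 UTC (Aerospike’s internal clock convention, worth confirming for your version), a few lines of Python turn it into a wall-clock time:

```python
from datetime import datetime, timedelta, timezone

# Assumption: logged truncate LUTs are milliseconds since 2010-01-01 00:00:00 UTC
AEROSPIKE_EPOCH = datetime(2010, 1, 1, tzinfo=timezone.utc)


def truncate_lut_to_utc(lut_ms: int) -> datetime:
    """Convert a logged truncate LUT to an aware UTC datetime."""
    return AEROSPIKE_EPOCH + timedelta(milliseconds=lut_ms)


if __name__ == "__main__":
    for lut in (256738096417, 256738097390):  # values from the log excerpt
        print(truncate_lut_to_utc(lut).isoformat())
    # 2018-02-19T12:08:16.417000+00:00
    # 2018-02-19T12:08:17.390000+00:00
```

Under that assumption, the thresholds date from about 12:08 UTC, roughly eight hours before the 20:03 GMT log entries, which fits an earlier truncate being replayed rather than a new one being issued at 20:03.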

It isn’t “data loss” if the app is explicitly deleting the data.


#17

So we had been testing the 7 new nodes by pushing in data and then truncating, and the truncate completed successfully. After that we ran rm -f /var/lib/aerospike on all 7 nodes and stopped Aerospike on them, then made aerospike.conf changes to join the 9 + 7 nodes. Do you think this could have caused the truncate to re-trigger, even though it had already completed successfully and, about an hour after our testing, we had stopped Aerospike and removed the Aerospike data files?


#18

The truncate metadata exists in /opt/aerospike/smd/truncate.smd. Nodes will share this information on recluster events.
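To see how a replayed truncate entry removes data, here is a simplified model of the rule: once a set carries a truncate LUT, records in that set last updated before that LUT are deleted, while later writes (including records re-written by a restore, which get a fresh LUT) survive. The strict “before” boundary, the `Record` type, and the `events` set are assumptions for illustration; the `usstats` LUT is taken from the log excerpt.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    key: str
    set_name: str
    lut_ms: int  # last-update time, same clock as the truncate LUT


def apply_truncates(records: List[Record], truncate_luts: Dict[str, int]) -> List[Record]:
    """Keep only records not covered by their set's truncate LUT (simplified rule)."""
    kept = []
    for r in records:
        threshold = truncate_luts.get(r.set_name)
        if threshold is None or r.lut_ms >= threshold:  # older than threshold -> deleted
            kept.append(r)
    return kept


if __name__ == "__main__":
    T = 256738096417  # truncate LUT for roger|usstats from the log
    records = [
        Record("untouched", "usstats", T - 3_600_000),  # last written an hour before the test truncate
        Record("updated", "usstats", T + 60_000),       # written after it
        Record("restored", "usstats", T + 30_000_000),  # re-written later by a restore
        Record("other-set", "events", T - 3_600_000),   # hypothetical set with no truncate entry
    ]
    print([r.key for r in apply_truncates(records, {"usstats": T})])
    # ['updated', 'restored', 'other-set']
```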


#19

Yep, I figured that was the culprit. The file still exists after the restore, but it does no harm now. Can we rm -f the file?


#20

It is very unusual to do so. For truncate, it shouldn’t be an issue to leave these entries alone, since the truncation has already completed. To remove this file (or any other SMD file), you will need to shut down all nodes in the cluster and remove the file from all of them. If any node still has a copy of this file when the cluster reforms, it will propagate its version to the other nodes (which would return the cluster to the initial state).
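The all-or-nothing requirement follows from how the metadata is shared. In the toy model below, when the cluster reforms, every truncate entry found on any node ends up on all nodes, with the highest LUT kept when copies disagree. That merge rule is a simplification for illustration, not a description of Aerospike’s actual SMD merge code; the point is only that a single surviving copy is enough to bring the entries back.

```python
from typing import Dict, List


def merge_truncate_entries(per_node: List[Dict[str, int]]) -> Dict[str, int]:
    """Toy cluster-wide merge of truncate metadata ("namespace|set" -> LUT)."""
    merged: Dict[str, int] = {}
    for entries in per_node:
        for key, lut in entries.items():
            merged[key] = max(lut, merged.get(key, lut))  # assumption: newest LUT wins
    return merged


if __name__ == "__main__":
    entry = {"roger|usstats": 256738096417}
    # truncate.smd deleted on two nodes, but one node still has it
    print(merge_truncate_entries([{}, {}, entry]))  # {'roger|usstats': 256738096417}
    # deleted everywhere before the cluster reforms
    print(merge_truncate_entries([{}, {}, {}]))  # {}
```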


#21

Since I restored after the truncate’s LUT, I guess it is safe to remove the file, even if the entries get reloaded.