How To: Scale down an Aerospike cluster with zero data loss
At times it may be necessary to shut down a number of Aerospike nodes en masse. How can this be done while guaranteeing that no data is lost?
Method 1 - Configure rack awareness
The Aerospike rack-aware feature allows nodes to be assigned to a logical group known as a rack. With two racks, the feature ensures that if the master copy of a partition is in one rack, the replica copy of that partition is in the other rack. Because the master and replica are never in the same rack, an entire rack can be shut down without losing any data. The process is as simple as assigning a rack-id to the namespaces, which can be done dynamically with an info command. Once the rack-id has been assigned, allow the resulting migrations to complete so that every partition has a copy in each rack; only then can the rack be shut down. The remaining rack will then rebalance (migrate) to restore the replication factor. The advantage of this method is speed, because the whole rack is taken down in one step; the disadvantage is the time taken to migrate and regain the replication factor in the remaining rack.
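To see why removing a whole rack loses nothing, here is a toy model. It assumes a simplified round-robin placement that is NOT Aerospike's actual partition-assignment algorithm; the function names `assign_naive`, `assign_rack_aware` and `partitions_lost` and the nodes `A` to `D` are hypothetical. The only real-world detail it borrows is that Aerospike divides each namespace into 4096 partitions. It counts the partitions whose master and replica would both disappear when one rack is shut down.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Aerospike divides each namespace into 4096 partitions.
N_PARTITIONS = 4096

Placement = List[Tuple[str, str]]  # (master node, replica node) per partition


def assign_naive(nodes: Dict[str, int], n: int = N_PARTITIONS) -> Placement:
    """Toy placement that ignores racks: the replica is simply the 'next' node."""
    names = sorted(nodes)
    return [(names[p % len(names)], names[(p + 1) % len(names)]) for p in range(n)]


def assign_rack_aware(nodes: Dict[str, int], n: int = N_PARTITIONS) -> Placement:
    """Toy placement that always puts the replica on a different rack than the master."""
    racks: Dict[int, List[str]] = defaultdict(list)
    for name in sorted(nodes):
        racks[nodes[name]].append(name)
    rack_ids = sorted(racks)
    if len(rack_ids) < 2:
        raise ValueError("rack awareness needs at least two racks")
    placement = []
    for p in range(n):
        # Master and replica racks are consecutive entries, so they always differ.
        master_rack = racks[rack_ids[p % len(rack_ids)]]
        replica_rack = racks[rack_ids[(p + 1) % len(rack_ids)]]
        placement.append((master_rack[p % len(master_rack)],
                          replica_rack[p % len(replica_rack)]))
    return placement


def partitions_lost(placement: Placement, nodes: Dict[str, int], rack: int) -> int:
    """Count partitions whose master AND replica both sit in the rack being shut down."""
    down = {name for name, r in nodes.items() if r == rack}
    return sum(1 for master, replica in placement if master in down and replica in down)


if __name__ == "__main__":
    nodes = {"A": 1, "B": 1, "C": 2, "D": 2}  # hypothetical 4-node, 2-rack cluster
    print("naive:", partitions_lost(assign_naive(nodes), nodes, rack=1))            # naive: 1024
    print("rack aware:", partitions_lost(assign_rack_aware(nodes), nodes, rack=1))  # rack aware: 0
```

In the naive layout, every fourth partition has both copies on `A` and `B`, so shutting down rack 1 destroys 1024 partitions; with the rack-aware constraint the count is zero for either rack.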
Method 2 - Quiesce the nodes
The quiesce command tells a node to give up ownership of any master partitions it holds. When a node is quiesced and a recluster command is issued, ownership of its master partitions passes to the next node in each partition's succession list. Client traffic continues to reach the original master node for a second or two, until the clients update their partition maps, and that node proxies those transactions to the new master. The advantage is that clients should not experience timeouts when the node is removed from the cluster, because quiesce allows a smooth master handoff. When multiple nodes are quiesced, migrations should be allowed to complete before any nodes are removed from the cluster. Although this method gives a smooth handoff with no client disruption, the disadvantage is that it takes longer.
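The ordering above (quiesce every outgoing node, recluster, wait for migrations, and only then remove nodes) can be sketched as follows. This is a minimal sketch under stated assumptions: the `info` callable is a hypothetical stand-in for whatever sends an info command to a node (for example a wrapper around `asinfo`), a successful `quiesce:` reply is assumed to be `ok`, and the service statistic `migrate_partitions_remaining` is assumed to be the right completion signal for your server version. `quiesce_and_wait`, `migrations_done` and `parse_stats` are hypothetical helper names.

```python
import time
from typing import Callable, Dict, Iterable, List

# Hypothetical transport: info(node, command) -> raw response string.
# In practice this might wrap `asinfo -h <node> -v <command>` or a client
# library's info call; injecting it keeps the sketch runnable without a cluster.
InfoFn = Callable[[str, str], str]


def parse_stats(raw: str) -> Dict[str, str]:
    """Parse an info response of the form 'k1=v1;k2=v2;...' into a dict."""
    return dict(item.split("=", 1) for item in raw.strip().split(";") if "=" in item)


def migrations_done(info: InfoFn, nodes: Iterable[str]) -> bool:
    """True once every node reports zero partitions left to migrate.

    The stat name is an assumption; a missing key raises KeyError rather
    than being silently treated as "done".
    """
    return all(
        int(parse_stats(info(n, "statistics"))["migrate_partitions_remaining"]) == 0
        for n in nodes
    )


def quiesce_and_wait(info: InfoFn, cluster: List[str], to_remove: List[str],
                     poll_s: float = 5.0, timeout_s: float = 3600.0) -> None:
    """Quiesce the nodes being removed, recluster, then block until migrations finish."""
    # 1. Quiesce every node that is going away (assumes a successful reply is "ok").
    for node in to_remove:
        resp = info(node, "quiesce:")
        if resp.strip() != "ok":
            raise RuntimeError(f"quiesce failed on {node}: {resp!r}")
    # 2. Recluster so master ownership moves off the quiesced nodes.
    #    Sent to every node here for simplicity.
    for node in cluster:
        info(node, "recluster:")
    # 3. Only when migrations have finished everywhere is it safe to shut nodes down.
    deadline = time.monotonic() + timeout_s
    while not migrations_done(info, cluster):
        if time.monotonic() > deadline:
            raise TimeoutError("migrations still running; do not remove nodes yet")
        time.sleep(poll_s)
```

Quiescing all outgoing nodes before a single recluster, rather than one at a time, means the masters move once; the blocking wait in step 3 is what enforces the rule that nothing is removed until migrations complete.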