How To - Scale down an Aerospike cluster with zero data loss

Context

It may at times be necessary to shut down a number of Aerospike nodes en masse. How can this be done while guaranteeing that no data is lost?

Method 1 - Configure rack aware

The Aerospike rack-aware feature allows nodes to be assigned a logical grouping known as a rack. The feature ensures that if the master copy of a partition exists in one rack, the replica copy of that partition will exist in the other rack. This means an entire rack can be shut down without losing any data, as the master and replica will never be in the same rack. The process is as simple as assigning a rack-id to the namespaces, which can be done dynamically with an info command, as shown in the sketch below. Once rack-ids have been assigned and migrations have completed, the rack can be shut down. The remaining rack will then rebalance (migrations) to restore the replication factor. The advantage here is the speed with which the nodes can be taken down; the associated disadvantage is the time taken to migrate and regain the replication factor in the remaining rack.
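
As a minimal sketch of the dynamic assignment, assuming a four-node cluster split across two racks, a namespace named test, and placeholder node addresses, run from the asadm prompt:

    # Assign rack-id 1 to the nodes in the first rack:
    Admin> asinfo -v "set-config:context=namespace;id=test;rack-id=1" with 10.0.0.1 10.0.0.2

    # Assign rack-id 2 to the nodes in the second rack:
    Admin> asinfo -v "set-config:context=namespace;id=test;rack-id=2" with 10.0.0.3 10.0.0.4

    # Make the new rack layout take effect:
    Admin> asinfo -v "recluster:"

Dynamic changes do not persist across restarts, so the same rack-id values would also be added to the namespace stanza in aerospike.conf on each node to make the layout permanent. Once migrations complete, each rack holds a full copy of the data and either rack can be shut down.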

Method 2 - Quiesce the nodes

The quiesce command allows a node to give up ownership of the master partitions that reside on it. When a node is quiesced and a recluster command is issued, ownership of its master partitions passes to the next node in the succession list. While client traffic still arrives at the original master node (for a second or two, until the clients update their partition maps), those transactions are proxied by that node to the new master. The advantage here is that no client timeouts will be experienced when the node is removed from the cluster, as quiesce allows a smooth master handoff. When multiple nodes are quiesced, migrations should be allowed to complete before any nodes are removed from the cluster; a sketch of the sequence follows. Though this method gives a smooth handoff with no client disruption, the disadvantage is that it takes longer.
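
As a sketch, assuming the two nodes being removed are at the placeholder addresses 10.0.0.3 and 10.0.0.4, again from the asadm prompt:

    # Quiesce the nodes that will be removed:
    Admin> asinfo -v "quiesce:" with 10.0.0.3 10.0.0.4

    # Trigger the master handoff:
    Admin> asinfo -v "recluster:"

    # Watch migrations; the quiesced nodes are safe to stop once the
    # remaining partition counts reach 0 on every node:
    Admin> show statistics namespace like partitions_remaining

A node quiesced by mistake can be reverted with the quiesce-undo: info command followed by another recluster:.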

Keywords

DOWNSIZE REMOVE NODE QUIESCE RACK AWARE

Timestamp

October 2019

If I have a cluster of, say, 2 nodes and replication is turned off, how can I scale it down without losing data? I tried shutting down one node, but that led to data loss. As far as I am aware, Aerospike doesn’t let the user migrate partitions manually.

For replication-factor 1, the only way to stop or restart nodes without data loss is to quiesce them and wait for migrations to complete first.
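
As an illustrative check before stopping the quiesced node, the cluster-stable info command confirms that migrations have finished (size=2 here is the current cluster size and is a placeholder):

    # Returns the cluster key when the cluster is stable and migrations
    # are complete; returns an error otherwise:
    asinfo -v "cluster-stable:size=2;ignore-migrations=no"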