Aerospike behavior when node dies

Thanks for posting on our forum!

You can find details about Aerospike’s data distribution and rebalancing mechanism on our site, specifically on this page.

To your questions:

  • With replication factor 1, if a node goes down, the data stored on it will not be available (roughly 1/N of the data, with N the number of nodes in the cluster). With replication factor 2, if the node holding the master copy of a record goes down, the data remains available from the node holding its replica copy. The data will also rebalance across the remaining nodes, so you still end up with 2 copies (assuming you have enough storage capacity on the remaining nodes, of course). I also encourage you to read about the per-transaction consistency guarantees.
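
To make the 1/N arithmetic concrete, here is a toy Python model. The round-robin placement and node names are illustrative stand-ins, not Aerospike's actual placement algorithm; only the partition count of 4096 matches the real system:

```python
# Toy model of partition availability after a node failure.
N_PARTITIONS = 4096  # Aerospike's fixed partition count

def build_map(nodes, replication_factor):
    """Assign each partition an ordered list of owners (master first)."""
    pmap = {}
    for p in range(N_PARTITIONS):
        # Round-robin placement: a stand-in for the real placement algorithm.
        owners = [nodes[(p + r) % len(nodes)] for r in range(replication_factor)]
        pmap[p] = owners
    return pmap

def unavailable_after_loss(pmap, dead_node):
    """Count partitions with no surviving owner once dead_node is gone."""
    return sum(1 for owners in pmap.values()
               if all(o == dead_node for o in owners))

nodes = ["A", "B", "C", "D"]
rf1 = build_map(nodes, 1)
rf2 = build_map(nodes, 2)
print(unavailable_after_loss(rf1, "A"))  # 1024, i.e. 1/4 of 4096 partitions
print(unavailable_after_loss(rf2, "A"))  # 0: every partition still has a copy
```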

  • The client fetches a partition map from all nodes in the cluster every second (a partition map indicates, for each partition, which node holds the master copy and which node holds the replica). So when a node goes down, the client will get the new partition map within a second at most and will not issue any requests to the dead node. If the data has not rebalanced yet, requests will be proxied to the node holding the data.
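
As a rough sketch of how a key maps to one of the 4096 partitions that the partition map describes: the real client hashes the key with RIPEMD-160; sha1 is used below only as a stand-in (ripemd160 is not always available in Python's hashlib), and the exact bit selection is illustrative:

```python
import hashlib

N_PARTITIONS = 4096  # fixed in Aerospike

def partition_id(key: bytes) -> int:
    # Stand-in digest: the real client uses RIPEMD-160, not sha1.
    digest = hashlib.sha1(key).digest()
    # 12 low-order bits of the digest select one of 4096 partitions
    # (the byte order shown here is illustrative).
    return int.from_bytes(digest[:2], "little") & 0x0FFF

# The partition map then says, per partition id, which node holds the
# master copy and which holds the replica.
pid = partition_id(b"user:1234")
print(pid)  # deterministic: the same key always lands in the same partition
```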

  • When a node leaves or joins a cluster, a new partition map is generated and the data rebalances accordingly across the nodes (we call this ‘migrations’). During migrations, data may not ‘yet’ be on the node it is supposed to be on, which causes such requests to ‘proxy’ to the right node.
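
A minimal sketch of that proxy behavior, with made-up node names and a single partition (this does not reflect Aerospike internals, just the idea that a request based on a stale client map still gets answered):

```python
# Partition 7 has migrated from node A to node B, but the client still
# holds the pre-migration map.
cluster_map = {7: "B"}       # authoritative: partition 7 now lives on B
client_stale_map = {7: "A"}  # client hasn't refreshed yet

node_data = {"A": {}, "B": {7: {"user:1": "alice"}}}

def handle(node, partition, key):
    if cluster_map[partition] != node:
        # This node no longer owns the partition: proxy to the real owner.
        return handle(cluster_map[partition], partition, key)
    return node_data[node][partition][key]

# The client sends to the wrong node (A) based on its stale map; the read
# still succeeds because A proxies the request to B.
print(handle(client_stale_map[7], 7, "user:1"))  # -> alice
```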

  • When a node ‘dies’ or leaves the cluster, a new partition map is generated, which the client will get within 1 second; it will then write the data to the new node holding the master copy for that data. You can tune the number of retries and the timeout per request on the client side.
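
For example, with the Python client these can be set as per-transaction policies in the client config. The field names below (`total_timeout`, `max_retries`, `sleep_between_retries`) follow recent versions of the Python client; check your client version's documentation for the exact set:

```python
# Sketch of tuning retries/timeouts with the Aerospike Python client.
config = {
    "hosts": [("127.0.0.1", 3000)],
    "policies": {
        "read": {
            "total_timeout": 1000,        # ms budget for the whole transaction
            "max_retries": 2,             # re-attempts after the first try
            "sleep_between_retries": 10,  # ms pause before each retry
        },
        "write": {
            "total_timeout": 1000,
            "max_retries": 0,  # writes are often left at 0 to avoid duplicates
        },
    },
}
# client = aerospike.client(config).connect()  # requires a running cluster
print(sorted(config["policies"]["read"]))
```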

Hope this helps, let us know otherwise! –meher