I’m wondering how Aerospike manages the partition map. As we know, client will get the partition map as soon as it connects to the nodes in an Aerospike cluster. So my questions are:
1 Is there a ‘coordinator’ in an Aerospike cluster who manages the partition map? If not, how each node get a consistent view of the partition map?
2 When a node is added to the cluster or one node is removed, the partition map will change accordingly and the migration starts. So who detects the change of nodes in the cluster and updates the partition map and starts the migration?
Each server node manager a map of the partitions it is a master or replica for:
When the clients detect that the partition [map] generation has changed, it will query this map from each node.
The client polls the partition-generation every 2 seconds (by default). When it detects that any node’s partition-generation has moved it queries the current partition map from each node in the cluster.
Any change to the partition map on the server causes the serve to advance its partition-generation.
This is a high level description, the actual algorithm is a bit more involved, you can find the algorithm here:
So the client will poll the partition map from each node(each node’s partition map contains only the records of which it is the replica or master). So when a new node is added to the cluster, it will send ‘heartbeat’ to the node in its configuration, right? And then its entering will be visible to all the other nodes in the cluster and this triggers each node in the cluster to update its ‘partition generation’ and updates its partition map, is that correct?
The partition-map is multiple (replication-factor) base64 encoded bitmaps that indicate which partitions a node is master/replica for. A record is member of a partition.
At a very high level, that description could be a functional mental model.
The heartbeat from a new node triggers the cluster to reform to include that node. This eventually triggers a rebalance that determine how partitions should be distributed among nodes. Migrations follow this to move partitions to match the new ‘balance’. As migrations complete, the partition-generation increments triggering nodes to request the current maps. Thus the node joining the cluster can cause the partition-generation to increment thousands of times.