FAQ - In what circumstances might a Strongly Consistent Aerospike cluster migrate?
The Aerospike 4.0 release introduced a new mode of operation, Strong Consistency. Strong Consistency is an Enterprise feature that guarantees, among other consistency properties, that writes are never lost, including during network partition events.
Under what circumstances would migrations be observed for an Aerospike cluster running in Strong Consistency mode?
The very short answer is that only available partitions migrate, but let’s go over the details.
To illustrate the answer, consider a 5 node cluster with a replication factor of 2. From the cluster's perspective there is no way to differentiate between a split brain and the loss of one or several nodes. This example describes the migration behaviour as nodes are shut down in turn, and how the roster-set and recluster commands affect migration. To explain the various behaviours it is useful to keep in mind the 3 main rules of Aerospike Strong Consistency:
- If a sub cluster (a.k.a. split-brain) has both the roster-master and roster-replica for a partition, then the partition is active for both reads and writes in that sub cluster
- If a sub cluster has a strict majority of nodes and has either the roster-master or roster-replica for the partition within its component nodes, the partition is active for both reads and writes in that sub cluster
- If a sub cluster has exactly half of the nodes in the full cluster (roster) and it has the roster-master within its component nodes, the partition is active for both reads and writes
The above rules only apply when all data for a partition is available within a sub cluster, potentially spread across multiple nodes. For example, a node that is fully emptied and then added back into the cluster invalidates those rules. This article does not cover such less common situations, which can also render a partition unavailable.
A roster-master node for a given partition is the node that would hold the master copy of this partition if the roster were complete (all nodes in the roster forming a single cluster).
A roster-replica node for a given partition is the node that would hold a replica copy of this partition if the roster were complete (all nodes in the roster forming a single cluster).
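The three rules above can be sketched as a small predicate. This is a minimal illustration only, not Aerospike's implementation: the function name `partition_available` and its parameters are hypothetical, nodes are represented by plain integers, replication factor 2 is assumed (one roster-master and one roster-replica), and the sketch assumes all data for the partition is present in the sub cluster.

```python
def partition_available(sub_cluster, roster, roster_master, roster_replica):
    """Return True if a partition is active for reads and writes in sub_cluster.

    Hypothetical helper: models only the 3 main rules, for replication factor 2.
    """
    roster = set(roster)
    sub = set(sub_cluster) & roster  # only roster nodes count
    has_master = roster_master in sub
    has_replica = roster_replica in sub

    # Rule 1: both roster-master and roster-replica are in the sub cluster.
    if has_master and has_replica:
        return True
    # Rule 2: strict majority of the roster, plus roster-master or roster-replica.
    if 2 * len(sub) > len(roster) and (has_master or has_replica):
        return True
    # Rule 3: exactly half of the roster, plus the roster-master.
    if 2 * len(sub) == len(roster) and has_master:
        return True
    return False


roster = {1, 2, 3, 4, 5}
# Node 5 is down; partition with roster-master 5 and roster-replica 1.
print(partition_available({1, 2, 3, 4}, roster, 5, 1))
# Nodes 4 and 5 are down; partition with roster-master 4 and roster-replica 5.
print(partition_available({1, 2, 3}, roster, 4, 5))
```

The first call falls under rule 2 (majority plus the roster-replica), the second satisfies none of the rules.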
Node 5 is shut down
- Node 5 leaves the cluster
- All partitions remain available, even those for which node 5 is the roster-master.
- The roster-replica for those partitions exists in the cluster and the cluster has a strict majority, so reads and writes are not affected.
- The cluster migrates so that 2 copies of each partition that was held on node 5 exist within the remaining cluster.
- At the end of migrations, for each partition for which node 5 is the roster-master, the roster-replica serves as the acting master within the cluster, and another node in the cluster holds an acting replica.
Node 4 is shut down
- Node 4 leaves the cluster
- There are certain partitions for which the roster-master and roster-replica sat on nodes 4 and 5. Those partitions are now unavailable even though a full copy (acting replica) exists within the remaining cluster. They do not migrate.
- There are other partitions where either the roster-master or the roster-replica was on node 4 or 5 and a roster-master or roster-replica remains on nodes 1-3. Those partitions remain available (3 is a strict majority in a 5 node cluster) and migrate so that there are eventually 2 copies within the remaining cluster.
From the cluster’s perspective, those 2 missing nodes could be up and running and forming a 2 node cluster on their own, in which case they would hold both the roster-master and roster-replica for such partitions and could take reads and writes. Therefore the remaining cluster cannot make those partitions available (the ones whose roster-master and roster-replica are both on the 2 missing nodes).
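The reasoning above can be checked by enumerating every possible (roster-master, roster-replica) placement in the 5 node example. This is a hedged sketch: the `available` helper is a hypothetical restatement of the 3 main rules for replication factor 2, and enumerating placements is a simplification of how a real cluster assigns its partitions.

```python
from itertools import permutations

ROSTER = {1, 2, 3, 4, 5}


def available(sub, master, replica, roster=ROSTER):
    """Hypothetical check of the 3 main rules for one partition."""
    sub = set(sub) & roster
    m, r = master in sub, replica in sub
    majority = 2 * len(sub) > len(roster)
    half = 2 * len(sub) == len(roster)
    return (m and r) or (majority and (m or r)) or (half and m)


# Every ordered (roster-master, roster-replica) pair of distinct nodes.
placements = list(permutations(sorted(ROSTER), 2))

remaining, missing = {1, 2, 3}, {4, 5}
dead_in_remaining = [p for p in placements if not available(remaining, *p)]
live_in_missing = [p for p in placements if available(missing, *p)]
live_in_both = [
    p for p in placements
    if available(remaining, *p) and available(missing, *p)
]

print(dead_in_remaining)  # [(4, 5), (5, 4)]
print(live_in_missing)    # [(4, 5), (5, 4)]
print(live_in_both)       # []
```

The placements that are unavailable on nodes 1-3 are exactly the ones a hypothetical sub cluster of nodes 4 and 5 could serve, and no placement is available on both sides, which is what prevents conflicting writes.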
The roster is now reset to reduce it to 3 nodes using roster-set.
- The roster-set command only sets the pending roster. The pending roster does not take effect until the next cluster change event (a node leaving or joining the cluster, or the recluster command being issued). Therefore, at this point, nothing happens in terms of partition availability or migration. The cluster is in a stable state, as migrations triggered by the loss of node 4 completed in the previous step. The cluster still has some unavailable partitions even though a full copy of that data exists within the remaining sub cluster.
The recluster command is issued.
- The pending 3 node roster takes effect. Migrations begin and the previously unavailable partitions become available.
Notes
- Loss of a single node will not cause partitions to become unavailable when the replication factor is 2 or greater.
- When a node is removed from a cluster, migrations of available partitions will occur to maintain the replication factor.
- When there are unavailable partitions, they will not migrate until the roster is reset and the recluster command is issued.
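The two-step behaviour of roster-set followed by recluster can be pictured with a toy model. This is a sketch under stated assumptions, not the server's implementation: the `RosterModel` class and `roster_set_command` helper are hypothetical, nodes are plain integers, and the namespace name and node IDs passed to the helper are placeholders. The strings built follow the `roster-set:namespace=...;nodes=...` and `recluster:` info command forms.

```python
class RosterModel:
    """Toy model of one namespace's roster (hypothetical, for illustration)."""

    def __init__(self, nodes):
        self.active = frozenset(nodes)   # roster currently in effect
        self.pending = frozenset(nodes)  # roster waiting to take effect

    def roster_set(self, nodes):
        # Only the pending roster changes; availability is unaffected.
        self.pending = frozenset(nodes)

    def recluster(self):
        # A cluster change event makes the pending roster effective.
        self.active = self.pending


def roster_set_command(namespace, node_ids):
    """Build the info command string; arguments here are placeholders."""
    return f"roster-set:namespace={namespace};nodes={','.join(node_ids)}"


RECLUSTER_COMMAND = "recluster:"

model = RosterModel({1, 2, 3, 4, 5})
model.roster_set({1, 2, 3})
print(sorted(model.active))  # [1, 2, 3, 4, 5] (nothing has changed yet)
model.recluster()
print(sorted(model.active))  # [1, 2, 3]
print(roster_set_command("test", ["A1", "B2", "C3"]))
```

Once the active roster contains only nodes 1-3, every partition's roster-master and roster-replica are on nodes present in the cluster, so the first rule holds for all partitions and they can migrate.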
Keywords
STRONG CONSISTENCY MIGRATION ROSTER-SET RECLUSTER