I have a cluster of 4 nodes which contains a lot of data(replication factor 2). All of the data is completely on the RAM(no persistence). I’ve observed that when I shut down one or two nodes, the amount of migrations is a lot smaller than I expect(It’s in few KBs whereas the size of 1 record itself is around 8 KB). Does migration involve the shipping of complete record data or something which is smaller.
I monitor the migrations using the following command:
It is the number of partitions outbound (tx) and inbound (rx). There are record migration stats that accumulate since Aerospike starts, see asadm -e "show stat like migrate_record" - descriptions of these stats can be found at our metrics reference guide.
For a single namespace at RF=2, 8k is usually the peak but not the max; for example, recovering from an N way partitioned network whould have about (N-1)×4k inbound migrations and 4k outbound. This is because, in non-SC namespaces, each isolated node would generate fresh partitions which will all need to be reconciled when the network recovers.
The reason you have more than 8k is because you have 4 namespaces each of which have 4k partitions at RF=2 (even the empty namespace).
With the quiesce command, I can create a scenario where the number of migrations in SC will match that of an N way split in AP.
Without the quiesce command, I believe the worst case number of migrations would be (3×(RF-1)) × 4k (where N >= RF + (RF - 1)).
The following is a simplified model to represent what is going on in SC at the partition level:
‘P’ will represent a ‘full’ partition
‘S’ will represent a ‘subset’ of a partition
‘0’ will represent a ‘null’ partition
‘|’ will be used to delineate the replication-factor cutoff
Given an N node RF=2 cluster that is currently fully replicated would be represented as:
P P | 0 0 ...
If we remove one of the replicas it would become:
P | S 0 ...
Since the fill in replica doesn’t a full copy of the partition, it becomes a ‘subset’.
Assuming migrations are not allowed to complete and removing the next replica it would become.
| S 0 ...
Notice that in SC we will not create another partition at this point and since the original replicas are gone, this partition will become unavailable.
Now if we bring the original replicas back in it would become:
S S | S 0 ...
The reason the originals come in as ‘subset’ is that a node cannot know if any writes may have occurred while it was away. If we allow migrations to complete there will be the following migrations:
[1] -> [0] -- First replica to final-master
S S | S 0 ...
[2] -> [0] -- First non-replica with data to final-master
P S | S 0 ...
[0] -> [1] -- Final-master to first replica.
P P | S 0 ...
And finally a signal will be sent to all non-replicas to drop their copies and the nodes will again be in the following state: