What are migrations composed of?


#1

I have a cluster of 4 nodes which contains a lot of data(replication factor 2). All of the data is completely on the RAM(no persistence). I’ve observed that when I shut down one or two nodes, the amount of migrations is a lot smaller than I expect(It’s in few KBs whereas the size of 1 record itself is around 8 KB). Does migration involve the shipping of complete record data or something which is smaller.

I monitor the migrations using the following command:

asadm -e ‘info net’


#2

That command is reports the number of pending partition migrations (the unit is partitions, not bytes).


#3

How can it be more than 4K then?


#4

Inbound and outbound migrations.

Could you share the output of:

asadm -e "info namespace"

#5

So does this number(Migrations) represents the partitions or the number of records?


#6

It is the number of partitions outbound (tx) and inbound (rx). There are record migration stats that accumulate since Aerospike starts, see asadm -e "show stat like migrate_record" - descriptions of these stats can be found at our metrics reference guide.


#7

In that case, shouldn’t it’s max value be 8K(there are 4096 partitions, right?) What am I missing something?


#8

For a single namespace at RF=2, 8k is usually the peak but not the max; for example, recovering from an N way partitioned network whould have about (N-1)×4k inbound migrations and 4k outbound. This is because, in non-SC namespaces, each isolated node would generate fresh partitions which will all need to be reconciled when the network recovers.

The reason you have more than 8k is because you have 4 namespaces each of which have 4k partitions at RF=2 (even the empty namespace).


#9

Thanks for the info. And what about SC namespaces? Are the migrations limited to 4K per namespace in that case?


#10

With the quiesce command, I can create a scenario where the number of migrations in SC will match that of an N way split in AP.

Without the quiesce command, I believe the worst case number of migrations would be (3×(RF-1)) × 4k (where N >= RF + (RF - 1)).

The following is a simplified model to represent what is going on in SC at the partition level:

  • ‘P’ will represent a ‘full’ partition
  • ‘S’ will represent a ‘subset’ of a partition
  • ‘0’ will represent a ‘null’ partition
  • ‘|’ will be used to delineate the replication-factor cutoff

Given an N node RF=2 cluster that is currently fully replicated would be represented as:

P P | 0 0 ...

If we remove one of the replicas it would become:

P | S 0 ...

Since the fill in replica doesn’t a full copy of the partition, it becomes a ‘subset’.

Assuming migrations are not allowed to complete and removing the next replica it would become.

| S 0 ...

Notice that in SC we will not create another partition at this point and since the original replicas are gone, this partition will become unavailable.

Now if we bring the original replicas back in it would become:

S S | S 0 ...

The reason the originals come in as ‘subset’ is that a node cannot know if any writes may have occurred while it was away. If we allow migrations to complete there will be the following migrations:

[1] -> [0]  -- First replica to final-master
    S S | S 0 ...
[2] -> [0]  -- First non-replica with data to final-master
    P S | S 0 ...
[0] -> [1]  -- Final-master to first replica.
    P P | S 0 ...

And finally a signal will be sent to all non-replicas to drop their copies and the nodes will again be in the following state:

P P | 0 0 ...

#11

This topic was automatically closed 6 days after the last reply. New replies are no longer allowed.