Migrations and consistency

I’m working on a 2-node cluster with strong consistency enabled on both nodes. When a node that was down for some time is brought back up, I can see that some migrations take place. I use the following command to check:

asadm -e "asinfo -v 'namespace/test_namespace' -l" | grep partition

Output (on my 2-node cluster):

dead_partitions=0
    unavailable_partitions=0
    migrate_tx_partitions_imbalance=0
    migrate_tx_partitions_active=0
    migrate_rx_partitions_active=0
    migrate_tx_partitions_initial=4096
    migrate_tx_partitions_remaining=4095
    migrate_tx_partitions_lead_remaining=2012
    migrate_rx_partitions_initial=4096
    migrate_rx_partitions_remaining=4095
    partition-tree-sprigs=256
    sindex.num-partitions=32
    dead_partitions=0
    unavailable_partitions=0
    migrate_tx_partitions_imbalance=0
    migrate_tx_partitions_active=0
    migrate_rx_partitions_active=0
    migrate_tx_partitions_initial=4096
    migrate_tx_partitions_remaining=4095
    migrate_tx_partitions_lead_remaining=2082
    migrate_rx_partitions_initial=4096
    migrate_rx_partitions_remaining=4095
    partition-tree-sprigs=256
    sindex.num-partitions=32

Although the number of unavailable partitions drops to 0 once the 2nd node is brought up, there are still some migrations remaining (both tx and rx). Until the remaining migration counts reach 0, is the cluster state stable for reads and writes? If not, when should I worry about these counts (if at all), and which counts specifically?

I also couldn’t understand the difference between migrate_tx_partitions_remaining and migrate_tx_partitions_lead_remaining. After reading the description, I had thought that the two counts shouldn’t differ, since both nodes are present in the roster. I’d really appreciate it if someone could clarify.

While SC will work with a 2-node cluster, I strongly suggest using more nodes than the replication-factor. With only replication-factor nodes, maintenance becomes problematic because the cluster will become unavailable any time you take down a single node. As a result, in a cluster sized at exactly replication-factor nodes, routine maintenance events such as upgrades become zero-availability events.

If a partition is available, reads and writes can be made and will be consistent; if a partition is unavailable, your client will get an unavailable result code. Your application shouldn’t need to be concerned with these stats. However, your monitoring environment should incorporate them.
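
As a rough sketch of what such a monitoring check could look like, the shell function below reads `key=value` stat lines (in the same format as the output posted above) from standard input, sums dead_partitions and unavailable_partitions across all nodes in the input, and returns a non-zero status if either total is above 0. The function name `check_partitions` and the exit-status convention are assumptions made for illustration; they are not part of any Aerospike tool.

```shell
# Hypothetical helper: reads "key=value" stat lines on stdin and reports
# dead or unavailable partitions. Indented keys are tolerated.
check_partitions() {
    awk -F= '
        { gsub(/^[ \t]+/, "", $1) }              # strip indentation from the key
        $1 == "dead_partitions"        { dead += $2 }
        $1 == "unavailable_partitions" { unavail += $2 }
        END {
            printf "dead=%d unavailable=%d\n", dead, unavail
            exit (dead + unavail > 0 ? 1 : 0)    # non-zero status signals a problem
        }
    '
}
```

It could be fed from the command in the question, for example `asadm -e "asinfo -v 'namespace/test_namespace' -l" | check_partitions`, with the exit status wired into whatever alerting you already use.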

The migrations counted by migrate_tx_partitions_lead_remaining are a subset of those counted by migrate_tx_partitions_remaining. The lead migrations are not delayed by the migrate-fill-delay configuration. That is a separate topic; if you are interested, read up on migrate-fill-delay.
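
To make the subset relationship concrete, here is a small sketch that splits one node's remaining tx migrations into lead and non-lead counts by simple subtraction. The function name `tx_breakdown` and the `non_lead` label are hypothetical, and the function expects the stats of a single node on standard input (if several nodes are piped in, only the last values seen are used).

```shell
# Hypothetical helper: splits one node's remaining tx migrations into
# lead and non-lead counts. Expects "key=value" lines on stdin.
tx_breakdown() {
    awk -F= '
        { gsub(/^[ \t]+/, "", $1) }              # strip indentation from the key
        $1 == "migrate_tx_partitions_remaining"      { total = $2 }
        $1 == "migrate_tx_partitions_lead_remaining" { lead = $2 }
        END { printf "lead=%d non_lead=%d\n", lead, total - lead }
    '
}

# Values taken from the first node in the posted output.
printf 'migrate_tx_partitions_remaining=4095\nmigrate_tx_partitions_lead_remaining=2012\n' | tx_breakdown
# prints: lead=2012 non_lead=2083
```

Going by the reply above, the lead count covers the migrations that proceed regardless of migrate-fill-delay, so the non-lead remainder is presumably the portion that setting can hold back.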
