How do I confirm if data is migrating in an Aerospike cluster?

How do I confirm if data is being rebalanced in an Aerospike cluster?

Context

Sometimes I need to add or remove nodes in an Aerospike cluster, or issue a recluster command as part of a planned operation (e.g. after a quiesce command). This should trigger the existing data on the cluster to rebalance (migrate) among the cluster nodes. How do I confirm that data is indeed being redistributed?

Method

When the size of a cluster changes (nodes are added or removed, or nodes unexpectedly leave the cluster), Aerospike redistributes the data across the nodes, either to maintain the configured replication factor after the removal or loss of a node, or to expand onto newly added nodes. This takes place as a 2-step process:

  • The rebalance calculation
  • The actual moving of Aerospike partitions (as explained in the FAQ migration messages knowledge-base article).

Note - If migrate-fill-delay is configured, partitions will not fill on nodes that did not previously own them until the configured time has elapsed. (In strong-consistency mode, replacing a node in the roster with an empty one will still fill the new node, as those migrations do not count as fill migrations.)

To confirm that data is actually moving, check the following statistics and log lines:

  1. Confirm that the bulk transfer and receive rates in the following log line are changing:
fabric-bytes-per-second: bulk (1525,7396) ctrl (33156,46738) meta (42,42) rw (128,128)

See details in the Log Reference Manual
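
As a rough illustration of the log check above, the sketch below pulls the per-channel pairs out of a fabric-bytes-per-second line and reports whether the bulk channel shows any traffic. The function names parse_fabric_rates and bulk_is_moving are hypothetical helpers, not part of any Aerospike tool, and the code assumes each pair is ordered (transfer, receive), as the wording of the step suggests; check the Log Reference Manual before relying on that order.

```python
import re

# Matches one "name (first,second)" group, e.g. "bulk (1525,7396)".
_CHANNEL_RE = re.compile(r"(\w+) \((\d+),(\d+)\)")

MARKER = "fabric-bytes-per-second:"


def parse_fabric_rates(line):
    """Return {channel: (first, second)} parsed from a fabric-bytes-per-second line.

    Assumption: each pair is (transfer, receive) in bytes per second.
    Returns an empty dict if the marker is not present in the line.
    """
    idx = line.find(MARKER)
    if idx < 0:
        return {}
    rest = line[idx + len(MARKER):]
    return {name: (int(a), int(b)) for name, a, b in _CHANNEL_RE.findall(rest)}


def bulk_is_moving(line):
    """True if either value of the bulk channel pair is non-zero."""
    first, second = parse_fabric_rates(line).get("bulk", (0, 0))
    return first > 0 or second > 0


if __name__ == "__main__":
    sample = ("fabric-bytes-per-second: bulk (1525,7396) ctrl (33156,46738) "
              "meta (42,42) rw (128,128)")
    print(parse_fabric_rates(sample)["bulk"])
    print(bulk_is_moving(sample))
```

Comparing the bulk pair across several consecutive log lines gives a better picture than a single sample, since one line only shows the rate at that moment.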

  2. Confirm that the following statistics are incrementing:

Records being received or transmitted:

Partitions actively being transmitted or received:

Monitor the migrate_record_retransmits statistic for potential records being retransmitted during migration. Retransmissions during migrations can happen for different reasons, including:

  • Connectivity problems between nodes, a network bottleneck, or an overwhelmed node (for example, if migrations are tuned too aggressively).
  • Misconfigurations across cluster nodes, such as mismatched write-block-size settings.
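
One way to check whether statistics are incrementing is to take two snapshots a few seconds apart and compare them. The sketch below assumes the statistics arrive as semicolon-separated name=value text, which should be verified against the output of your own tooling. parse_stats and stat_deltas are hypothetical helper names, and the sample values are made up for illustration. Only migrate_record_retransmits is taken from this article; pass in whichever other statistic names you want to watch.

```python
def parse_stats(info_text):
    """Parse 'name=value;name=value' text into a dict of strings.

    Assumption: statistics are semicolon-separated name=value pairs.
    Pairs without '=' are skipped.
    """
    stats = {}
    for pair in info_text.strip().split(";"):
        name, sep, value = pair.partition("=")
        if sep:
            stats[name.strip()] = value.strip()
    return stats


def stat_deltas(before, after, names):
    """Return {name: after - before} for each named stat present in both snapshots."""
    deltas = {}
    for name in names:
        if name in before and name in after:
            deltas[name] = int(after[name]) - int(before[name])
    return deltas


if __name__ == "__main__":
    # Made-up sample snapshots, taken some seconds apart.
    first = parse_stats("migrate_record_retransmits=3;objects=100")
    second = parse_stats("migrate_record_retransmits=7;objects=100")
    print(stat_deltas(first, second, ["migrate_record_retransmits"]))
```

A positive delta on a record or partition counter between snapshots indicates that migration is progressing, while a steadily growing migrate_record_retransmits delta points toward one of the causes listed above.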

Keywords

MIGRATION MONITORING ACTIVE RETRANSMISSION

Timestamp

March 2021
