The migrate_progress_recv stats show that nodes are receiving migrations, so migrations are not stuck from an algorithmic perspective.
- The await time on xvda is fairly high, which suggests disk I/O may be slowing migrations down.
- Are all nodes configured with the same migration settings you show here?
You can verify by running:
asadm -e "show config like migrate"
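If you have many nodes, a small script can flag any migrate setting whose value differs between them. The sketch below assumes you have already parsed the asadm output into one dictionary of setting name to value per node. The function name, node names, and values are hypothetical placeholders, not real output from your cluster.

```python
def find_mismatched_settings(node_configs):
    """Return {setting: {node: value}} for each setting not identical on all nodes.

    node_configs maps a node name to a dict of setting name -> value string.
    A setting missing on a node is reported as None for that node.
    """
    all_settings = set()
    for settings in node_configs.values():
        all_settings.update(settings)

    mismatches = {}
    for name in sorted(all_settings):
        per_node = {node: settings.get(name) for node, settings in node_configs.items()}
        # More than one distinct value means the nodes disagree on this setting.
        if len(set(per_node.values())) > 1:
            mismatches[name] = per_node
    return mismatches


# Hypothetical values for illustration only; substitute what asadm reports.
configs = {
    "node-a": {"migrate-threads": "1", "migrate-xmit-hwm": "10"},
    "node-b": {"migrate-threads": "1", "migrate-xmit-hwm": "10"},
    "node-c": {"migrate-threads": "4", "migrate-xmit-hwm": "10"},
}

for setting, values in find_mismatched_settings(configs).items():
    print(setting, values)
```

With the sample data, only migrate-threads is reported, because node-c differs from the other two nodes.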
You cannot really use the migrate_progress_send stat as a progress bar. During migrations, some events increment this count and others decrement it, so even if the count stays the same for an extended period, migrations may still be making progress. Starting with 3.7.0, we have added namespace-level stats that indicate the number of planned migrations for the current cycle and the number of remaining migrations for the current cycle. Here is an excerpt from git describing the stats:
AER-3639 Added new ns stats for mig progress
- New Metrics:
  - migrate_tx_partitions_scheduled: Total number of migrations this node will send during the current migration cycle for this namespace.
  - migrate_tx_partitions_remaining: Number of migrations this node has not yet sent during the current migration cycle for this namespace.
  - migrate_rx_partitions_scheduled: Total number of migrations this node will receive during the current migration cycle for this namespace.
  - migrate_rx_partitions_remaining: Number of migrations this node has not yet received during the current migration cycle for this namespace.
- Logging:
  - migrations remaining: Indicates the number of rx/tx migrations remaining and in progress, as well as the percent complete.
Also changed migrate_progress_send to be the number of migtx actively sending. Previously it was the number of migrations currently queued to send.
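On 3.7.0 or later, the scheduled and remaining counters can be combined into a per-node, per-namespace completion figure. The sketch below assumes the namespace statistics were fetched as one `key=value;key=value` string, such as the response to `asinfo -v "namespace/<ns>"`. The helper names and sample values are hypothetical. It uses a simple (scheduled - remaining) / scheduled ratio, which may not exactly match how the server computes the percentage it logs.

```python
def parse_stats(raw):
    """Parse a 'key=value;key=value' info response into a dict of strings."""
    stats = {}
    for pair in raw.strip().split(";"):
        if "=" in pair:
            key, _, value = pair.partition("=")
            stats[key] = value
    return stats


def percent_complete(stats, direction):
    """Percent of this cycle's migrations finished, for direction 'tx' or 'rx'.

    Reads migrate_<dir>_partitions_scheduled and migrate_<dir>_partitions_remaining.
    Returns 100.0 when nothing is scheduled for this cycle.
    """
    scheduled = int(stats[f"migrate_{direction}_partitions_scheduled"])
    remaining = int(stats[f"migrate_{direction}_partitions_remaining"])
    if scheduled == 0:
        return 100.0
    return 100.0 * (scheduled - remaining) / scheduled


# Hypothetical sample response; real output contains many more statistics.
sample = (
    "objects=1000;migrate_tx_partitions_scheduled=400;"
    "migrate_tx_partitions_remaining=100;migrate_rx_partitions_scheduled=200;"
    "migrate_rx_partitions_remaining=200"
)
stats = parse_stats(sample)
print(f"tx {percent_complete(stats, 'tx'):.1f}%  rx {percent_complete(stats, 'rx'):.1f}%")
```

Because these counters are per node and per namespace, you would collect them from every node. For a rough cluster-wide figure, one option is to sum scheduled and remaining across nodes before dividing, so that nodes with few migrations do not skew the result.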