Migrations TX Constantly Climbing

Hi,

I have a cluster running 3.8.2.2 (upgrade path 3.6.0 -> 3.6.1 -> 3.6.2 -> 3.7.1 -> 3.7.5.1 -> 3.8.2.2; first noticed on 3.7.5.1) that has a constantly climbing number of pending migrations (migrate_tx_objs). When a node is removed from the cluster, migrate_tx_objs drops back to 0.

Both migrate_msgs_recv and migrate_msgs_sent are steadily climbing, and as far as I can tell there have been no topology changes.

10.3.17.2:3000 (10.3.17.2) returned:
migrate_msgs_recv=1472782432
migrate_msgs_sent=1472780638
migrate_num_incoming_accepted=938
migrate_num_incoming_refused=0
migrate_progress_recv=1
migrate_progress_send=1
migrate_rx_objs=2
migrate_tx_objs=445

10.3.17.5:3000 (10.3.17.5) returned:
migrate_msgs_recv=3163772376
migrate_msgs_sent=3163768509
migrate_num_incoming_accepted=2193
migrate_num_incoming_refused=0
migrate_progress_recv=3
migrate_progress_send=1
migrate_rx_objs=4
migrate_tx_objs=362

10.3.17.6:3000 (10.3.17.6) returned:
migrate_msgs_recv=3644629325
migrate_msgs_sent=3644635853
migrate_num_incoming_accepted=2598
migrate_num_incoming_refused=0
migrate_progress_recv=0
migrate_progress_send=1
migrate_rx_objs=0
migrate_tx_objs=398

10.3.17.3:3000 (10.3.17.3) returned:
migrate_msgs_recv=2050206848
migrate_msgs_sent=2050204338
migrate_num_incoming_accepted=1309
migrate_num_incoming_refused=0
migrate_progress_recv=1
migrate_progress_send=1
migrate_rx_objs=1
migrate_tx_objs=376

10.3.17.4:3000 (10.3.17.4) returned:
migrate_msgs_recv=2370141674
migrate_msgs_sent=2370149778
migrate_num_incoming_accepted=1572
migrate_num_incoming_refused=0
migrate_progress_recv=1
migrate_progress_send=1
migrate_rx_objs=1
migrate_tx_objs=357

10.3.17.7:3000 (10.3.17.7) returned:
migrate_msgs_recv=4860432337
migrate_msgs_sent=4860437139
migrate_num_incoming_accepted=2632
migrate_num_incoming_refused=0
migrate_progress_recv=0
migrate_progress_send=1
migrate_rx_objs=0
migrate_tx_objs=257

About 2 minutes later:

10.3.17.2:3000 (10.3.17.2) returned:
migrate_msgs_recv=1474013450
migrate_msgs_sent=1474011654
migrate_num_incoming_accepted=938
migrate_num_incoming_refused=0
migrate_progress_recv=0
migrate_progress_send=1
migrate_rx_objs=0
migrate_tx_objs=444

10.3.17.5:3000 (10.3.17.5) returned:
migrate_msgs_recv=3165726338
migrate_msgs_sent=3165722470
migrate_num_incoming_accepted=2194
migrate_num_incoming_refused=0
migrate_progress_recv=1
migrate_progress_send=1
migrate_rx_objs=1
migrate_tx_objs=361

10.3.17.6:3000 (10.3.17.6) returned:
migrate_msgs_recv=3645648370
migrate_msgs_sent=3645654897
migrate_num_incoming_accepted=2598
migrate_num_incoming_refused=0
migrate_progress_recv=0
migrate_progress_send=1
migrate_rx_objs=0
migrate_tx_objs=397

10.3.17.3:3000 (10.3.17.3) returned:
migrate_msgs_recv=2052719209
migrate_msgs_sent=2052716696
migrate_num_incoming_accepted=1311
migrate_num_incoming_refused=0
migrate_progress_recv=1
migrate_progress_send=1
migrate_rx_objs=2
migrate_tx_objs=375

10.3.17.4:3000 (10.3.17.4) returned:
migrate_msgs_recv=2373622357
migrate_msgs_sent=2373630457
migrate_num_incoming_accepted=1575
migrate_num_incoming_refused=0
migrate_progress_recv=2
migrate_progress_send=1
migrate_rx_objs=4
migrate_tx_objs=355

10.3.17.7:3000 (10.3.17.7) returned:
migrate_msgs_recv=4862215455
migrate_msgs_sent=4862220253
migrate_num_incoming_accepted=2634
migrate_num_incoming_refused=0
migrate_progress_recv=2
migrate_progress_send=1
migrate_rx_objs=2
migrate_tx_objs=256

Thanks
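To put numbers on "steadily climbing", the two snapshots can be diffed per node. The sketch below is a hypothetical helper (`parse_stats` and `deltas` are illustrative names, not part of any Aerospike tool); it assumes the stats are pasted as one `name=value` pair per line, as in the output above, and uses the 10.3.17.2 figures.

```python
"""Diff two snapshots of per-node migration stats given as name=value lines."""
from typing import Dict


def parse_stats(text: str) -> Dict[str, int]:
    """Parse 'name=value' lines into integers, skipping anything else."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if not sep:
            continue  # not a name=value line (e.g. the "returned:" header)
        try:
            stats[name] = int(value)
        except ValueError:
            pass  # skip non-integer values
    return stats


def deltas(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return after - before for every stat present in both snapshots."""
    return {k: after[k] - before[k] for k in before if k in after}


# 10.3.17.2, first and second snapshot (subset of the stats shown above).
before = parse_stats("""
migrate_msgs_recv=1472782432
migrate_msgs_sent=1472780638
migrate_rx_objs=2
migrate_tx_objs=445
""")
after = parse_stats("""
migrate_msgs_recv=1474013450
migrate_msgs_sent=1474011654
migrate_rx_objs=0
migrate_tx_objs=444
""")

for name, delta in sorted(deltas(before, after).items()):
    print(f"{name}: {delta:+d}")
# migrate_msgs_recv: +1231018
# migrate_msgs_sent: +1231016
# migrate_rx_objs: -2
# migrate_tx_objs: -1
```

For 10.3.17.2, both message counters grew by about 1.23 million over the roughly 2 minute gap, while migrate_tx_objs fell by one.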

migrate_tx_objs counts outbound migration objects (instances) that are queued or in the process of being sent. During migrations, some events cause additional instances to be created.

migrate_rx_objs counts all inbound migration instances, which linger for up to 60 seconds after the migration completes.
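As a rough mental model of the two gauges described above (a toy sketch, not Aerospike's implementation; `ToyNode`, `RxInstance` and their fields are hypothetical), tx counts instances that are queued or being sent, while rx counts inbound instances that are still active plus completed ones inside the linger window. The toy keeps completed instances for the full 60 seconds, whereas the server may release them sooner ("up to" 60 seconds).

```python
from dataclasses import dataclass, field
from typing import List, Optional

LINGER_SECONDS = 60.0  # inbound instances linger up to 60 s after completion


@dataclass
class RxInstance:
    """Hypothetical inbound migration instance."""
    completed_at: Optional[float] = None  # None while still receiving


@dataclass
class ToyNode:
    tx_queued: int = 0     # outbound instances waiting to be sent
    tx_in_flight: int = 0  # outbound instances currently being sent
    rx: List[RxInstance] = field(default_factory=list)

    def migrate_tx_objs(self) -> int:
        # Outbound instances that are queued or in the process of being sent.
        return self.tx_queued + self.tx_in_flight

    def migrate_rx_objs(self, now: float) -> int:
        # Inbound instances still running, or completed within the linger window.
        return sum(
            1
            for inst in self.rx
            if inst.completed_at is None or now - inst.completed_at < LINGER_SECONDS
        )


node = ToyNode(tx_queued=3, tx_in_flight=1,
               rx=[RxInstance(), RxInstance(completed_at=0.0)])
print(node.migrate_tx_objs())          # 4
print(node.migrate_rx_objs(now=30.0))  # 2: one active, one still lingering
print(node.migrate_rx_objs(now=90.0))  # 1: the completed one has aged out
```

The point of the model is that a non-zero migrate_rx_objs does not by itself mean data is still arriving.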

Removing (or adding, replacing, etc.) a node advances the cluster key. If a migration instance object is processed with a non-matching cluster key, it is disposed of, which is why migrate_tx_objs drops back to 0 when you remove a node.
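The disposal rule can be illustrated with a toy queue (a hypothetical sketch, not the server's code): every outbound object is tagged with the cluster key in force when it was created, and an object whose tag no longer matches the current key is dropped instead of sent. `ToyTxQueue`, `MigrationObj` and the integer key values are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass(frozen=True)
class MigrationObj:
    """Hypothetical outbound instance tagged with the cluster key it was created under."""
    partition_id: int
    cluster_key: int


class ToyTxQueue:
    def __init__(self, cluster_key: int) -> None:
        self.cluster_key = cluster_key
        self.queue: Deque[MigrationObj] = deque()

    def migrate_tx_objs(self) -> int:
        # Analogue of the stat: objects still waiting to be processed.
        return len(self.queue)

    def enqueue(self, partition_id: int) -> None:
        self.queue.append(MigrationObj(partition_id, self.cluster_key))

    def advance_cluster_key(self, new_key: int) -> None:
        # A membership change (node added, removed, replaced) yields a new key.
        self.cluster_key = new_key

    def process_one(self) -> bool:
        """Pop one object; return True if it would be sent, False if disposed of."""
        obj = self.queue.popleft()
        if obj.cluster_key != self.cluster_key:
            return False  # stale: created under an older cluster key
        return True  # a real node would transmit the partition here

    def drain(self) -> int:
        """Process everything queued; return how many objects were sent."""
        sent = 0
        while self.queue:
            if self.process_one():
                sent += 1
        return sent


q = ToyTxQueue(cluster_key=1)
for pid in range(5):
    q.enqueue(pid)
print(q.migrate_tx_objs())  # 5
q.advance_cluster_key(2)    # e.g. a node was removed
print(q.drain())            # 0: all five objects were stale and disposed of
print(q.migrate_tx_objs())  # 0
```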

If you would like to see the current progress of your migrations, you can use either asadm or asinfo:

With asadm:

asadm -e info # Output includes the % remaining migrations.

asadm -e "show statistics namespace like x-partitions"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                           :   u10:3000   u11:3000   u13:3000
migrate-rx-partitions-initial  :   N/E        1318       1398
migrate-rx-partitions-remaining:   N/E        1097       1176
migrate-tx-partitions-imbalance:   N/E        0          0
migrate-tx-partitions-initial  :   N/E        1398       1318
migrate-tx-partitions-remaining:   N/E        1176       1097

With asinfo:

asinfo -h [HOST] -v namespace/[NAMESPACE_NAME] -l | egrep 'migrate-[tr]x-partition'
migrate-tx-partitions-initial=1398
migrate-tx-partitions-remaining=1209
migrate-rx-partitions-initial=1318
migrate-rx-partitions-remaining=1130
migrate-tx-partitions-imbalance=0
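Progress per direction is simply (initial - remaining) / initial. The sketch below assumes the `-l` output is one `name=value` pair per line, as above; `fetch_namespace_stats`, `parse_kv` and `percent_complete` are hypothetical helper names, and `fetch_namespace_stats` only wraps the same asinfo command shown above (it needs asinfo on the PATH and is not called here).

```python
import subprocess
from typing import Dict


def fetch_namespace_stats(host: str, namespace: str) -> str:
    """Run the asinfo command shown above and return its raw output."""
    result = subprocess.run(
        ["asinfo", "-h", host, "-v", f"namespace/{namespace}", "-l"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_kv(text: str) -> Dict[str, int]:
    """Parse 'name=value' lines into integers, skipping anything else."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep:
            try:
                stats[name] = int(value)
            except ValueError:
                continue
    return stats


def percent_complete(stats: Dict[str, int], direction: str) -> float:
    """direction is "tx" or "rx"; returns 100.0 if nothing was scheduled."""
    initial = stats[f"migrate-{direction}-partitions-initial"]
    remaining = stats[f"migrate-{direction}-partitions-remaining"]
    if initial == 0:
        return 100.0
    return 100.0 * (initial - remaining) / initial


# The asinfo output shown above, pasted in as sample input.
sample = """migrate-tx-partitions-initial=1398
migrate-tx-partitions-remaining=1209
migrate-rx-partitions-initial=1318
migrate-rx-partitions-remaining=1130
migrate-tx-partitions-imbalance=0"""

stats = parse_kv(sample)
for direction in ("tx", "rx"):
    print(f"{direction}: {percent_complete(stats, direction):.1f}% complete")
# tx: 13.5% complete
# rx: 14.3% complete
```

Against a live node, `parse_kv(fetch_namespace_stats(host, namespace))` would replace the pasted sample.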

asadm -e "show statistics namespace like x-partitions"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~switch Namespace Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                           :   10.3.17.2:3000   10.3.17.3:3000   10.3.17.4:3000   10.3.17.5:3000   10.3.17.6:3000   10.3.17.7:3000
migrate-rx-partitions-initial  :   1323             1272             1272             1277             1313             1171
migrate-rx-partitions-remaining:   216              263              203              224              212              256
migrate-tx-partitions-imbalance:   0                0                0                0                0                0
migrate-tx-partitions-initial  :   1323             1272             1272             1277             1313             1171
migrate-tx-partitions-remaining:   284              223              222              242              266              137

The number of outgoing migrations just keeps climbing until it hits 100%; shortly afterwards, migrate_msgs_recv and migrate_msgs_sent stop increasing.

I am trying to establish why the outgoing migrations just keep climbing…
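One sanity check on the "switch" table above: since each partition migration has one sending and one receiving node, one would expect the cluster-wide tx and rx "remaining" totals to match, up to sampling skew between nodes. The sketch below simply sums the per-node values from that table; the variable names are illustrative.

```python
# Per-node values from the "switch" namespace table above, in node order
# 10.3.17.2, 10.3.17.3, 10.3.17.4, 10.3.17.5, 10.3.17.6, 10.3.17.7.
rx_initial = [1323, 1272, 1272, 1277, 1313, 1171]
rx_remaining = [216, 263, 203, 224, 212, 256]
tx_initial = [1323, 1272, 1272, 1277, 1313, 1171]
tx_remaining = [284, 223, 222, 242, 266, 137]

total_tx_left = sum(tx_remaining)
total_rx_left = sum(rx_remaining)
print(total_tx_left, total_rx_left)  # 1374 1374

# Cluster-wide share of scheduled partition migrations already done.
overall = 100.0 * (1 - total_tx_left / sum(tx_initial))
print(f"{overall:.1f}% complete")  # 82.0% complete
```

In this snapshot the totals agree exactly, so the counters look internally consistent rather than leaking.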

The migrate-rx-partitions-remaining and migrate-tx-partitions-remaining stats are initially set to their associated “-initial” values and then count down to zero. They and their “-initial” values are reset whenever the cluster responds to a disruption of some sort, such as a node being added or removed, or a node missing more consecutive heartbeats than the configured heartbeat timeout.
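Because these counters only count down within a rebalance cycle, periodic samples can reveal when a new cycle started. A minimal sketch, assuming you collect `(initial, remaining)` pairs over time for one node and direction; `find_resets` is a hypothetical helper, and the sample values in the usage line are made up for illustration.

```python
from typing import List, Sequence, Tuple


def find_resets(samples: Sequence[Tuple[int, int]]) -> List[int]:
    """Return indices of samples that look like the start of a new cycle.

    samples: (initial, remaining) pairs in time order. Within one cycle
    "initial" is fixed and "remaining" only decreases, so a change in
    "initial" or an increase in "remaining" suggests a reset. A reset that
    lands on identical values with a lower "remaining" would be missed.
    """
    resets = []
    for i in range(1, len(samples)):
        prev_initial, prev_remaining = samples[i - 1]
        initial, remaining = samples[i]
        if initial != prev_initial or remaining > prev_remaining:
            resets.append(i)
    return resets


# Made-up samples: counting down, then a reset at index 3.
print(find_resets([(1398, 1398), (1398, 1209), (1398, 1050), (1410, 1410), (1410, 1300)]))
# [3]
```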

If the cluster is experiencing disruptions, they are logged in the server’s log. Search for “DISALLOW”, which appears before each new cluster rebalance cycle.
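A plain `grep DISALLOW` on the log does the job; the sketch below does the same in Python so the hits can also be counted. `disallow_lines` and `summarize_log` are hypothetical helper names, and the log path you pass in is an assumption about your installation (adjust it to wherever your server logs).

```python
from typing import Iterable, List


def disallow_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain "DISALLOW" (case-sensitive)."""
    return [line.rstrip("\n") for line in lines if "DISALLOW" in line]


def summarize_log(path: str, last: int = 10) -> None:
    """Print how many DISALLOW lines the log has, plus the most recent few."""
    with open(path, encoding="utf-8", errors="replace") as f:
        hits = disallow_lines(f)
    print(f"{len(hits)} line(s) mention DISALLOW")
    for line in hits[-last:]:
        print(line)
```

For example, `summarize_log("/var/log/aerospike/aerospike.log")`, assuming that is where your server writes its log. Several DISALLOW entries within the window you sampled would point to repeated rebalance cycles.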