FAQ - Monitoring migrations on a live Aerospike cluster



Details

Migration is the data rebalancing that takes place when a node leaves or joins the cluster, whether because of a planned restart or because a network hiccup caused a node to depart and/or rejoin the cluster. This knowledge-base article describes how to monitor migration progress on a live Aerospike cluster using the server statistics. It also covers the key changes between Aerospike releases, so that monitoring can be matched to the installed version.

Note: The migration statistics have changed across Aerospike versions. The examples here cover the versions that introduced major changes: 3.5.14, 3.7.5, 3.8.4 and 3.9.0. Refer to the release notes for your installed version to determine which statistics to monitor.
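A common monitoring pattern is to poll a node until its remaining-partition count reaches zero. The sketch below assumes server 3.8.3 or later, where the service-level statistic migrate_partitions_remaining is available, and takes a hypothetical `fetch_remaining` callable that you would implement against your own tooling (for example, by reading asadm or asinfo output). `wait_for_migrations` is an illustrative helper, not an Aerospike API.

```python
import time
from typing import Callable, List


def wait_for_migrations(fetch_remaining: Callable[[], int],
                        interval_s: float = 10.0,
                        max_polls: int = 360) -> List[int]:
    """Poll until fetch_remaining() returns 0 and return every value observed.

    fetch_remaining is hypothetical: it should return the node's
    migrate_partitions_remaining statistic as an int.
    """
    history: List[int] = []
    for _ in range(max_polls):
        remaining = fetch_remaining()
        history.append(remaining)
        if remaining == 0:
            # No partitions left to migrate in either direction on this node.
            return history
        time.sleep(interval_s)
    raise TimeoutError(f"migrations still running after {max_polls} polls")


# Usage with canned readings instead of a live node.
readings = iter([1321, 700, 0])
print(wait_for_migrations(lambda: next(readings), interval_s=0))
```

Keeping the fetch step injectable means the same loop works whether the statistic comes from asadm, asinfo or a client library, and makes it easy to test without a cluster.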

Answer

The table below compares the migration statistics across Aerospike versions. Values shown as N/E are statistics that do not exist in that version.

Admin> show stat like migrate
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                         :   Version_3.5.14_IP:3000   Version_3.7.5_IP:3000   Version_3.8.4_IP:3000   Version_3.9.0_IP:3000   
migrate_allowed              :   N/E                      N/E                     true                    true                   
migrate_msgs_recv            :   15218260                 1174239                 N/E                     N/E                    
migrate_msgs_sent            :   15210068                 1172649                 N/E                     N/E                    
migrate_num_incoming_accepted:   4096                     1186                    N/E                     N/E                    
migrate_num_incoming_refused :   0                        0                       N/E                     N/E                    
migrate_partitions_remaining :   N/E                      N/E                     1235                    1321                   
migrate_progress_recv        :   0                        2                       1235                    1321                   
migrate_progress_send        :   0                        1                       1235                    1321                   
migrate_rx_objs              :   0                        740                     N/E                     N/E                    
migrate_tx_objs              :   0                        1663                    N/E                     N/E                    

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                           :   Version_3.5.14_IP:3000   Version_3.7.5_IP:3000  Version_3.8.4_IP:3000  Version_3.9.0_IP:3000  
migrate-order                  :   N/E                      5                      5                      5                      
migrate-record-receives        :   N/E                      N/E                    16685921               N/E                    
migrate-record-retransmits     :   N/E                      N/E                    121436                 N/E                    
migrate-records-skipped        :   N/E                      N/E                    753978                 N/E                    
migrate-records-transmitted    :   N/E                      N/E                    15021239               N/E                    
migrate-rx-instance-count      :   N/E                      N/E                    132                    N/E                    
migrate-rx-partitions-active   :   N/E                      N/E                    1                      N/E                    
migrate-rx-partitions-initial  :   N/E                      2070                   1006                   N/E                    
migrate-rx-partitions-remaining:   N/E                      886                    811                    N/E                    
migrate-sleep                  :   N/E                      1                      1                      1                      
migrate-tx-instance-count      :   N/E                      N/E                    13                     N/E                    
migrate-tx-partitions-active   :   N/E                      N/E                    1                      N/E                    
migrate-tx-partitions-imbalance:   N/E                      0                      0                      N/E                    
migrate-tx-partitions-initial  :   N/E                      2070                   1006                   N/E                    
migrate-tx-partitions-remaining:   N/E                      1670                   424                    N/E                    
migrate_record_receives        :   N/E                      N/E                    N/E                    11823123               
migrate_record_retransmits     :   N/E                      N/E                    N/E                    11637                  
migrate_records_skipped        :   N/E                      N/E                    N/E                    14345                  
migrate_records_transmitted    :   N/E                      N/E                    N/E                    13637799               
migrate_rx_instances           :   N/E                      N/E                    N/E                    131                    
migrate_rx_partitions_active   :   N/E                      N/E                    N/E                    0                      
migrate_rx_partitions_initial  :   N/E                      N/E                    N/E                    1064                   
migrate_rx_partitions_remaining:   N/E                      N/E                    N/E                    859                    
migrate_tx_instances           :   N/E                      N/E                    N/E                    5                      
migrate_tx_partitions_active   :   N/E                      N/E                    N/E                    1                      
migrate_tx_partitions_imbalance:   N/E                      N/E                    N/E                    0                      
migrate_tx_partitions_initial  :   N/E                      N/E                    N/E                    1064                   
migrate_tx_partitions_remaining:   N/E                      N/E                    N/E                    462     
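The N/E pattern in the table above can be turned into a small lookup so that a monitoring script watches the right counters for the installed version. This is a hypothetical helper (`remaining_stat_names` is not part of any Aerospike tool); it assumes a plain dotted version string and applies the version boundaries described in the notes below.

```python
from typing import Dict, List


def remaining_stat_names(version: str) -> Dict[str, List[str]]:
    """Return the "partitions remaining" statistics to watch for a version.

    Assumes a dotted version string such as "3.8.4"; only the first
    three components are compared.
    """
    v = tuple(int(part) for part in version.split(".")[:3])
    if v >= (3, 9, 0):
        # 3.9.0 onwards: namespace statistics use underscores.
        return {"service": ["migrate_partitions_remaining"],
                "namespace": ["migrate_rx_partitions_remaining",
                              "migrate_tx_partitions_remaining"]}
    if v >= (3, 8, 3):
        # 3.8.3 and 3.8.4: hyphenated namespace statistics.
        return {"service": ["migrate_partitions_remaining"],
                "namespace": ["migrate-rx-partitions-remaining",
                              "migrate-tx-partitions-remaining"]}
    if v >= (3, 7, 5):
        # 3.7.5 up to 3.8.3: first namespace-level counters.
        return {"service": ["migrate_progress_recv", "migrate_progress_send"],
                "namespace": ["migrate-rx-partitions-remaining",
                              "migrate-tx-partitions-remaining"]}
    # Before 3.7.5: service level only.
    return {"service": ["migrate_progress_recv", "migrate_progress_send"],
            "namespace": []}


print(remaining_stat_names("3.8.4"))
```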

Notes

Server versions 3.9.0 onwards

  1. Relevant statistics at the service level:
  • migrate_partitions_remaining - Number of partitions remaining to be migrated in either direction.
  • migrate_progress_recv, migrate_progress_send - To be deprecated; these should simply match migrate_partitions_remaining. For the number of partitions currently being received or sent, refer to migrate_rx_partitions_active and migrate_tx_partitions_active at the namespace level.
  2. Relevant statistics at the namespace level (the main change being that hyphens (-) in the names are replaced with underscores (_)):
  • migrate_record_receives, migrate_records_transmitted, migrate_record_retransmits, migrate_records_skipped - Number of record inserts received, and records transmitted, retransmitted or skipped (because the remote node was already up to date).
  • migrate_rx_instances, migrate_tx_instances - Number of instance objects managing immigrations and emigrations.
  • migrate_rx_partitions_active, migrate_rx_partitions_initial, migrate_rx_partitions_remaining - Number of partitions currently immigrating to the node, initially queued up to be received, and remaining.
  • migrate_tx_partitions_active, migrate_tx_partitions_initial, migrate_tx_partitions_remaining - Number of partitions currently emigrating from the node, initially queued up to be sent, and remaining.
  • migrate_tx_partitions_imbalance - Number of partition migration failures.
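As an illustration, the 3.9.0 namespace counters can be combined into a per-direction completion percentage, (initial - remaining) / initial. The sketch assumes the namespace statistics arrive as semicolon-separated key=value pairs, which is how info responses are typically formatted; confirm the exact format on your version. `parse_info` and `percent_complete` are hypothetical names, and the sample values are the 3.9.0 column of the table above.

```python
from typing import Dict


def parse_info(response: str) -> Dict[str, str]:
    """Parse 'key=value;key=value' text into a dict of strings."""
    stats: Dict[str, str] = {}
    for pair in response.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            stats[key] = value
    return stats


def percent_complete(stats: Dict[str, str], direction: str) -> float:
    """Percentage of partitions migrated in one direction ('rx' or 'tx').

    Uses the 3.9.0+ underscore names; returns 100.0 when nothing was queued.
    """
    initial = int(stats[f"migrate_{direction}_partitions_initial"])
    remaining = int(stats[f"migrate_{direction}_partitions_remaining"])
    if initial == 0:
        return 100.0
    return 100.0 * (initial - remaining) / initial


sample = ("migrate_rx_partitions_initial=1064;migrate_rx_partitions_remaining=859;"
          "migrate_tx_partitions_initial=1064;migrate_tx_partitions_remaining=462")
ns = parse_info(sample)
print(f"rx {percent_complete(ns, 'rx'):.1f}%  tx {percent_complete(ns, 'tx'):.1f}%")
```

With these values it prints `rx 19.3%  tx 56.6%`. The percentages only describe the current migration cycle, since the initial counts are reset when a new cycle starts.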

Server versions 3.8.3 and 3.8.4

  1. Rapid Rebalance is available as an Enterprise Edition feature for faster data rebalancing.

  2. Relevant statistics at the service level:

  • Number of partitions remaining to be migrated in either direction - migrate_partitions_remaining
  • Number of partitions currently being received or sent - migrate_progress_recv, migrate_progress_send
  3. Relevant statistics at the namespace level:
  • Number of record insert requests received, and records transmitted, retransmitted or skipped (because the remote node was already up to date) - migrate-record-receives, migrate-records-transmitted, migrate-record-retransmits, migrate-records-skipped
  • Number of instance objects managing immigrations and emigrations - migrate-rx-instance-count, migrate-tx-instance-count
  • Number of partitions currently immigrating to the node, initially queued up to be received, and remaining - migrate-rx-partitions-active, migrate-rx-partitions-initial, migrate-rx-partitions-remaining
  • Number of partitions currently emigrating from the node, initially queued up to be sent, and remaining - migrate-tx-partitions-active, migrate-tx-partitions-initial, migrate-tx-partitions-remaining
  • Number of partition migration failures - migrate-tx-partitions-imbalance
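Because 3.8.3/3.8.4 and 3.9.0 report the same namespace counters under different spellings, a script that must cover both can normalize the older names first. This hypothetical sketch follows the comparison table: most names only swap hyphens for underscores, the instance counters were also renamed, and migrate-order and migrate-sleep keep their hyphens in 3.9.0.

```python
from typing import Dict

# Names that changed beyond the hyphen/underscore swap (3.8.x -> 3.9.0).
RENAMED = {
    "migrate-rx-instance-count": "migrate_rx_instances",
    "migrate-tx-instance-count": "migrate_tx_instances",
}
# Namespace entries that keep their hyphenated names in 3.9.0.
UNCHANGED = {"migrate-order", "migrate-sleep"}


def normalize_stat_name(name: str) -> str:
    """Return the 3.9.0+ spelling of a 3.8.3/3.8.4 migration statistic name."""
    if name in RENAMED:
        return RENAMED[name]
    if name in UNCHANGED or not name.startswith("migrate-"):
        return name
    return name.replace("-", "_")


def normalize_stats(stats: Dict[str, str]) -> Dict[str, str]:
    """Apply normalize_stat_name to every key, leaving values untouched."""
    return {normalize_stat_name(key): value for key, value in stats.items()}


print(normalize_stats({"migrate-tx-partitions-remaining": "424", "migrate-order": "5"}))
```

After normalization, checks written against the 3.9.0 names can run unchanged on a 3.8.x node.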

Server versions 3.7.5 up to (but not including) 3.8.3

  1. Starting with server version 3.7.5, a migration order can be defined so that some namespaces are given priority over others for migrations (see migrate-order in the namespace statistics).

  2. Some migration statistics move to the namespace level.

  3. Relevant statistics at the service level:

  • Number of migrate messages received and sent - migrate_msgs_recv, migrate_msgs_sent
  • Number of incoming migrations accepted, or refused because the incoming limit was reached - migrate_num_incoming_accepted, migrate_num_incoming_refused
  • Number of partitions currently being received or sent - migrate_progress_recv, migrate_progress_send
  • Partitions being received or sent - migrate_rx_objs, migrate_tx_objs
  4. Relevant statistics at the namespace level:
  • Number of partitions queued up to be received in the current migration cycle, and remaining - migrate-rx-partitions-initial, migrate-rx-partitions-remaining
  • Number of partitions queued up to be sent in the current migration cycle, and remaining - migrate-tx-partitions-initial, migrate-tx-partitions-remaining
  • Number of partition migration failures - migrate-tx-partitions-imbalance
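These versions have no active-partition counter, so one way to judge whether migrations have finished across the whole cluster for a single namespace is to sum the rx and tx remaining counters from every node. The sketch assumes you have already collected each node's namespace statistics into a dict (hypothetical input); node-a uses the 3.7.5 values from the table above and node-b is a made-up idle node for illustration.

```python
from typing import Dict

NodeStats = Dict[str, Dict[str, str]]


def cluster_remaining(per_node_stats: NodeStats) -> int:
    """Sum rx and tx partitions remaining for one namespace across all nodes."""
    total = 0
    for stats in per_node_stats.values():
        # A missing key raises KeyError on purpose: treating it as 0 could
        # report "finished" for a node running a different version.
        total += int(stats["migrate-rx-partitions-remaining"])
        total += int(stats["migrate-tx-partitions-remaining"])
    return total


def migrations_finished(per_node_stats: NodeStats) -> bool:
    """True once no node has partitions left to receive or send."""
    return cluster_remaining(per_node_stats) == 0


cluster = {
    "node-a": {"migrate-rx-partitions-remaining": "886",
               "migrate-tx-partitions-remaining": "1670"},
    "node-b": {"migrate-rx-partitions-remaining": "0",
               "migrate-tx-partitions-remaining": "0"},
}
print(cluster_remaining(cluster), migrations_finished(cluster))
```

Run the check once per namespace, since each namespace reports its own counters.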

Server versions prior to 3.7.5

  1. For versions prior to 3.7.5, migration progress is monitored at the service level rather than at the namespace level.

  2. Relevant statistics:

  • Number of migrate messages received and sent - migrate_msgs_recv, migrate_msgs_sent
  • Number of incoming migrations accepted, or refused because the incoming limit was reached - migrate_num_incoming_accepted, migrate_num_incoming_refused
  • Number of partitions currently being received or sent - migrate_progress_recv, migrate_progress_send
  • Partitions being received or sent - migrate_rx_objs, migrate_tx_objs
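With only service-level statistics available, a minimal per-node check is whether any partitions are currently being received or sent, reported alongside the refused count. This is a hypothetical helper that assumes the statistics have already been parsed into a dict of strings; the sample values are the 3.5.14 column of the table above.

```python
from typing import Dict, Union


def migration_activity(stats: Dict[str, str]) -> Dict[str, Union[bool, int]]:
    """Summarize pre-3.7.5 service-level migration statistics for one node."""
    recv = int(stats["migrate_progress_recv"])
    send = int(stats["migrate_progress_send"])
    return {
        # Any partitions currently being received or sent on this node.
        "active": recv + send > 0,
        "partitions_in_flight": recv + send,
        # Incoming migrations refused because the incoming limit was reached.
        "incoming_refused": int(stats["migrate_num_incoming_refused"]),
    }


sample = {
    "migrate_progress_recv": "0",
    "migrate_progress_send": "0",
    "migrate_num_incoming_accepted": "4096",
    "migrate_num_incoming_refused": "0",
}
print(migration_activity(sample))
```

A node that reports no partitions in flight is idle at that moment; check every node before concluding that the cluster has finished migrating.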

Logs

The logs show migration progress in partitions, not individual records.

To read more about server log messages, see the Log Reference.

Reference

Details on any of the above metrics are available in the metrics reference manual: http://www.aerospike.com/docs/reference/metrics

For more information on managing and tuning migrations:

http://www.aerospike.com/docs/operations/manage/migration

To read more about Migrations, see Rapid Rebalance: Enterprise-Grade Migrations

Keywords

MIGRATION MONITOR STATISTICS NODE MIGRATE ASADM LOGS

Timestamp

12/08/2016

