by young » Thu Nov 15, 2012 3:58 pm
How can I tell when a migration is finished?
There are times when a cluster will need to move data from one node to another. This process is called “migration,” and is sometimes referred to as “rebalancing.” It is often useful to know whether a cluster is currently migrating.
Before going through the methods to determine if a cluster is migrating or rebalancing, it is important to understand how the data is moved.
How data is migrated
Migrations in an Aerospike cluster do not move one record (aka object/row) at a time; they move a whole partition at a time. Every record is mapped to one of 4096 partitions. These partitions are distributed throughout the cluster. When a cluster determines that data must be migrated, each affected partition is moved atomically. Depending on your settings, more or fewer partitions may be migrated at the same time.
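To make the record-to-partition idea concrete, here is a minimal Python sketch. It assumes a hash-then-modulo scheme and uses SHA-1 purely as a stand-in; the actual hash function Aerospike uses is not described in this post, and the function names and sample keys are made up for illustration.

```python
import hashlib
from collections import defaultdict

N_PARTITIONS = 4096  # partition count stated in the post


def partition_for_key(key):
    """Map a record key to a partition id in the range 0..4095."""
    # ASSUMPTION: SHA-1 is only a stand-in hash for this illustration.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def group_by_partition(keys):
    """Group keys by partition; a migration would move one whole group."""
    groups = defaultdict(list)
    for key in keys:
        groups[partition_for_key(key)].append(key)
    return dict(groups)


# Hypothetical keys, used only to show the grouping.
keys = ["user:%d" % i for i in range(10)]
for pid, members in sorted(group_by_partition(keys).items()):
    print(pid, members)
```

The point of the sketch is only that records sharing a partition id travel together, which is why progress is counted in partitions rather than records.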
Tracking Migration
There are a few different ways to see the progress of a migration. Some of them show a pair of numbers, “(M,N)”, whose meaning is discussed below.
Logs - In the main Aerospike log file (“/var/log/citrusleaf.log”), you may find the following messages:
Apr 13 2012 23:28:35 GMT: INFO (info): (base/thr_info.c:2334) migrates in progress ( M , N ) ::: ClusterSize 1 ::: objects 1000
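If you want to pull the two numbers out of such a log line in a script, a regular expression is enough. The following Python sketch assumes the line format shown above; the function name and the sample values 12 and 1 are made up for illustration.

```python
import re

# Matches the "migrates in progress ( M , N )" fragment of the log line.
MIGRATE_RE = re.compile(r"migrates in progress \(\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_migrate_line(line):
    """Return (M, N) as ints, or None if the line has no migration counters."""
    match = MIGRATE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Sample line in the format above, with illustrative values for M and N.
sample = (
    "Apr 13 2012 23:28:35 GMT: INFO (info): (base/thr_info.c:2334) "
    "migrates in progress ( 12 , 1 ) ::: ClusterSize 1 ::: objects 1000"
)
print(parse_migrate_line(sample))  # prints (12, 1)
```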
clmonitor info command
If you run the command:
/usr/bin/clmonitor -h [host_ip]:[port] -e info
You will get output that contains the following:
===NODES===
2012-11-15 12:15:39.831484
ip:port               Node id          Cluster Size  Objects  FreeDisk  Free Mem  Migration  build      Sys Free Mem  Cluster Visibility
192.168.110.101:3000  BB9F22906CA0568  1             1.6M     0%        86%       (M,N)      2.0.23.85  69%           True
Total number of objects in cluster : 1.6 M
clinfo -v statistics/telnet command
If you run the command:
clinfo -h [hostname] -p [port] -v statistics | awk '{gsub (";","\n"); print $0}'
or telnet to the info port (default is 3003) and issue the “statistics” command, you will see output that contains the following:
migrate_tx_objs=M
migrate_rx_objs=N
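The statistics output is a semicolon-separated list of key=value pairs, so it is straightforward to read the two counters from a script. This Python sketch is only an illustration: the function names and the sample string are made up, and only migrate_tx_objs and migrate_rx_objs are taken from the output above.

```python
def parse_statistics(raw):
    """Split a 'key=value;key=value' statistics string into a dict of strings."""
    stats = {}
    for pair in raw.strip().split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed fragments
        key, value = pair.split("=", 1)
        stats[key.strip()] = value.strip()
    return stats


def migration_counters(stats):
    """Return (M, N); raises KeyError if either counter is missing."""
    return int(stats["migrate_tx_objs"]), int(stats["migrate_rx_objs"])


# Hypothetical sample; real output contains many more statistics.
sample = "some_other_stat=abc;migrate_tx_objs=7;migrate_rx_objs=1\n"
print(migration_counters(parse_statistics(sample)))  # prints (7, 1)
```

Raising an error on a missing counter, rather than defaulting to zero, avoids mistaking an incomplete response for a finished migration.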
Interpreting the results
No matter how you arrived at the numbers, the interpretation of the information is the same:
M is the number of partitions that a node is currently transmitting. This is not the total number it will migrate before the cluster is balanced, because some migrations are scheduled only after data has arrived from another server. It starts out fairly high: on small clusters it can be around 4,000 per namespace, and on large clusters it is smaller.
N is the number of partitions the server is currently receiving; that is, in flight. This is not the number of partitions it still needs to receive to become synchronized. This number is often low (1, 2, or 3). It is never greater than the number of nodes in the cluster.
When the number of transmitted partitions (M) becomes zero across the cluster, the cluster is whole and synchronized. There should be 0’s in the receive slots (N) too, but N is a less reliable indicator: a bug sometimes leaves it stuck at 1, and its value can change momentarily between one migrate completing and the next one starting.
The basic rule is to look for the number of partitions being transmitted on every node in the cluster. When these are “0” (zero) across the cluster, the migrations have finished.
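The rule can be expressed as a small check over the M value collected from each node, by whichever method you prefer. This Python sketch assumes you have already gathered those values into a dictionary; the function name and the node addresses are hypothetical.

```python
def migrations_finished(tx_by_node):
    """True only when every node reports zero partitions being transmitted (M)."""
    if not tx_by_node:
        # No data collected: do not report the cluster as finished.
        return False
    return all(count == 0 for count in tx_by_node.values())


# Hypothetical snapshots of M per node, taken at two different times.
still_migrating = {"192.168.110.101:3000": 0, "192.168.110.102:3000": 14}
done = {"192.168.110.101:3000": 0, "192.168.110.102:3000": 0}

print(migrations_finished(still_migrating))  # prints False
print(migrations_finished(done))             # prints True
```

The check deliberately ignores N, since the receive count may not settle at zero.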