How can I tell when a migration is finished?


#1

by young » Thu Nov 15, 2012 3:58 pm


There are times when a cluster will need to move data from one node to another. This process is called “migration.” Sometimes it is referred to as “rebalancing.” It is sometimes important to note when a cluster is in a state of migration.

Before going through the methods to determine if a cluster is migrating or rebalancing, it is important to understand how the data is moved.

How data is migrated

Migrations in an Aerospike cluster do not move one record (aka object/row) at a time, but rather a partition at a time. Every record is mapped to one of 4096 partitions, and these partitions are distributed throughout the cluster. When the cluster determines that data must be migrated, each partition is moved atomically. Depending on your settings, more or fewer partitions may be migrated at the same time.

Tracking Migration

There are a few different ways to see the progress of a migration. In several of them you will find a pair of numbers "(M,N)"; their meaning is discussed below.

Logs - In the main Aerospike log file ("/var/log/citrusleaf.log"), you may find the following messages:

Apr 13 2012 23:28:35 GMT: INFO (info): (base/thr_info.c:2334) migrates in progress ( M , N ) ::: ClusterSize 1 ::: objects 1000
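To pull the doublet out of a log line like the one above, a minimal shell sketch (the sample line and its values are made up for illustration; in practice you would pipe in `grep 'migrates in progress' /var/log/citrusleaf.log | tail -1`):

```shell
# extract_doublet: pull the "( M , N )" pair out of a "migrates in progress" log line.
extract_doublet() {
  sed -n 's/.*migrates in progress ( *\([0-9][0-9]*\) *, *\([0-9][0-9]*\) *).*/\1 \2/p'
}

# Hypothetical sample line in the same format as the article's log excerpt.
line='Apr 13 2012 23:28:35 GMT: INFO (info): (base/thr_info.c:2334) migrates in progress ( 12 , 1 ) ::: ClusterSize 1 ::: objects 1000'
echo "$line" | extract_doublet   # prints: 12 1
```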

clmonitor info command

If you run the command:

/usr/bin/clmonitor -h [host_ip]:[port] -e info

You will get output that contains the following:

===NODES===
2012-11-15 12:15:39.831484
ip:port Node id Cluster Size Objects FreeDisk Free Mem Migration build Sys Free Mem Cluster Visibility

192.168.110.101:3000 BB9F22906CA0568 1 1.6M 0% 86% (M,N) 2.0.23.85 69% True
Total number of objects in cluster : 1.6 M

clinfo -v statistics/telnet command

If you run the command:

clinfo -h [hostname] -p [port] -v statistics | awk '{gsub (";","\n"); print $0}'

or telnet to the info port (default is 3003) and issue the "statistics" command. Either way, you will see output that contains the following:

migrate_tx_objs=M
migrate_rx_objs=N
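The statistics reply is one long semicolon-separated string, so the two counters can be picked out with `tr` and `awk`. A minimal sketch, using a made-up trimmed sample in place of a real reply from `clinfo -h <host> -p <port> -v statistics`:

```shell
# Trimmed, hypothetical statistics reply; real output has many more key=value pairs.
stats='cluster_size=10;objects=1000;migrate_tx_objs=340;migrate_rx_objs=0'

# Split on ';' and pull out the two migration counters.
tx=$(echo "$stats" | tr ';' '\n' | awk -F= '$1=="migrate_tx_objs" {print $2}')
rx=$(echo "$stats" | tr ';' '\n' | awk -F= '$1=="migrate_rx_objs" {print $2}')
echo "tx=$tx rx=$rx"   # prints: tx=340 rx=0
```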

Interpreting the results

No matter how you arrived at the numbers, the interpretation of the information is the same:

M is the number of partitions that this node still has to transmit. This is not the total it will migrate before the rebalance is complete, because some migrations are scheduled only once data has arrived from another server. It starts out at a fairly high number (on small clusters it can be larger, like 4,000 per namespace; on large clusters it is smaller).

N is the number of partitions the server is currently sending, that is, in flight. This is not the number of receives it needs in order to become synchronized. This number is usually low (1, 2, 3) and is never greater than the number of nodes in the cluster.

When the number of transmitted partitions (M) reaches zero across the cluster, the cluster is whole and synchronized. The receive slots (N) should reach 0 too, but there is a known bug where N sometimes hangs at 1, and there are moments when one migration completes and another is about to start.

The basic rule is to look for the number of partitions being transmitted on every node in the cluster. When these are “0” (zero) across the cluster, the migrations have finished.
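That rule can be sketched as a small shell check. `migrations_done` is a hypothetical helper: it reads one statistics reply per node on stdin (in practice, one `clinfo -h <node> -v statistics` line per node) and succeeds only when every node reports `migrate_tx_objs=0`:

```shell
# migrations_done: succeed only when every input line reports migrate_tx_objs=0.
migrations_done() {
  while read -r stats; do
    tx=$(echo "$stats" | tr ';' '\n' | awk -F= '$1=="migrate_tx_objs" {print $2}')
    # Missing or non-zero counter means this node is still migrating.
    [ "${tx:-1}" -eq 0 ] || return 1
  done
  return 0
}

# Demo with two made-up node replies (both finished transmitting).
printf '%s\n' 'migrate_tx_objs=0;migrate_rx_objs=0' \
              'migrate_tx_objs=0;migrate_rx_objs=1' | migrations_done \
  && echo "migrations finished" || echo "still migrating"
```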


#2

by sunguck.lee » Mon Jul 28, 2014 11:57 pm

Hi Young.

At the bottom of your article, you said "N is the number of partitions the server is currently sending".

Is that correct? Don't you mean "receiving" rather than "sending"? Right after that, you said "This is not the number of receives …", so I was just wondering…

And one more question about the Aerospike migration process. I have a 10-node cluster, and each node holds 90 GB of SSD data and 15 GB of in-memory data.

This Aerospike cluster (every node in the cluster, not only the restarted one) is still migrating data two hours after restarting one node. There are no put/get operations; the cluster is idle. During the migration, each node's disk read IOPS is over 5000 with no disk writes, and the migration progress never changes (it never goes down; it looks stuck at 310–370). Is this normal?

And if I restart one node, will the Aerospike cluster redistribute all partitions (about 4096) across all nodes in the cluster? I would have thought it enough to move a few partitions from each node to the restarted one.

Monitor> info
===NODES===
2014-07-29 15:19:26.135428
Sorting by IP, in Ascending order:
ip:port Build Cluster Cluster Free Free Migrates Node Principal Replicated Sys
. Size Visibility Disk Mem . ID ID Objects Free
. . . pct pct . . . . Mem
test041 3.3.9 10 true 72 45 (340,0) BB9C246077AC40C BB9E044077AC40C 228,223,701 54
test042 3.3.9 10 true 71 42 (395,1) BB91045077AC40C BB9E044077AC40C 240,574,968 51
test043 3.3.9 10 true 73 46 (332,1) BB96846077AC40C BB9E044077AC40C 223,898,443 54
test044 3.3.9 10 true 71 42 (358,1) BB9E044077AC40C BB9E044077AC40C 240,589,286 50
test045 3.3.9 10 true 72 44 (350,1) BB96A46077AC40C BB9E044077AC40C 234,250,380 52
test046 3.3.9 10 true 71 42 (372,1) BB90446077AC40C BB9E044077AC40C 241,421,654 50
test047 3.3.9 10 true 73 46 (333,1) BB92807077AC40C BB9E044077AC40C 223,086,577 54
test048 3.3.9 10 true 69 39 (395,2) BB91646077AC40C BB9E044077AC40C 252,037,486 49
test049 3.3.9 10 true 73 46 (337,1) BB9B046077AC40C BB9E044077AC40C 224,203,341 54
test050 3.3.9 10 true 70 41 (350,1) BB99046077AC40C BB9E044077AC40C 243,440,002 50
Number of nodes displayed: 10


===NAMESPACE===
Total (unique) objects in cluster for perfdb : 1,175,862,919
Note: Total (unique) objects is an underestimate if migrations are in progress.


ip/namespace Avail Evicted Master Repl Stop Used Used Used Used hwm hwm
Pct Objects Objects Factor Writes Disk Disk Mem Mem Disk Mem
. . . . . . % . % . .
test048/perfdb 69 0 127,449,744 2 false 90.14 G 31 15.02 G 61 50 60
test050/perfdb 70 0 119,141,229 2 false 87.06 G 30 14.51 G 59 50 60
test042/perfdb 71 0 128,614,336 2 false 86.04 G 29 14.34 G 58 50 60
test044/perfdb 71 0 118,573,313 2 false 86.04 G 29 14.34 G 58 50 60
test046/perfdb 71 0 116,822,774 2 false 86.34 G 29 14.39 G 58 50 60
test041/perfdb 72 0 113,400,242 2 false 81.62 G 28 13.60 G 55 50 60
test045/perfdb 72 0 113,969,742 2 false 83.77 G 28 13.96 G 56 50 60
test047/perfdb 73 0 111,963,028 2 false 79.78 G 27 13.30 G 54 50 60
test043/perfdb 73 0 113,972,609 2 false 80.07 G 27 13.35 G 54 50 60
test049/perfdb 73 0 111,955,902 2 false 80.18 G 27 13.36 G 54 50 60
Number of rows displayed: 10

#3

by maxulan » Tue Aug 05, 2014 3:53 pm

It is sometimes important to note when a cluster is in a state of migration.

OK, it affects resource consumption and hence performance. Where else is it important?

In case of a replica node failure, should I wait for migration to finish before restarting the node? What will happen if the failed node comes back while data has already been partially migrated to another node (replication factor = 2, but 3 nodes hold the same partitions)?

Thanks, Max


#4

by devops02 » Tue Aug 05, 2014 4:45 pm

OK, it affects resource consumption and hence performance. Where else is it important?

When the cluster is in a state of migration, you should also expect some read/write latency spikes. Also note that migration can be configured so that it won't interfere with transactions: http://www.aerospike.com/docs/operations/tune/migration/
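As a hedged sketch of what that tuning looks like in aerospike.conf (`migrate-threads` is a service-context parameter in the 3.x line; verify the exact names and defaults for your version against the linked page):

```
service {
    ...
    migrate-threads 1   # fewer migration threads = slower, gentler rebalance
}
```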

In case of a replica node failure, should I wait for migration to finish before restarting the node?

Yes, you will want to wait until migration is finished before restarting a node.

What will happen if the failed node comes back while data has already been partially migrated to another node (replication factor = 2, but 3 nodes hold the same partitions)?

It depends on whether your data is on disk or in memory. If it's on disk and the problem was a bad hard drive, the node will need to get a copy from the other nodes. If the problem was a bad power supply, the data is still on disk, and it is faster to read it from the local disk than to transfer it from other nodes. If your data is entirely in memory, the restarted node's data will be gone and it will need to be gathered back from the other nodes. But to answer your question: Aerospike will rebalance and request the missing data from the other nodes. More info can be found here: http://www.aerospike.com/docs/architecture/data-distribution.html

  • Jerry