I have 2 nodes in my cluster that seem to be stuck at a certain number of outstanding migrates (for hours), and playing with the different migrate options doesn’t seem to affect that. The rest of the cluster doesn’t really have any outstanding migrates. Also, overall the box is at under 50% CPU utilization and it’s under the disk/mem watermarks. Any ideas?
Nov 06 2015 17:36:20 GMT: INFO (info): (thr_info.c::4796) system memory: free 27962348kb ( 44 percent free )
Nov 06 2015 17:36:20 GMT: INFO (info): (thr_info.c::4804) migrates in progress ( 213 , 0 ) ::: ClusterSize 20 ::: objects 423825746 ::: sub_objects 0
Nov 06 2015 17:36:20 GMT: INFO (info): (thr_info.c::4812) rec refs 423825747 ::: rec locks 1 ::: trees 0 ::: wr reqs 1 ::: mig tx 213 ::: mig rx 1
Nov 06 2015 17:36:20 GMT: INFO (info): (thr_info.c::4818) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
Nov 06 2015 17:36:20 GMT: INFO (info): (thr_info.c::4828) trans_in_progress: wr 1 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (902, 8909043, 8908141) : hb (32, 4143, 4111) : fab (684, 1737, 1053)
Nov 06 2015 17:36:20 GMT: INFO (info): (thr_info.c::4830) heartbeat_received: self 0 : foreign 16682056
Nov 06 2015 17:36:20 GMT: INFO (info): (thr_info.c::4831) heartbeat_stats: bt 0 bf 12685829 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 9 mrf 0 eh 3519 efd 109 efa 3410 um 0 mcf 3906 rc 4102
Nov 06 2015 17:36:20 GMT: INFO (info): (thr_info.c::4844) tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 1 migtx 213 ssdr 0 ssdw 0 rw 2
Nov 06 2015 17:36:20 GMT: INFO (info): (thr_info.c::4885) namespace biddingdb: disk inuse: 363410280832 memory inuse: 27124847744 (bytes) sindex memory inuse: 0 (bytes) avail pct 67 cache-read pct 9.40
Nov 06 2015 17:36:20 GMT: INFO (info): (thr_info.c::4905) partitions: actual 216 sync 190 desync 0 zombie 2 wait 0 absent 3688
asmonitor output for one of the stuck nodes (there are 19 other nodes):
ip:port Build Cluster Cluster Free Free Migrates Node Principal Replicated Sys
. Size Visibility Disk Mem . ID ID Objects Free
. . . pct pct . . . . Mem
ip-10-155-175-218.ec2.internal:3000 3.5.15 20 true 77 53 (213,0) BB98A03900B0022 BB9FEA2100A0022 423,826,733 44
ip/namespace Avail Evicted Master Repl Stop Used Used Used Used hwm hwm
Pct Objects Objects Factor Writes Disk Disk Mem Mem Disk Mem
. . . . . . % . % . .
ip-10-155-175-218.ec2.internal/biddingdb 67 0 226,064,321 2 false 338.45 G 23 25.26 G 47 50 60
Could you run the following at a 2 minute interval and post the output, so we can check if there is any progress:
asadm -e 'asinfo -v statistics -l like migrate'
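For example, a minimal shell loop to sample that every 2 minutes (a sketch, assuming you run it on a node that has asadm on the path):

# Print migration stats every 2 minutes until interrupted
while true; do
  date
  asadm -e "asinfo -v statistics -l like migrate"
  sleep 120
done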
If this is a non-production cluster you may be able to bump the migration threads up to a higher number. Start by incrementing by 1 and keep an eye on iostat and memory usage.
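As a sketch, the change can be made dynamically with set-config (the value 2 here is just an example, not a recommendation):

# Raise migrate-threads on the local node
asinfo -v "set-config:context=service;migrate-threads=2"

# Or apply the same change on every node in the cluster via asadm
asadm -e "asinfo -v 'set-config:context=service;migrate-threads=2'"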
Then I changed the threads from 4 to 6 and it unstuck itself - it went from doing a couple of dozen messages every 10 seconds to doing hundreds of thousands of messages over a similar amount of time. Disk utilization also went up significantly and the “migrates in progress” started moving.
I got another instance of this … It’s been 2 days and the migrations have not cleared yet. When I bump up some setting it starts migrating for a bit, but then it slows down again … I tried bumping the threads up to 16 (on an 8 core box) and it helped temporarily, but then it stopped. I also bumped all the other migration related settings just to make sure other cluster operations aren’t in the way, but the cluster is basically idle.
The cluster is currently only doing some writes (8k/sec) and no reads at all, which is nothing for this 19 node cluster. I wanted to let the migrations finish before I switch on all the read traffic, but it’s taken 2 full days so far and this node is not yet done. Whenever I bump up the migration threads (from 4 to 8, from 8 to 16, or from 16 to 24) the migrations start again for a few minutes, and then they get stuck until I have to bump the setting to a higher number.
All the instances are i2.2xlarge and the network usage is around 20/20 Mbps, which is nothing on the AWS network.
The disks are local SSDs (EC2 instance store).
Btw I also checked the per-node TPS in the AMC and it seems evenly distributed - it’s not an issue with some hot key or extra traffic going to this one particular node.
Any other thoughts? What can be further done to confirm if this is a bug?
At this point I’m going to do a cold restart of that node.
The migrate_progress_recv stats indicate that nodes are receiving migrations, so we are not stuck from an algorithmic perspective.
The await time on xvda is fairly high.
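If you want to keep watching that yourself, the standard sysstat tool is enough (assuming it is installed; the 5-second interval is arbitrary):

# Extended per-device stats every 5 seconds; watch the await and %util columns for xvda
iostat -x 5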
Are all nodes configured with the same migration settings you show here?
You can verify by running:
asadm -e "show config like migrate"
You cannot really use the migrate_progress_send stat as a progress bar. During migrations there are events that both increment and decrement this count, so the count can remain the same for an extended duration even though progress is still being made. Starting with 3.7.0 we have created namespace-level stats that indicate the number of planned migrations for the current cycle and the number of remaining migrations for the current cycle. Here is an excerpt describing the stats from git:
AER-3639 Added new ns stats for mig progress
New Metrics:
migrate_tx_partitions_scheduled:
Total number of migrations this node will send during the current
migration cycle for this namespace.
migrate_tx_partitions_remaining
Number of migrations this node has not yet sent during the current
migration cycle for this namespace.
migrate_rx_partitions_scheduled
Total number of migrations this node will receive during the current
migration cycle for this namespace.
migrate_rx_partitions_remaining
Number of migrations this node has not yet received during the current
migration cycle for this namespace.
Logging:
migrations remaining:
Indicates the number of rx/tx migrations remaining and in progress, as well as
the percent complete.
Also changed migrate_progress_send to be the number of migrations (mig tx) actively sending.
Previously this was the number of migrations currently queued to send.
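On 3.7.0 or later you can read these new stats directly. A sketch, using the biddingdb namespace from your output (the exact filters are just examples):

# Per-namespace migration progress on a single node
asinfo -v "namespace/biddingdb" -l | grep migrate_

# Or across the whole cluster via asadm
asadm -e "show statistics namespace like migrate_tx migrate_rx"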