Speeding up restarts on GP2 volumes

aws
ec2

#1

We have a low-volume Aerospike cluster that I have migrated to instances with no local storage (m4.xlarge), using a network block volume (EBS GP2) as enabled by this CloudFormation template. For daily usage this seems to work fine. But once a node goes down, it can never come back up (it takes several days to come back up because a cold restart uses many tiny read operations). One node also seems to get bad performance when it starts reading a lot more than other nodes for some reason.

For details about GP2 volumes, see EBS Volume Types. We currently use 500GiB volumes, so each volume has a sustained throughput of 500 GiB × 3 IOPS/GiB = 1500 operations/second before it gets throttled.

Is there any way to make cold start use fewer, larger reads instead of so many 1.2KB reads?

Is there any way to slow the operations during migrations so that they don’t hit EBS limits?

Is there any way to find out why one node is doing many more small read operations than the other nodes? Is there a way to make it stop doing this? The Migrates Incoming and Migrates Outgoing are currently 0.

Read and write operations from the EC2 Console. The bad node is vol-0b74d1eb2cc3616f1 which has many more read ops than the other volume


#2

Presently, cold-start reads use a constant LOAD_BUF_SIZE which is set to 1 MiB. I’m not sure why you would be seeing 1.2KB reads.

Yes, migrate-sleep is the number of microseconds migrate-threads wait between each record shipped. Increasing the sleep and/or decreasing threads are the ways to throttle migrations.

Typically this is caused by one or more hotkeys. One possible solution is the set your clients to be able to read from any replica AS_POLICY_REPLICA_ANY, which for a small amount of hotkeys should help distribute the reads.


#3

Here is a partial aerospike.log after a warm restart. According to EBS, average read size is 1.7kiB. Seems like the really really slow part is creating a secondary index.

Thank you for your information.

Feb 28 2017 06:09:23 GMT: INFO (drv_ssd): (drv_ssd.c:3639) opened device /dev/xvdg: usable size 1073741824000, io-min-size 512
Feb 28 2017 06:09:23 GMT: INFO (drv_ssd): (drv_ssd.c:3493) storage: set device /dev/xvdg scheduler mode to noop
Feb 28 2017 06:09:23 GMT: INFO (drv_ssd): (drv_ssd.c:1094) /dev/xvdg has 1024000 wblocks of size 1048576
Feb 28 2017 06:09:23 GMT: INFO (drv_ssd): (drv_ssd_ee.c:309) ns inventory marking non-free arena elements ...
Feb 28 2017 06:09:24 GMT: INFO (drv_ssd): (drv_ssd_ee.c:333) ns inventory marking roots & sentinels used ...
Feb 28 2017 06:09:24 GMT: INFO (drv_ssd): (drv_ssd_ee.c:354) ns inventory reducing 4096 partitions ...
Feb 28 2017 06:09:25 GMT: INFO (drv_ssd): (drv_ssd_ee.c:267) ns inventory ... reduced  512 partitions, restored 14197455 records, expired 76 records
Feb 28 2017 06:09:26 GMT: INFO (drv_ssd): (drv_ssd_ee.c:267) ns inventory ... reduced 1024 partitions, restored 33381606 records, expired 173 records
Feb 28 2017 06:09:27 GMT: INFO (drv_ssd): (drv_ssd_ee.c:267) ns inventory ... reduced 1536 partitions, restored 48329729 records, expired 241 records
Feb 28 2017 06:09:29 GMT: INFO (drv_ssd): (drv_ssd_ee.c:267) ns inventory ... reduced 2048 partitions, restored 66455381 records, expired 319 records
Feb 28 2017 06:09:30 GMT: INFO (drv_ssd): (drv_ssd_ee.c:267) ns inventory ... reduced 2560 partitions, restored 83531966 records, expired 402 records
Feb 28 2017 06:09:31 GMT: INFO (drv_ssd): (drv_ssd_ee.c:267) ns inventory ... reduced 3072 partitions, restored 100862156 records, expired 507 records
Feb 28 2017 06:09:32 GMT: INFO (drv_ssd): (drv_ssd_ee.c:267) ns inventory ... reduced 3584 partitions, restored 117756873 records, expired 597 records
Feb 28 2017 06:09:34 GMT: INFO (drv_ssd): (drv_ssd_ee.c:377) ns inventory ... reduced 4096 partitions, restored 138797684 records, expired 689 records
Feb 28 2017 06:09:34 GMT: INFO (drv_ssd): (drv_ssd_ee.c:393) ns inventory arena: 1634 free, 138814068 used [138797684 records], 0 lost
Feb 28 2017 06:09:34 GMT: INFO (drv_ssd): (drv_ssd_ee.c:404) ns inventory done resuming storage state
Feb 28 2017 06:09:34 GMT: INFO (drv_ssd): (drv_ssd_ee.c:436) ns inventory done resuming devices, 138797684 records
Feb 28 2017 06:09:34 GMT: INFO (drv_ssd): (drv_ssd.c:1059) ns inventory loading free & defrag queues
Feb 28 2017 06:09:34 GMT: INFO (drv_ssd): (drv_ssd.c:993) /dev/xvdg init defrag profile: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,2,1,4,6,4,10,6,5,14,6,9,3,11,5,5,6,2,6,2,5,7,3,2,2,1,4,4,1,3,0,1,2,0,4
Feb 28 2017 06:09:34 GMT: INFO (drv_ssd): (drv_ssd.c:1083) /dev/xvdg init wblock free-q 732165, defrag-q 147
Feb 28 2017 06:09:34 GMT: INFO (drv_ssd): (drv_ssd.c:2357) ns inventory starting device maintenance threads
Feb 28 2017 06:09:34 GMT: INFO (drv_ssd): (drv_ssd.c:1475) ns inventory starting write worker threads
Feb 28 2017 06:09:34 GMT: INFO (drv_ssd): (drv_ssd.c:910) ns inventory starting defrag threads
Feb 28 2017 06:09:34 GMT: INFO (sindex): (secondary_index.c:2083) Queuing namespace inventory for sindex population 
Feb 28 2017 06:09:54 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (2.2/s), defrag-q 0 defrag-tot 147 (7.3/s) defrag-w-tot 43 (2.2/s)
Feb 28 2017 06:10:14 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:10:34 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:10:54 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:11:14 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:11:34 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:11:54 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:12:14 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:12:16 GMT: INFO (sindex): (secondary_index.c:4871)  Sindex-ticker: ns=inventory si=<all> obj-scanned=500000 si-mem-used=1243003 progress= 0% est-time=44954768 ms
Feb 28 2017 06:12:34 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:12:54 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:13:14 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:13:34 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:13:54 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:14:14 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:14:34 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:14:54 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:15:00 GMT: INFO (sindex): (secondary_index.c:4871)  Sindex-ticker: ns=inventory si=<all> obj-scanned=1000000 si-mem-used=2469711 progress= 0% est-time=44911572 ms
Feb 28 2017 06:15:14 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:15:34 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:15:54 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:16:14 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:16:34 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:16:54 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:17:14 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:17:34 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/xvdg: used 190839957760, contig-free 732268M (732268 wblocks), swb-free 1, w-q 0 w-tot 43 (0.0/s), defrag-q 0 defrag-tot 147 (0.0/s) defrag-w-tot 43 (0.0/s)
Feb 28 2017 06:17:43 GMT: INFO (sindex): (secondary_index.c:4871)  Sindex-ticker: ns=inventory si=<all> obj-scanned=1500000 si-mem-used=3655974 progress= 1% est-time=44788060 ms

#4

Ah, this is correct, the secondary index creation process will read each record from drive.