Idle node reporting high disk utilization in all-flash

All flash high disk utilization without active transactions

Problem Description

In a cluster running an “All Flash” namespace (index-type flash), the nodes report high disk utilization on the index device, consisting almost entirely of reads, even though no transactions are being processed.

Explanation

In this example, nvme0n1 is the device mounted at /mnt/aerospike to hold the index, and nvme1n1 is the device used to store the data (storage-engine).

lsblk
NAME       MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda       202:0   0    8G 0 disk
└─xvda1    202:1   0    8G 0 part /
nvme0n1    259:0   0  1.7T 0 disk /mnt/aerospike
nvme1n1    259:1   0  1.7T 0 disk
├─nvme1n1p1 259:6   0 353.9G 0 part
├─nvme1n1p2 259:7   0 353.9G 0 part
├─nvme1n1p3 259:8   0 353.9G 0 part
└─nvme1n1p4 259:9   0 353.9G 0 part

Although no transactions are being processed, the index device still reports close to 90% utilization, caused by a steady stream of reads (about 17,000 reads per second and no writes).

iostat -y -x 5 4
Linux 4.14.186-146.268.amzn2.x86_64 (ip-172-17-76-213)  08/14/2020      _x86_64_        (16 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.05    0.00    0.33    1.04    0.23   98.35
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.20    0.00    0.40     0.00    21.60   108.00     0.00    0.00    0.00    0.00   0.00   0.00
nvme0n1           0.00     0.00 17077.40    0.00 68310.40     0.00     8.00     0.91    0.10    0.10    0.00   0.05  90.72  <<<
nvme1n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.05    0.00    0.12    1.31    0.24   98.28
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.00    0.40     0.00     2.40    12.00     0.00    0.00    0.00    0.00   0.00   0.00
nvme0n1           0.00     0.00 17010.40    0.00 68040.80     0.00     8.00     0.90    0.10    0.10    0.00   0.05  89.52  <<<
nvme1n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.00    0.36    0.83    0.24   98.50
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.20    0.20    1.00     3.20    10.30    22.50     0.00    0.00    0.00    0.00   0.00   0.00
nvme0n1           0.00     0.00 16984.80    0.00 67939.20     0.00     8.00     0.91    0.10    0.10    0.00   0.05  90.64  <<<
nvme1n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.05    0.00    0.12    1.65    0.24   97.94
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.80    0.20     4.00     0.80     9.60     0.00    3.20    4.00    0.00   0.80   0.08
nvme0n1           0.00     0.00 17060.80    0.00 68243.20     0.00     8.00     0.90    0.10    0.10    0.00   0.05  89.84  <<<
nvme1n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
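
The figures above can be checked with a few lines of Python. This is a minimal sketch, not an official tool: the function names are made up for illustration, and the column list is copied from the header of this particular iostat version (other sysstat versions use different columns). It parses one device row and derives the average read size, which works out to about 4 kB per read, matching the avgrq-sz value of 8 sectors of 512 bytes.

```python
# Column names copied from the iostat header line shown in this article.
# Assumption: other sysstat versions print different columns.
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz "
          "await r_await w_await svctm %util").split()


def parse_device_line(line):
    """Turn one 'iostat -x' device row into a dict keyed by column name."""
    # Keep only as many fields as there are columns, which drops trailing
    # annotations such as the '<<<' marker used in this article.
    fields = line.split()[:len(HEADER)]
    row = {"device": fields[0]}
    for name, value in zip(HEADER[1:], fields[1:]):
        row[name] = float(value)
    return row


def avg_read_kb(row):
    """Average size of a single read in kB, or 0.0 when there are no reads."""
    return row["rkB/s"] / row["r/s"] if row["r/s"] else 0.0


sample = ("nvme0n1           0.00     0.00 17077.40    0.00 68310.40     0.00"
          "     8.00     0.91    0.10    0.10    0.00   0.05  90.72  <<<")
row = parse_device_line(sample)
print(row["device"], row["%util"], round(avg_read_kb(row), 2))
```

A device that shows high %util with w/s at zero and small, uniform reads, as here, is being read continuously rather than serving a mixed client workload.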

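The Aerospike log shows that the last nsup cycle took approximately 1.7 hours (total-ms 6183933). The nsup thread reduces (traverses) the primary index in order to expire records, if any. In an All Flash namespace the primary index is stored on the index device, so this traversal accounts for the continuous reads.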

Aug 14 2020 06:41:50 GMT: INFO (info): (ticker.c:423) {test} objects: all 806588827 master 402826192 prole 403762635 non-replica 0
...
Aug 14 2020 06:41:55 GMT: INFO (nsup): (nsup.c:814) {test} nsup-done: non-expirable 806588827 expired (0,0) evicted (0,0) evict-ttl 0 total-ms 6183933
...
Aug 14 2020 06:42:00 GMT: INFO (info): (ticker.c:423) {test} objects: all 806588827 master 402826192 prole 403762635 non-replica 0
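
As an illustration of how to read the nsup-done line, the sketch below extracts its fields and converts total-ms to hours. The regular expression and the function name are assumptions written against the exact line format shown above; the format may differ in other server versions. The two numbers inside each pair of parentheses are kept as a plain tuple, with no meaning assigned to them.

```python
import re

# Pattern written against the nsup-done line shown in this article.
# Assumption: the line format is the same on the server version in use.
NSUP_DONE = re.compile(
    r"\{(?P<ns>[^}]+)\} nsup-done: non-expirable (?P<non_expirable>\d+) "
    r"expired \((?P<exp_a>\d+),(?P<exp_b>\d+)\) "
    r"evicted \((?P<ev_a>\d+),(?P<ev_b>\d+)\) "
    r"evict-ttl (?P<evict_ttl>\d+) total-ms (?P<total_ms>\d+)"
)


def parse_nsup_done(line):
    """Return the nsup-done fields as a dict, or None if the line does not match."""
    m = NSUP_DONE.search(line)
    if m is None:
        return None
    total_ms = int(m.group("total_ms"))
    return {
        "namespace": m.group("ns"),
        "non_expirable": int(m.group("non_expirable")),
        "expired": (int(m.group("exp_a")), int(m.group("exp_b"))),
        "evicted": (int(m.group("ev_a")), int(m.group("ev_b"))),
        "total_ms": total_ms,
        "hours": total_ms / 3_600_000,  # 3,600,000 ms per hour
    }


line = ("Aug 14 2020 06:41:55 GMT: INFO (nsup): (nsup.c:814) {test} nsup-done: "
        "non-expirable 806588827 expired (0,0) evicted (0,0) evict-ttl 0 "
        "total-ms 6183933")
info = parse_nsup_done(line)
print(info["namespace"], info["total_ms"], round(info["hours"], 2))
```

For the line above, non-expirable (806588827) equals the total object count reported by the ticker line, and both the expired and evicted pairs are zero, which is the sign that the cycle traversed the whole index without finding anything to remove.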

Solution

In the example above, there are no expirable records (non-expirable equals the total object count, 806588827), so the nsup thread is not needed and can be disabled. This can be done dynamically by setting the nsup-period configuration parameter to 0. For example, for the test namespace:

asinfo -v "set-config:context=namespace;id=test;nsup-period=0"

As of version 4.9, nsup is disabled by default (the default for nsup-period is 0 for versions 4.9 and above). If nsup-period is dynamically set to zero while nsup is in the middle of a cycle, nsup will finish its current cycle and then become dormant.
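
To confirm that the setting took effect, the namespace configuration can be read back with asinfo -v "get-config:context=namespace;id=test" and checked for nsup-period. The following is a hedged sketch of that check in Python, assuming asinfo is on the PATH and prints the configuration as semicolon-separated key=value pairs; the helper names are hypothetical, and the sample string at the bottom is invented for illustration rather than captured from a real node.

```python
import subprocess


def parse_config(response):
    """Split an info response of the form 'k1=v1;k2=v2' into a dict."""
    items = response.strip().split(";")
    return dict(item.split("=", 1) for item in items if "=" in item)


def get_nsup_period(namespace, run=subprocess.run):
    """Read nsup-period for a namespace by calling asinfo.

    Assumption: asinfo is installed locally and can reach the node with
    its default connection settings. 'run' is injectable for testing.
    """
    cmd = ["asinfo", "-v", f"get-config:context=namespace;id={namespace}"]
    result = run(cmd, capture_output=True, text=True, check=True)
    return parse_config(result.stdout).get("nsup-period")


# Invented sample response, used only to show the parsing step.
sample_response = "replication-factor=2;nsup-period=0;default-ttl=0"
print(parse_config(sample_response)["nsup-period"])
```

A dynamic change made with set-config does not persist across a restart, so on versions below 4.9 the same nsup-period value would also need to be set in the namespace section of the configuration file.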

Keywords

NSUP DISK UTILIZATION ALL-FLASH

Timestamp

August 2020

© 2015 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.