Read/write performance spikes


#1

We use aerospike-server-community 3.6.0-1 on Ubuntu Server 12.04 (RAM: 48 GB, CPU: 2 x Intel Xeon X5650), and we are seeing read/write performance spikes, as you can see in this screenshot: https://gyazo.com/cc96722695ea64e0bbfac41c45c5591b

We investigated the issue with asloglatency:

asloglatency -h reads
        time slice  %>1ms  %>8ms %>64ms  ops/sec
    10:45:47    10   2.46   1.18   0.00   1023.9
    10:45:57    10   4.93   2.34   0.00   1810.4
    10:46:07    10   6.33   2.86   0.00   3013.8
    10:46:17    10   7.88   3.84   0.04   4711.5
    10:46:27    10   8.25   3.64   0.00   5583.8
    10:46:37    10  47.37  43.80  39.64   4163.2
    10:46:47    10   7.75   3.53   0.00   5008.5
    10:46:57    10   6.13   2.82   0.00   2882.8
    10:47:07    10   1.99   0.94   0.00   1185.6
    10:47:17    10   6.88   3.52   0.08   3738.9
    10:47:27    10   7.24   3.38   0.00   4865.8
    10:47:37    10   7.95   3.43   0.00   5490.7
    10:47:47    10  39.72  35.68  31.48   4427.7
    10:47:57    10   7.38   3.31   0.00   5087.1
    10:48:07    10   6.17   2.62   0.00   3650.3
    10:48:17    10   2.76   1.24   0.00   1358.6
    10:48:27    10   7.89   3.67   0.00   5340.3
    10:48:37    10   7.21   3.30   0.00   5508.1

asloglatency -h reads_storage_read
        time slice  %>1ms  %>8ms %>64ms  ops/sec
    10:45:47    10   2.43   1.22   0.00    968.1
    10:45:57    10   5.07   2.42   0.00   1709.3
    10:46:07    10   6.52   3.00   0.00   2820.2
    10:46:17    10   8.06   3.88   0.04   4399.7
    10:46:27    10   8.52   3.80   0.00   5238.0
    10:46:37    10  43.97   4.67   1.06   3999.9
    10:46:47    10   7.86   3.63   0.00   4806.4
    10:46:57    10   6.27   2.93   0.00   2743.5
    10:47:07    10   2.04   0.98   0.00   1117.7
    10:47:17    10   6.85   3.52   0.08   3523.2
    10:47:27    10   7.51   3.54   0.00   4549.7
    10:47:37    10   8.31   3.62   0.00   5123.6
    10:47:47    10  36.98   5.16   0.94   4209.1
    10:47:57    10   7.52   3.41   0.00   4878.0
    10:48:07    10   6.26   2.69   0.00   3478.1
    10:48:17    10   2.89   1.33   0.00   1271.9
    10:48:27    10   8.17   3.84   0.00   5019.4
    10:48:37    10   7.49   3.46   0.00   5150.5

It looks like the problem is related to device (SSD) performance.

As you can see, during the spikes the “reads_storage_read” slowdown shows up mainly in the >1 ms bucket, while the “reads” slowdown reaches past 64 ms as well. What could be the reason?

Below I have provided our configuration and top output. Should I provide more details?

Our server configuration:

service {                                                                                    
        user root                                                                            
        group root                                                                           
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.                                                                             
        pidfile /var/run/aerospike/asd.pid                                                   
        service-threads 24                                                                   
        transaction-queues 24                                                                
        transaction-threads-per-queue 4                                                      
        proto-fd-max 15000                                                                   
}

namespace ssd {
        replication-factor 1
        memory-size 30G
        default-ttl 30d # 30 days, use 0 to never expire/evict.
        high-water-memory-pct 90
        high-water-disk-pct 90
        storage-engine device {
                device /dev/sdb
                device /dev/sdc
                # The 2 lines below optimize for SSD.
                scheduler-mode noop
                write-block-size 128K    # adjust block size to make it efficient for SSDs
                defrag-lwm-pct 54
                defrag-startup-minimum 5
                #data-in-memory true # Store data in memory in addition to file.
        }
}
namespace devices {
        replication-factor 1
        memory-size 2G
        default-ttl 0 # Never expire/evict.
        high-water-memory-pct 99
        high-water-disk-pct 99
        storage-engine device {
                file /opt/aerospike/data/devices.dat
                data-in-memory true # Store data in memory in addition to file.
        }
}

Server top output:

top - 14:08:35 up 91 days, 23:16,  4 users,  load average: 7.92, 10.27, 11.64
Tasks: 305 total,   2 running, 303 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.9%us,  3.8%sy,  0.0%ni, 73.2%id, 14.2%wa,  0.0%hi,  1.0%si,  0.0%st
Mem:  49451000k total, 45280444k used,  4170556k free,   173308k buffers
Swap: 33542140k total,    18520k used, 33523620k free, 15225032k cached

#2

Hi there, based on your private reply, I’d say this might be a hardware-setup issue. We struggled with similar issues in our first deployment on rented root servers (however, over-provisioning might come to the rescue, see below). The part numbers MTFDDAK256MAY-1 and MTFDDAK256MAM-1 you sent me seem to belong to the Micron M550 and C400, and you added that they are plugged into an HP Smart Array controller. I’m not an expert on that card, but Aerospike recommends bypassing any RAID logic / “smart I/O” management, because it might actually do more harm than good. It might be worth a try to stick the drives into motherboard ports, if possible (SATA, I guess?), or to try other settings on the array controller.

But I would also focus on the drives themselves. Aerospike uses I/O in a very particular way and therefore provides its own drive-benchmarking tool to mimic this usage. My strong guess is that the drives themselves introduce the latency. If you are in production, there is a way to find out where the latency comes from. Check out this resource: Reading Microbenchmarks
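In case the extra histograms are not enabled yet, they can be switched on at runtime. A minimal sketch, assuming the 3.x dynamic-config syntax from the microbenchmarks docs; the commands are only echoed here (via the run() helper), so the snippet is safe to paste as-is:

```shell
#!/bin/sh
# Echo the commands instead of executing them; drop the run() wrapper
# to actually apply them on a live node.
run() { echo "would run: $*"; }

# Enable the per-stage read/write histograms (e.g. reads_storage_read):
run asinfo -v "set-config:context=service;microbenchmarks=true"
# Enable the device-level histograms as well:
run asinfo -v "set-config:context=service;storage-benchmarks=true"
```

After letting the node run under load for a while, the new histograms show up in the log and can be sliced with asloglatency as in your post.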

In general, Aerospike recommends benchmarking any hardware setup before going into production (the benchmark deletes all data on the drives!), especially to see how the drives behave under sustained load (24+ hours). You can find more about that in the docs: http://www.aerospike.com/docs/operations/plan/ssd/ssd_certification.html
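For reference, an ACT run is driven by a small config file. A sketch for your two drives, with key names taken from the ACT 3.x sample config; verify them against the sample file shipped with the version you download, since they have changed across versions:

```
device-names: /dev/sdb /dev/sdc
num-queues: 8
threads-per-queue: 8
test-duration-sec: 86400          # 24h of sustained load, as recommended
report-interval-sec: 1
read-reqs-per-sec: 2000           # "1x" load; scale this up to stress the drives
large-block-op-kbytes: 128
record-bytes: 1536
microsecond-histograms: yes       # same precision as the example output below
```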

It took us a long time to understand why Aerospike requires better (enterprise-grade) SSDs to run well. This has a lot to do with the architecture of SSDs, the techniques controllers use to manage the NAND, and so on. The short story: what you see as spikes is most likely the SSD’s controller halting all I/O for a short maintenance period that is required to allow new writes to happen.

We are mainly using the Samsung 845DC PRO 400GB (budget enterprise drive) which shows similar but far less significant spikes. Example from ACT (microsecond precision):

data is act version 3.0
        trans                                                                                                      device
        %>(us)                                                                                                     %>(us)
slice        1      2      4      8     16     32     64    128    256    512   1024   2048   4096   8192  16384        1      2      4      8     16     32     64    128    256    512   1024   2048   4096   8192  16384
-----   ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
    1   100.00 100.00 100.00 100.00 100.00 100.00 100.00  44.76  29.62   9.22   5.45   0.95   0.00   0.00   0.00   100.00 100.00 100.00 100.00 100.00 100.00 100.00  43.04  29.30   8.89   5.29   0.94   0.00   0.00   0.00
    2   100.00 100.00 100.00 100.00 100.00 100.00 100.00  75.72  67.60  55.65  47.44  31.54  13.79   2.74   0.00   100.00 100.00 100.00 100.00 100.00 100.00 100.00  74.88  67.37  55.02  43.02  19.20   1.49   0.00   0.00
    3   100.00 100.00 100.00 100.00 100.00 100.00 100.00  45.30  30.44  10.27   5.92   1.20   0.00   0.00   0.00   100.00 100.00 100.00 100.00 100.00 100.00 100.00  43.68  30.23   9.86   5.64   1.19   0.00   0.00   0.00
    4   100.00 100.00 100.00 100.00 100.00 100.00 100.00  45.65  30.80  10.78   6.33   1.31   0.00   0.00   0.00   100.00 100.00 100.00 100.00 100.00 100.00 100.00  43.87  30.45  10.37   5.97   1.30   0.00   0.00   0.00
    5   100.00 100.00 100.00 100.00 100.00 100.00 100.00  45.19  29.85   9.83   5.77   1.18   0.00   0.00   0.00   100.00 100.00 100.00 100.00 100.00 100.00 100.00  43.49  29.56   9.37   5.48   1.17   0.00   0.00   0.00
    6   100.00 100.00 100.00 100.00 100.00 100.00 100.00  52.78  39.19  20.83  15.35   6.64   1.43   0.00   0.00   100.00 100.00 100.00 100.00 100.00 100.00 100.00  51.20  38.93  20.46  14.32   4.41   0.24   0.00   0.00
    7   100.00 100.00 100.00 100.00 100.00 100.00 100.00  52.70  38.73  19.58  13.09   4.46   0.14   0.00   0.00   100.00 100.00 100.00 100.00 100.00 100.00 100.00  51.12  38.43  19.12  12.21   3.49   0.00   0.00   0.00
    8   100.00 100.00 100.00 100.00 100.00 100.00 100.00  45.92  30.50  10.25   6.02   1.22   0.00   0.00   0.00   100.00 100.00 100.00 100.00 100.00 100.00 100.00  44.07  30.24   9.81   5.82   1.21   0.00   0.00   0.00
    9   100.00 100.00 100.00 100.00 100.00 100.00 100.00  45.11  30.22  10.44   6.05   1.14   0.00   0.00   0.00   100.00 100.00 100.00 100.00 100.00 100.00 100.00  43.45  29.96  10.07   5.78   1.12   0.00   0.00   0.00
   10   100.00 100.00 100.00 100.00 100.00 100.00 100.00  45.11  29.52   9.47   5.42   1.01   0.00   0.00   0.00   100.00 100.00 100.00 100.00 100.00 100.00 100.00  43.33  29.24   9.17   5.29   0.99   0.00   0.00   0.00
   11   100.00 100.00 100.00 100.00 100.00 100.00 100.00  71.38  60.39  45.08  33.93  15.62   4.15   0.08   0.00   100.00 100.00 100.00 100.00 100.00 100.00 100.00  70.36  60.20  44.41  31.24   9.90   0.80   0.00   0.00
   12   100.00 100.00 100.00 100.00 100.00 100.00 100.00  50.57  36.72  17.74  12.36   5.17   0.62   0.00   0.00   100.00 100.00 100.00 100.00 100.00 100.00 100.00  49.01  36.51  17.22  11.80   3.76   0.06   0.00   0.00
   13   100.00 100.00 100.00 100.00 100.00 100.00 100.00  45.41  30.84  10.57   6.10   1.19   0.00   0.00   0.00   100.00 100.00 100.00 100.00 100.00 100.00 100.00  43.78  30.57  10.17   5.81   1.18   0.00   0.00   0.00

Notice the spikes every 4-5 seconds… However, this is behavior under massive load (14x), and the spikes never cause latency > 16 ms. The peaks appear less frequently (as rarely as once every 100 seconds) the lower the write load is. But it’s clear that some (adaptive) algorithm on the device is at work there, and we made sure it’s not caused by thermal throttling or anything similar. On the Intel DC S3710 we have, there are no such spikes - Intel seems to do a better job at hiding such maintenance actions.

The good news is: you can still increase the performance of your drives by setting them up with proper over-provisioning. The 256 GB capacity hints that there is hardly any OP from the manufacturer. Enterprise drives would come at 200 GB capacity while still using the same 256 GiB of flash chips inside. You might lose those 20-30% of your drive’s capacity, but you should give it a try (instructions are on the ACT page linked above). Over-provisioning helped us decrease the latency averages by a good amount, and we’ll most likely deploy with 296 GiB visible on 512 GiB of chips (~40% OP altogether) because we prefer performance over capacity.
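For SATA drives, the usual way to set OP by hand is to shrink the drive’s visible capacity with a Host Protected Area via hdparm -N. A sketch of the arithmetic, assuming 512-byte logical sectors and a 200 GB visible target (device name and sizes are placeholders); the hdparm command is only echoed, since running it is destructive and should follow a secure erase:

```shell
#!/bin/sh
# Keep 200 of the 256 GB visible; the controller can use the hidden
# ~20% as spare area. Secure-erase the drive first so the reclaimed
# space is actually free. All sizes/devices below are assumptions.
TARGET_GB=200            # visible capacity to keep
SECTOR_BYTES=512         # logical sector size (check with: hdparm -I /dev/sdb)
SECTORS=$((TARGET_GB * 1000 * 1000 * 1000 / SECTOR_BYTES))
echo "would run: hdparm -Np${SECTORS} --yes-i-know-what-i-am-doing /dev/sdb"
echo "visible sectors: ${SECTORS}"
```

Note that hdparm refuses to change the visible sector count without the long safety flag, and a power cycle is typically needed before the new capacity takes effect.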

Cheers, Manuel