After reading Reading Microbenchmarks and since the issue was happening again on a different aerospike cluster we have in production, I gathered the benchmarks mentioned in the article.
The following benchmarks seem to be the ones showing higher latency:
q_wait:
q_wait
Oct 27 2015 18:49:58
% > (ms)
slice-to (sec) 1 2 4 8 16 32 64 128 ops/sec
-------------- ------ ------ ------ ------ ------ ------ ------ ------ --------
18:54:58 300 2.93 2.77 2.45 1.93 1.25 0.66 0.26 0.07 1483.8
18:59:58 300 3.13 2.99 2.75 2.32 1.72 1.04 0.35 0.06 1543.0
19:04:58 300 2.48 2.40 2.25 2.00 1.59 1.02 0.37 0.03 1453.5
19:09:58 300 0.73 0.67 0.55 0.35 0.11 0.01 0.00 0.00 1590.4
19:14:58 300 0.59 0.57 0.52 0.44 0.33 0.21 0.09 0.02 1603.3
19:19:58 300 4.86 4.58 4.05 3.15 1.91 0.95 0.37 0.04 1674.5
19:24:58 300 2.08 2.01 1.87 1.63 1.25 0.77 0.28 0.03 1658.4
19:29:58 300 2.04 2.00 1.94 1.80 1.54 1.03 0.39 0.02 1624.7
19:34:59 301 0.08 0.08 0.08 0.07 0.06 0.04 0.02 0.00 1635.9
19:39:59 300 4.89 4.56 3.96 2.92 1.50 0.56 0.23 0.01 1657.3
19:44:59 300 1.86 1.76 1.57 1.25 0.79 0.42 0.14 0.00 1624.3
19:49:59 300 1.36 1.26 1.08 0.76 0.35 0.09 0.03 0.00 1664.5
19:54:59 300 0.98 0.91 0.79 0.58 0.29 0.07 0.01 0.00 1692.6
19:59:59 300 0.43 0.39 0.34 0.22 0.07 0.00 0.00 0.00 1694.0
20:04:59 300 4.06 3.95 3.73 3.35 2.70 1.77 0.62 0.03 1545.8
20:09:59 300 1.26 1.22 1.15 1.03 0.81 0.51 0.19 0.02 1603.4
20:14:59 300 1.07 1.03 0.93 0.77 0.54 0.33 0.14 0.01 1675.9
20:19:59 300 6.41 5.94 5.01 3.43 1.33 0.12 0.01 0.00 1670.2
20:24:59 300 0.36 0.34 0.29 0.21 0.09 0.01 0.00 0.00 1716.1
20:29:59 300 4.82 4.65 4.35 3.79 2.88 1.80 0.84 0.20 1732.8
20:34:59 300 4.53 4.35 4.02 3.39 2.38 1.37 0.61 0.14 1726.8
20:39:59 300 7.99 7.74 7.28 6.45 5.00 3.12 1.37 0.27 1777.0
20:44:59 300 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1757.2
20:49:59 300 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1738.3
20:54:59 300 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1773.1
20:59:59 300 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1755.7
21:04:59 300 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1604.0
21:09:59 300 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1656.3
21:14:59 300 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1676.6
21:19:59 300 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1680.3
-------------- ------ ------ ------ ------ ------ ------ ------ ------ --------
avg 1.96 1.87 1.70 1.39 0.95 0.53 0.21 0.03 1656.0
max 7.99 7.74 7.28 6.45 5.00 3.12 1.37 0.27 1777.0
reads_tree:
Oct 27 2015 18:49:58
% > (ms)
slice-to (sec) 1 2 4 8 16 32 64 128 ops/sec
-------------- ------ ------ ------ ------ ------ ------ ------ ------ --------
18:54:58 300 0.68 0.63 0.56 0.47 0.36 0.17 0.03 0.00 1473.4
18:59:58 300 0.47 0.43 0.38 0.32 0.25 0.12 0.04 0.00 1526.1
19:04:58 300 0.37 0.33 0.29 0.25 0.20 0.11 0.05 0.00 1437.9
19:09:58 300 0.27 0.25 0.23 0.19 0.14 0.06 0.00 0.00 1590.3
19:14:58 300 0.11 0.10 0.09 0.07 0.05 0.03 0.01 0.00 1599.8
19:19:58 300 1.07 1.00 0.91 0.76 0.59 0.26 0.04 0.00 1657.5
19:24:58 300 0.25 0.22 0.19 0.15 0.12 0.07 0.02 0.00 1645.0
19:29:58 300 0.10 0.08 0.07 0.06 0.05 0.04 0.03 0.00 1607.0
19:34:59 301 0.01 0.01 0.01 0.00 0.00 0.00 0.00 0.00 1635.2
19:39:59 300 1.03 0.94 0.83 0.68 0.53 0.23 0.02 0.00 1647.4
19:44:59 300 0.29 0.26 0.23 0.19 0.15 0.08 0.02 0.00 1616.9
19:49:59 300 0.27 0.24 0.21 0.18 0.13 0.06 0.01 0.00 1662.8
19:54:59 300 0.17 0.16 0.14 0.12 0.09 0.04 0.00 0.00 1691.2
19:59:59 300 0.11 0.10 0.09 0.08 0.06 0.02 0.00 0.00 1694.0
20:04:59 300 0.44 0.40 0.35 0.29 0.24 0.15 0.08 0.01 1516.8
20:09:59 300 0.13 0.12 0.10 0.08 0.07 0.04 0.01 0.00 1594.7
20:14:59 300 0.14 0.13 0.11 0.09 0.07 0.04 0.01 0.00 1670.0
20:19:59 300 1.27 1.18 1.07 0.88 0.66 0.29 0.01 0.00 1667.6
20:24:59 300 0.06 0.06 0.05 0.04 0.03 0.01 0.00 0.00 1715.9
20:29:59 300 0.45 0.40 0.35 0.29 0.23 0.13 0.04 0.01 1700.0
20:34:59 300 0.44 0.39 0.34 0.28 0.22 0.12 0.03 0.01 1701.8
20:39:59 300 0.61 0.54 0.47 0.39 0.31 0.19 0.08 0.01 1718.5
20:44:59 300 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 1757.2
20:49:59 300 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1738.3
20:54:59 300 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1773.1
20:59:59 300 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1755.7
21:04:59 300 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 1604.0
21:09:59 300 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1656.3
21:14:59 300 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 1676.6
21:19:59 300 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1680.3
-------------- ------ ------ ------ ------ ------ ------ ------ ------ --------
avg 0.29 0.27 0.24 0.20 0.15 0.08 0.02 0.00 1647.0
max 1.27 1.18 1.07 0.88 0.66 0.29 0.08 0.01 1773.1
reads_storage_read:
Oct 27 2015 18:49:58
% > (ms)
slice-to (sec) 1 2 4 8 16 32 64 128 ops/sec
-------------- ------ ------ ------ ------ ------ ------ ------ ------ --------
18:54:58 300 8.71 7.95 6.82 5.23 4.05 1.83 0.36 0.05 1018.5
18:59:58 300 6.35 5.75 4.92 3.85 2.98 1.53 0.56 0.04 1054.6
19:04:58 300 4.49 4.04 3.42 2.71 2.15 1.22 0.62 0.05 979.0
19:09:58 300 2.99 2.74 2.40 1.81 1.36 0.56 0.00 0.00 1086.9
19:14:58 300 1.30 1.13 0.92 0.68 0.49 0.26 0.08 0.02 1094.9
19:19:58 300 11.91 10.99 9.69 7.52 5.79 2.75 0.42 0.05 1136.4
19:24:58 300 3.19 2.82 2.33 1.80 1.43 0.81 0.31 0.03 1141.6
19:29:58 300 1.52 1.25 0.99 0.86 0.75 0.64 0.51 0.02 1119.4
19:34:59 301 0.19 0.13 0.09 0.06 0.03 0.02 0.02 0.00 1127.5
19:39:59 300 15.54 14.02 11.97 9.07 6.77 3.07 0.24 0.04 1135.3
19:44:59 300 4.34 3.93 3.43 2.67 2.03 1.01 0.22 0.00 1118.7
19:49:59 300 4.09 3.73 3.23 2.48 1.82 0.82 0.05 0.00 1150.4
19:54:59 300 2.66 2.43 2.12 1.59 1.20 0.56 0.03 0.00 1161.9
19:59:59 300 1.62 1.45 1.26 0.93 0.67 0.29 0.00 0.00 1174.9
20:04:59 300 6.30 5.51 4.51 3.63 2.96 1.92 1.02 0.05 1042.3
20:09:59 300 1.91 1.64 1.36 1.09 0.85 0.55 0.25 0.02 1112.1
20:14:59 300 2.21 1.98 1.67 1.28 0.97 0.49 0.14 0.01 1165.2
20:19:59 300 19.50 18.06 16.01 12.27 9.20 4.17 0.05 0.00 1159.0
20:24:59 300 1.09 0.97 0.83 0.61 0.45 0.20 0.00 0.00 1195.6
20:29:59 300 7.31 6.53 5.53 4.34 3.35 1.95 0.64 0.12 1194.5
20:34:59 300 7.02 6.30 5.34 4.15 3.25 1.85 0.50 0.09 1195.9
20:39:59 300 10.33 9.14 7.56 5.96 4.65 2.94 1.22 0.17 1208.9
20:44:59 300 0.16 0.11 0.05 0.03 0.00 0.00 0.00 0.00 1239.0
20:49:59 300 0.14 0.09 0.04 0.02 0.00 0.00 0.00 0.00 1221.6
20:54:59 300 0.11 0.07 0.03 0.01 0.00 0.00 0.00 0.00 1243.4
20:59:59 300 0.15 0.10 0.06 0.03 0.01 0.00 0.00 0.00 1239.8
21:04:59 300 0.14 0.08 0.04 0.02 0.01 0.00 0.00 0.00 1117.4
21:09:59 300 0.13 0.08 0.04 0.02 0.00 0.00 0.00 0.00 1155.8
21:14:59 300 0.15 0.09 0.05 0.03 0.01 0.00 0.00 0.00 1173.8
21:19:59 300 0.13 0.08 0.04 0.02 0.01 0.00 0.00 0.00 1197.4
-------------- ------ ------ ------ ------ ------ ------ ------ ------ --------
avg 4.19 3.77 3.23 2.49 1.91 0.98 0.24 0.03 1145.0
max 19.50 18.06 16.01 12.27 9.20 4.17 1.22 0.17 1243.4
SSD_READ_0 /dev/sdb
Oct 27 2015 18:49:58
% > (ms)
slice-to (sec) 1 2 4 8 16 32 64 128 ops/sec
-------------- ------ ------ ------ ------ ------ ------ ------ ------ --------
18:54:58 300 8.99 8.20 6.99 5.33 4.14 1.99 0.46 0.08 507.9
18:59:58 300 6.53 5.91 5.03 3.91 2.99 1.57 0.56 0.03 521.7
19:04:58 300 4.56 4.08 3.45 2.76 2.14 1.22 0.66 0.05 486.0
19:09:58 300 2.99 2.73 2.40 1.83 1.37 0.57 0.00 0.00 541.3
19:14:58 300 1.31 1.15 0.92 0.69 0.51 0.28 0.10 0.01 546.2
19:19:58 300 11.82 10.87 9.56 7.45 5.77 2.73 0.48 0.06 569.4
19:24:58 300 3.21 2.80 2.27 1.75 1.34 0.73 0.31 0.01 577.0
19:29:58 300 1.45 1.18 0.93 0.81 0.74 0.59 0.47 0.02 565.8
19:34:59 301 0.18 0.13 0.08 0.05 0.03 0.02 0.02 0.00 562.3
19:39:59 300 14.65 12.90 10.57 7.61 5.38 2.19 0.16 0.02 566.0
19:44:59 300 4.16 3.72 3.21 2.46 1.86 0.88 0.27 0.00 556.9
19:49:59 300 3.85 3.45 2.92 2.21 1.58 0.65 0.05 0.00 574.1
19:54:59 300 2.57 2.32 2.00 1.45 1.08 0.44 0.04 0.00 581.7
19:59:59 300 1.53 1.36 1.17 0.85 0.61 0.22 0.00 0.00 590.4
20:04:59 300 5.69 4.73 3.61 2.75 2.15 1.29 0.73 0.01 521.4
20:09:59 300 1.78 1.50 1.23 0.99 0.77 0.49 0.28 0.02 560.0
20:14:59 300 2.12 1.86 1.55 1.17 0.88 0.43 0.14 0.01 581.4
20:19:59 300 19.08 17.53 15.33 11.52 8.48 3.56 0.06 0.00 583.2
20:24:59 300 1.09 0.97 0.82 0.61 0.45 0.19 0.01 0.00 603.0
20:29:59 300 7.49 6.66 5.61 4.32 3.25 1.84 0.65 0.18 595.1
20:34:59 300 7.45 6.68 5.62 4.34 3.39 1.98 0.49 0.12 600.8
20:39:59 300 10.89 9.76 8.11 6.47 5.07 3.22 1.36 0.25 604.2
20:44:59 300 0.16 0.10 0.05 0.03 0.01 0.00 0.00 0.00 623.5
20:49:59 300 0.12 0.08 0.04 0.01 0.00 0.00 0.00 0.00 616.1
20:54:59 300 0.14 0.09 0.04 0.02 0.00 0.00 0.00 0.00 624.1
20:59:59 300 0.17 0.12 0.07 0.04 0.00 0.00 0.00 0.00 617.5
21:04:59 300 0.15 0.09 0.05 0.03 0.01 0.00 0.00 0.00 562.9
21:09:59 300 0.13 0.09 0.05 0.02 0.00 0.00 0.00 0.00 581.0
21:14:59 300 0.15 0.09 0.05 0.03 0.01 0.00 0.00 0.00 592.7
21:19:59 300 0.13 0.07 0.03 0.02 0.01 0.00 0.00 0.00 602.7
-------------- ------ ------ ------ ------ ------ ------ ------ ------ --------
avg 4.15 3.71 3.13 2.38 1.80 0.90 0.24 0.03 573.0
max 19.08 17.53 15.33 11.52 8.48 3.56 1.36 0.25 624.1
And SSD_READ_1 /dev/sdc
Oct 27 2015 18:49:58
% > (ms)
slice-to (sec) 1 2 4 8 16 32 64 128 ops/sec
-------------- ------ ------ ------ ------ ------ ------ ------ ------ --------
18:54:58 300 8.43 7.70 6.65 5.13 3.96 1.67 0.26 0.02 510.2
18:59:58 300 6.16 5.59 4.82 3.80 2.96 1.50 0.55 0.06 532.5
19:04:58 300 4.43 4.00 3.40 2.67 2.16 1.22 0.58 0.05 492.5
19:09:58 300 3.00 2.75 2.41 1.80 1.36 0.56 0.00 0.00 544.4
19:14:58 300 1.28 1.12 0.91 0.67 0.48 0.25 0.07 0.03 548.1
19:19:58 300 12.01 11.12 9.84 7.60 5.83 2.77 0.37 0.03 566.0
19:24:58 300 3.17 2.84 2.40 1.86 1.52 0.89 0.31 0.06 563.7
19:29:58 300 1.59 1.32 1.06 0.90 0.77 0.69 0.55 0.03 553.1
19:34:59 301 0.19 0.13 0.09 0.07 0.03 0.03 0.02 0.00 564.4
19:39:59 300 16.43 15.14 13.37 10.52 8.16 3.95 0.31 0.06 568.6
19:44:59 300 4.52 4.15 3.66 2.88 2.19 1.15 0.18 0.00 560.9
19:49:59 300 4.33 4.01 3.55 2.74 2.07 0.99 0.04 0.00 575.8
19:54:59 300 2.76 2.53 2.25 1.74 1.32 0.68 0.02 0.00 579.7
19:59:59 300 1.71 1.54 1.34 1.01 0.74 0.36 0.00 0.00 584.0
20:04:59 300 6.92 6.29 5.41 4.52 3.77 2.54 1.31 0.09 519.8
20:09:59 300 2.04 1.78 1.49 1.19 0.95 0.61 0.23 0.02 551.1
20:14:59 300 2.30 2.09 1.78 1.38 1.06 0.56 0.14 0.01 582.8
20:19:59 300 19.94 18.61 16.71 13.04 9.94 4.79 0.04 0.00 574.9
20:24:59 300 1.08 0.96 0.84 0.61 0.44 0.21 0.00 0.00 592.0
20:29:59 300 7.13 6.40 5.46 4.37 3.45 2.05 0.63 0.07 598.7
20:34:59 300 6.61 5.92 5.06 3.96 3.12 1.73 0.51 0.07 594.3
20:39:59 300 9.77 8.53 7.02 5.46 4.23 2.66 1.07 0.10 603.6
20:44:59 300 0.17 0.11 0.05 0.02 0.00 0.00 0.00 0.00 614.8
20:49:59 300 0.15 0.10 0.05 0.02 0.00 0.00 0.00 0.00 604.7
20:54:59 300 0.08 0.05 0.02 0.01 0.00 0.00 0.00 0.00 618.6
20:59:59 300 0.13 0.08 0.05 0.03 0.01 0.00 0.00 0.00 621.7
21:04:59 300 0.12 0.07 0.04 0.02 0.00 0.00 0.00 0.00 553.8
21:09:59 300 0.13 0.07 0.04 0.02 0.00 0.00 0.00 0.00 574.2
21:14:59 300 0.14 0.09 0.05 0.02 0.00 0.00 0.00 0.00 580.4
21:19:59 300 0.13 0.09 0.05 0.02 0.01 0.00 0.00 0.00 593.9
-------------- ------ ------ ------ ------ ------ ------ ------ ------ --------
avg 4.23 3.84 3.33 2.60 2.02 1.06 0.24 0.02 570.0
max 19.94 18.61 16.71 13.04 9.94 4.79 1.31 0.10 621.7
I guess what the above information tells us is that the latency is attributed to the two SSD drives of that instance being slower than normal, which I guess is a question to AWS.
During the time that the latency is high, we also notice a high CPU load on that aerospike node and higher number of client connections to it. I don’t know if that can tell us why the reads are slow or explain why the SSDs might show slower reads (node gets overloaded).
Are there any other histograms I can gather to see if higher CPU load or higher number of connections can result to a degradation in read performance?
Apart from a bad AWS EC2 instance ephemeral drive, are there any cluster or node level operations (re-balancing of the keys, defrag of SSD volumes) that could explain a higher CPU load? What metrics/histograms I should be looking at, to catch issues during those operations? The problem seems to be happening at random times for about 1-2 hours each time and then disappear, that’s why I am asking for any cluster/node level operations that could be happening at specific intervals.
Note: We haven’t transitioned to 8 transaction queues per thread, yet.
Thank you again for your assistance with these issues, and sorry for the lengthy post.