Latency during batch read

query

#1

We're doing batch reads on a 3-node cluster. Normally we process 22k/sec, and the 95th-percentile latency is 8 ms. While processing around 70k batch reads/sec, latency reached 40 ms. CPU utilisation on all nodes was 5%-6%. Each batch read contains two or three keys. We are also writing to some keys in parallel (max 1k/sec).

Could anyone explain why latency increased so significantly?

  1. Is it related to new connections?

What am I missing here?


#2

What's the server version? Client type and version? Policy definitions? Is any latency showing up in the histograms? Any warnings in the Aerospike logs?


#3

Aerospike server version: Aerospike Community Edition build 3.15.0.1

Client: Node.js client, version 3.0.2

Policy: default read and write policies

Warnings during the latency spike:
Sep 03 2018 04:23:57 GMT: WARNING (hlc): (hlc.c:564) HLC jumped by 76515 milliseconds with message from bb9740882239106. Current physical clock:1535948637708 Current HLC:1535948713973 Incoming HLC:1535948714223 Tolerable skew:1000 ms
Sep 03 2018 04:23:57 GMT: WARNING (hlc): (hlc.c:564) HLC jumped by 76514 milliseconds with message from bb9740882239106. Current physical clock:1535948637958 Current HLC:1535948714223 Incoming HLC:1535948714472 Tolerable skew:1000 ms
Sep 03 2018 04:23:58 GMT: WARNING (hlc): (hlc.c:564) HLC jumped by 76515 milliseconds with message from bb9740882239106. Current physical clock:1535948638208 Current HLC:1535948714472 Incoming HLC:1535948714723 Tolerable skew:1000 ms
Sep 03 2018 04:23:58 GMT: WARNING (hlc): (hlc.c:564) HLC jumped by 76515 milliseconds with message from bb9740882239106. Current physical clock:1535948638458 Current HLC:1535948714723 Incoming HLC:1535948714973 Tolerable skew:1000 ms
Sep 03 2018 04:23:58 GMT: WARNING (hlc): (hlc.c:564) HLC jumped by 76514 milliseconds with message from bb9740882239106. Current physical clock:1535948638709 Current HLC:1535948714973 Incoming HLC:1535948715223 Tolerable skew:1000 ms
Sep 03 2018 04:23:58 GMT: WARNING (hlc): (hlc.c:564) HLC jumped by 76515 milliseconds with message from bb9740882239106. Current physical clock:1535948638959 Current HLC:1535948715223 Incoming HLC:1535948715474 Tolerable skew:1000 ms

But the above warning is logged all the time, even when latency is low. The main problem occurs only when the request rate increases.


#4

Those warnings are benign and should be unrelated to the issue you are describing. They can be muted with:

asinfo -v "log-set:id=0;hlc=critical"

Regarding your latency issue, it seems the system is not able to handle the load, but with CPU utilisation that low, the bottleneck would not be the CPU. Finding it would require looking further into the details and potentially enabling the batch benchmarks (see the latency monitoring page for details).
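
As a rough back-of-the-envelope check on why the load matters even with idle CPUs, Little's law (average requests in flight = arrival rate x time in system) can be applied to the figures from the first post. This is only a sketch: it uses the reported 95th-percentile latency in place of the mean, so the results are upper-bound estimates, and the function name `in_flight` is made up for illustration.

```python
def in_flight(rate_per_sec: float, latency_ms: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return rate_per_sec * latency_ms / 1000.0


# Figures from the first post. The p95 latency stands in for the mean
# (an assumption), so these are estimates rather than measurements.
normal = in_flight(22_000, 8)   # 176.0 requests in flight
peak = in_flight(70_000, 40)    # 2800.0 requests in flight

# Each batch holds two or three keys, so the record-level read rate
# the cluster serves is two to three times the batch rate.
peak_records_low = 70_000 * 2
peak_records_high = 70_000 * 3

print(f"in flight, normal: {normal:.0f}")
print(f"in flight, peak:   {peak:.0f} ({peak / normal:.1f}x)")
print(f"record reads/sec at peak: {peak_records_low}-{peak_records_high}")
```

Under these assumptions, roughly 16 times as many batch requests are outstanding at peak as in normal operation. That would put pressure on client connections, batch threads and queues, or the network well before CPU becomes the limit, which is where the batch benchmarks should help narrow things down.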