Frequent Aerospike Client Timeouts at 1000 RPS

Hello Everyone,

Need some guidance on how to tune Aerospike to handle high volume traffic . In our application , we are using aerospike community edition ( 5.6.0.4) and currently we are facing frequent aerospike client timeout issues with 1000 RPS and need you assistance in tuning the aerospike configurations to eliminate these timeouts. Moving forward, we intend to scale up the traffic to ~20-30K RPS.

Please help.

Below is the error message we are seeing for Aerospike Client timeouts:

{“request_id” : “bf8b0c35-1887-4587-b3c3-0f8fe09f967b”,“level” : “ERROR”,“time” : “2021-07-26T18:41:41.419Z”,“thread” : “http-nio-10300-exec-107”,“class” : “AerospikeKeyValueStoreWrapper”,“line” : “”,“message” : "Aerospike Exception: | Exception : class com.aerospike.client.AerospikeException$Timeout | Exception message : Client timeout: iteration=1 connect=0 socket=100 total=110 maxRetries=1 node=BB96E98611F0792 xx.xxx.xxx.x 3000 inDoubt=false | StackTrace : com.aerospike.client.command.SyncCommand.executeCommand(SyncCommand.java:242)____com.aerospike.client.command.SyncCommand.execute(SyncCommand.java:64)____com.aerospike.client.AerospikeClient.get(AerospikeClient.java:894).

For reference below are the current Aerospike server and Client configurations:

Aerospike Server : ( 3 node cluster)

  • Aerospike Community Edition
    
  • build_time:  Sun May 23 04:25:48 UTC 2021
    
  • build_os: debian10
    
  • build: 5.6.0.4
    
  • version: Aerospike Community Edition build 5.6.0.4
    
  • Config: 
        service-threads 6
        proto-fd-max 100000
        proto-fd-idle-ms 60000
    

Aerospike Client : ( Java Client)

  • Client Version: 5.1.4
  • Configs: “policy” : { “sendKey” : false, “totalTimeout”: 110, “socketTimeout”: 100, “maxRetries”: 1, “sleepBetweenRetries”: 10 }, “write_policy” : { “expiration” : 86400 }, “scan_policy” : { “concurrentNodes”: true, “includeBinData”: true }

I suggest looking at the server side logs (especially on the BB96E98611F0792 node) and potentially turning on read and storage (if data not in memory) benchmarks to figure out where the latency is coming from and if it is coming from the server.

Side note: the client policy wouldn’t allow any retries the way it is set since after the socketTimeout has elapsed, sleeping for 10ms (sleepBetweenRetries) would get you to the 110ms configured totalTimeout and would not allow for any retries (despite the maxRetries being set to 1).

Thanks Meher!!. These errors are coming prominently across all nodes of the cluster.

Corrected the client policies config.

  • Configs: “policy” : { “sendKey” : false, “totalTimeout”: 220, “socketTimeout”: 100, “maxRetries”: 1, “sleepBetweenRetries”: 10 }, “write_policy” : { “expiration” : 86400 }, “scan_policy” : { “concurrentNodes”: true, “includeBinData”: true }

As far as data is concerned, we are using in memory cache. Further looked at the aerospike histogram logs and if I am reading correctly , looks like for good number of samples write operation is taking higher than our specified socket timeout of 100 ms.

Also, for fds proto ticker, slightly confused with the “client_connections_opened”. This number shows pretty huge values. Do we need to set proto-fd-max according to this or should we consider “client_connections” metrics for that?

Can you please have a look at following aerospike server logs and help me out in deriving the correct inference ?

Logs -

Jul 29 2021 11:43:37 GMT: INFO (info): (ticker.c:169) NODE-ID bb922f32dd7749e CLUSTER-SIZE 3
Jul 29 2021 11:43:37 GMT: INFO (info): (ticker.c:251)    cluster-clock: skew-ms 0
Jul 29 2021 11:43:37 GMT: INFO (info): (ticker.c:274)    system: total-cpu-pct 41 user-cpu-pct 23 kernel-cpu-pct 18 free-mem-kbytes 56386784 free-mem-pct 78
Jul 29 2021 11:43:37 GMT: INFO (info): (ticker.c:297)    process: cpu-pct 32 threads (7,61,48,48) heap-kbytes (10342975,10353416,10747904) heap-efficiency-pct 96.2
Jul 29 2021 11:43:37 GMT: INFO (info): (ticker.c:308)    in-progress: info-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
Jul 29 2021 11:43:37 GMT: INFO (info): (ticker.c:331)    fds: proto (1697,706115,704418) heartbeat (0,0,0) fabric (48,108,60)
Jul 29 2021 11:43:37 GMT: INFO (info): (ticker.c:339)    heartbeat-received: self 123839 foreign 246843
Jul 29 2021 11:43:37 GMT: INFO (info): (ticker.c:369)    fabric-bytes-per-second: bulk (0,0) ctrl (0,0) meta (0,0) rw (245723,257597)
Jul 29 2021 11:43:37 GMT: INFO (info): (ticker.c:432) {oauth} objects: all 7072760 master 3460574 prole 3612186 non-replica 0
Jul 29 2021 11:43:37 GMT: INFO (info): (ticker.c:494) {oauth} migrations: complete
Jul 29 2021 11:43:37 GMT: INFO (info): (ticker.c:513) {oauth} memory-usage: total-bytes 8298615387 index-bytes 452656640 set-index-bytes 0 sindex-bytes 0 data-bytes 7845958747 used-pct 24.15
Jul 29 2021 11:43:37 GMT: INFO (info): (ticker.c:649) {oauth} client: tsvc (0,35) proxy (0,0,0) read (48263504,0,0,30131499,0) write (4930995,0,41,0) delete (1288948,0,1,12477,0) udf (0,0,0,0) lang (0,0,0,0)
Jul 29 2021 11:43:37 GMT: INFO (info): (hist.c:321) histogram dump: {oauth}-read (78395003 total) msec
Jul 29 2021 11:43:37 GMT: INFO (info): (hist.c:331)  (00: 0078389966) (01: 0000002959) (02: 0000000751) (03: 0000000187)
Jul 29 2021 11:43:37 GMT: INFO (info): (hist.c:331)  (04: 0000000027) (05: 0000000883) (06: 0000000206) (07: 0000000024)
Jul 29 2021 11:43:37 GMT: INFO (info): (hist.c:321) histogram dump: {oauth}-write (4930995 total) msec
Jul 29 2021 11:43:37 GMT: INFO (info): (hist.c:331)  (00: 0004516409) (01: 0000044409) (02: 0000023131) (03: 0000026861)
Jul 29 2021 11:43:37 GMT: INFO (info): (hist.c:331)  (04: 0000043738) (05: 0000050332) (06: 0000055364) (07: 0000066219)
Jul 29 2021 11:43:37 GMT: INFO (info): (hist.c:331)  (08: 0000061671) (09: 0000038364) (10: 0000004473) (11: 0000000024)
Jul 29 2021 11:43:47 GMT: INFO (info): (ticker.c:169) NODE-ID bb922f32dd7749e CLUSTER-SIZE 3
Jul 29 2021 11:43:47 GMT: INFO (info): (ticker.c:251)    cluster-clock: skew-ms 0
Jul 29 2021 11:43:47 GMT: INFO (info): (ticker.c:274)    system: total-cpu-pct 39 user-cpu-pct 23 kernel-cpu-pct 16 free-mem-kbytes 56387656 free-mem-pct 78
Jul 29 2021 11:43:47 GMT: INFO (info): (ticker.c:297)    process: cpu-pct 30 threads (7,61,48,48) heap-kbytes (10343098,10353396,10747904) heap-efficiency-pct 96.2
Jul 29 2021 11:43:47 GMT: INFO (info): (ticker.c:308)    in-progress: info-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
Jul 29 2021 11:43:47 GMT: INFO (info): (ticker.c:331)    fds: proto (1726,706540,704814) heartbeat (0,0,0) fabric (48,108,60)
Jul 29 2021 11:43:47 GMT: INFO (info): (ticker.c:339)    heartbeat-received: self 123906 foreign 246976
Jul 29 2021 11:43:47 GMT: INFO (info): (ticker.c:369)    fabric-bytes-per-second: bulk (0,0) ctrl (0,0) meta (0,0) rw (198081,199796)
Jul 29 2021 11:43:47 GMT: INFO (info): (ticker.c:432) {oauth} objects: all 7072788 master 3460580 prole 3612208 non-replica 0
Jul 29 2021 11:43:47 GMT: INFO (info): (ticker.c:494) {oauth} migrations: complete
Jul 29 2021 11:43:47 GMT: INFO (info): (ticker.c:513) {oauth} memory-usage: total-bytes 8298648231 index-bytes 452658432 set-index-bytes 0 sindex-bytes 0 data-bytes 7845989799 used-pct 24.15
Jul 29 2021 11:43:47 GMT: INFO (info): (ticker.c:649) {oauth} client: tsvc (0,35) proxy (0,0,0) read (48322227,0,0,30161099,0) write (4932414,0,41,0) delete (1290225,0,1,12477,0) udf (0,0,0,0) lang (0,0,0,0)
Jul 29 2021 11:43:47 GMT: INFO (info): (hist.c:321) histogram dump: {oauth}-read (78483326 total) msec
Jul 29 2021 11:43:47 GMT: INFO (info): (hist.c:331)  (00: 0078478280) (01: 0000002965) (02: 0000000754) (03: 0000000187)
Jul 29 2021 11:43:47 GMT: INFO (info): (hist.c:331)  (04: 0000000027) (05: 0000000883) (06: 0000000206) (07: 0000000024)
Jul 29 2021 11:43:47 GMT: INFO (info): (hist.c:321) histogram dump: {oauth}-write (4932414 total) msec
Jul 29 2021 11:43:47 GMT: INFO (info): (hist.c:331)  (00: 0004517661) (01: 0000044420) (02: 0000023136) (03: 0000026870)
Jul 29 2021 11:43:47 GMT: INFO (info): (hist.c:331)  (04: 0000043743) (05: 0000050355) (06: 0000055378) (07: 0000066237)
Jul 29 2021 11:43:47 GMT: INFO (info): (hist.c:331)  (08: 0000061706) (09: 0000038408) (10: 0000004476) (11: 0000000024)
Jul 29 2021 11:43:54 GMT: INFO (nsup): (nsup.c:403) {oauth} nsup-start: expire-threads 1
Jul 29 2021 11:43:55 GMT: INFO (nsup): (nsup.c:816) {oauth} nsup-done: non-expirable 0 expired (0,0) evicted (0,0) evict-ttl 0 total-ms 730
Jul 29 2021 11:43:57 GMT: INFO (info): (ticker.c:169) NODE-ID bb922f32dd7749e CLUSTER-SIZE 3
Jul 29 2021 11:43:57 GMT: INFO (info): (ticker.c:251)    cluster-clock: skew-ms 0
Jul 29 2021 11:43:57 GMT: INFO (info): (ticker.c:274)    system: total-cpu-pct 44 user-cpu-pct 27 kernel-cpu-pct 17 free-mem-kbytes 56388244 free-mem-pct 78
Jul 29 2021 11:43:57 GMT: INFO (info): (ticker.c:297)    process: cpu-pct 38 threads (7,61,48,48) heap-kbytes (10343041,10353380,10747904) heap-efficiency-pct 96.2
Jul 29 2021 11:43:57 GMT: INFO (info): (ticker.c:308)    in-progress: info-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
Jul 29 2021 11:43:57 GMT: INFO (info): (ticker.c:331)    fds: proto (1675,707566,705891) heartbeat (0,0,0) fabric (48,108,60)
Jul 29 2021 11:43:57 GMT: INFO (info): (ticker.c:339)    heartbeat-received: self 123972 foreign 247110
Jul 29 2021 11:43:57 GMT: INFO (info): (ticker.c:369)    fabric-bytes-per-second: bulk (0,0) ctrl (0,0) meta (0,0) rw (206919,205361)
Jul 29 2021 11:43:57 GMT: INFO (info): (ticker.c:432) {oauth} objects: all 7072769 master 3460576 prole 3612193 non-replica 0
Jul 29 2021 11:43:57 GMT: INFO (info): (ticker.c:494) {oauth} migrations: complete
Jul 29 2021 11:43:57 GMT: INFO (info): (ticker.c:513) {oauth} memory-usage: total-bytes 8298626050 index-bytes 452657216 set-index-bytes 0 sindex-bytes 0 data-bytes 7845968834 used-pct 24.15
Jul 29 2021 11:43:57 GMT: INFO (info): (ticker.c:649) {oauth} client: tsvc (0,35) proxy (0,0,0) read (48382677,0,0,30192731,0) write (4933893,0,41,0) delete (1291607,0,1,12478,0) udf (0,0,0,0) lang (0,0,0,0)
Jul 29 2021 11:43:57 GMT: INFO (info): (hist.c:321) histogram dump: {oauth}-read (78575408 total) msec
Jul 29 2021 11:43:57 GMT: INFO (info): (hist.c:331)  (00: 0078570356) (01: 0000002968) (02: 0000000755) (03: 0000000187)
Jul 29 2021 11:43:57 GMT: INFO (info): (hist.c:331)  (04: 0000000027) (05: 0000000885) (06: 0000000206) (07: 0000000024)
Jul 29 2021 11:43:57 GMT: INFO (info): (hist.c:321) histogram dump: {oauth}-write (4933893 total) msec
Jul 29 2021 11:43:57 GMT: INFO (info): (hist.c:331)  (00: 0004519001) (01: 0000044426) (02: 0000023149) (03: 0000026883)
Jul 29 2021 11:43:57 GMT: INFO (info): (hist.c:331)  (04: 0000043775) (05: 0000050371) (06: 0000055402) (07: 0000066249)
Jul 29 2021 11:43:57 GMT: INFO (info): (hist.c:331)  (08: 0000061729) (09: 0000038408) (10: 0000004476) (11: 0000000024)
Jul 29 2021 11:44:07 GMT: INFO (info): (ticker.c:169) NODE-ID bb922f32dd7749e CLUSTER-SIZE 3
Jul 29 2021 11:44:07 GMT: INFO (info): (ticker.c:251)    cluster-clock: skew-ms 0
Jul 29 2021 11:44:07 GMT: INFO (info): (ticker.c:274)    system: total-cpu-pct 41 user-cpu-pct 23 kernel-cpu-pct 18 free-mem-kbytes 56388320 free-mem-pct 78
Jul 29 2021 11:44:07 GMT: INFO (info): (ticker.c:297)    process: cpu-pct 32 threads (7,61,48,48) heap-kbytes (10342945,10353476,10747904) heap-efficiency-pct 96.2
Jul 29 2021 11:44:07 GMT: INFO (info): (ticker.c:308)    in-progress: info-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
Jul 29 2021 11:44:07 GMT: INFO (info): (ticker.c:331)    fds: proto (1674,708568,706894) heartbeat (0,0,0) fabric (48,108,60)
Jul 29 2021 11:44:07 GMT: INFO (info): (ticker.c:339)    heartbeat-received: self 124039 foreign 247242
Jul 29 2021 11:44:07 GMT: INFO (info): (ticker.c:369)    fabric-bytes-per-second: bulk (0,0) ctrl (0,0) meta (0,0) rw (278037,273813)
Jul 29 2021 11:44:07 GMT: INFO (info): (ticker.c:432) {oauth} objects: all 7072771 master 3460575 prole 3612196 non-replica 0
Jul 29 2021 11:44:07 GMT: INFO (info): (ticker.c:494) {oauth} migrations: complete
Jul 29 2021 11:44:07 GMT: INFO (info): (ticker.c:513) {oauth} memory-usage: total-bytes 8298628608 index-bytes 452657344 set-index-bytes 0 sindex-bytes 0 data-bytes 7845971264 used-pct 24.15
Jul 29 2021 11:44:07 GMT: INFO (info): (ticker.c:649) {oauth} client: tsvc (0,35) proxy (0,0,0) read (48442851,0,0,30224331,0) write (4935928,0,41,0) delete (1292914,0,1,12478,0) udf (0,0,0,0) lang (0,0,0,0)
Jul 29 2021 11:44:07 GMT: INFO (info): (hist.c:321) histogram dump: {oauth}-read (78667182 total) msec
Jul 29 2021 11:44:07 GMT: INFO (info): (hist.c:331)  (00: 0078662120) (01: 0000002974) (02: 0000000756) (03: 0000000187)
Jul 29 2021 11:44:07 GMT: INFO (info): (hist.c:331)  (04: 0000000027) (05: 0000000887) (06: 0000000207) (07: 0000000024)
Jul 29 2021 11:44:07 GMT: INFO (info): (hist.c:321) histogram dump: {oauth}-write (4935928 total) msec
Jul 29 2021 11:44:07 GMT: INFO (info): (hist.c:331)  (00: 0004520400) (01: 0000044464) (02: 0000023170) (03: 0000026915)
Jul 29 2021 11:44:07 GMT: INFO (info): (hist.c:331)  (04: 0000043792) (05: 0000050396) (06: 0000055471) (07: 0000066353)
Jul 29 2021 11:44:07 GMT: INFO (info): (hist.c:331)  (08: 0000061930) (09: 0000038530) (10: 0000004483) (11: 0000000024)
Jul 29 2021 11:44:17 GMT: INFO (info): (ticker.c:169) NODE-ID bb922f32dd7749e CLUSTER-SIZE 3
Jul 29 2021 11:44:17 GMT: INFO (info): (ticker.c:251)    cluster-clock: skew-ms 0
Jul 29 2021 11:44:17 GMT: INFO (info): (ticker.c:274)    system: total-cpu-pct 38 user-cpu-pct 22 kernel-cpu-pct 16 free-mem-kbytes 56385640 free-mem-pct 78
Jul 29 2021 11:44:17 GMT: INFO (info): (ticker.c:297)    process: cpu-pct 32 threads (7,61,48,48) heap-kbytes (10342990,10353420,10747904) heap-efficiency-pct 96.2
Jul 29 2021 11:44:17 GMT: INFO (info): (ticker.c:308)    in-progress: info-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
Jul 29 2021 11:44:17 GMT: INFO (info): (ticker.c:331)    fds: proto (1528,709200,707672) heartbeat (0,0,0) fabric (48,108,60)
Jul 29 2021 11:44:17 GMT: INFO (info): (ticker.c:339)    heartbeat-received: self 124106 foreign 247373
Jul 29 2021 11:44:17 GMT: INFO (info): (ticker.c:369)    fabric-bytes-per-second: bulk (0,0) ctrl (0,0) meta (0,0) rw (247553,249886)
Jul 29 2021 11:44:17 GMT: INFO (info): (ticker.c:432) {oauth} objects: all 7072783 master 3460581 prole 3612202 non-replica 0
Jul 29 2021 11:44:17 GMT: INFO (info): (ticker.c:494) {oauth} migrations: complete
Jul 29 2021 11:44:17 GMT: INFO (info): (ticker.c:513) {oauth} memory-usage: total-bytes 8298642790 index-bytes 452658112 set-index-bytes 0 sindex-bytes 0 data-bytes 7845984678 used-pct 24.15
Jul 29 2021 11:44:17 GMT: INFO (info): (ticker.c:649) {oauth} client: tsvc (0,35) proxy (0,0,0) read (48501654,0,0,30255258,0) write (4937725,0,41,0) delete (1294184,0,1,12481,0) udf (0,0,0,0) lang (0,0,0,0)
Jul 29 2021 11:44:17 GMT: INFO (info): (hist.c:321) histogram dump: {oauth}-read (78756912 total) msec
Jul 29 2021 11:44:17 GMT: INFO (info): (hist.c:331)  (00: 0078751841) (01: 0000002979) (02: 0000000757) (03: 0000000187)
Jul 29 2021 11:44:17 GMT: INFO (info): (hist.c:331)  (04: 0000000027) (05: 0000000889) (06: 0000000208) (07: 0000000024)
Jul 29 2021 11:44:17 GMT: INFO (info): (hist.c:321) histogram dump: {oauth}-write (4937725 total) msec
Jul 29 2021 11:44:17 GMT: INFO (info): (hist.c:331)  (00: 0004521740) (01: 0000044482) (02: 0000023186) (03: 0000026921)
Jul 29 2021 11:44:17 GMT: INFO (info): (hist.c:331)  (04: 0000043815) (05: 0000050425) (06: 0000055524) (07: 0000066530)
Jul 29 2021 11:44:17 GMT: INFO (info): (hist.c:331)  (08: 0000062006) (09: 0000038589) (10: 0000004483) (11: 0000000024)

Hard to read. Try to format. Also, micro benchmarks don’t seem to have been enabled. The previously provided links should hopefully be explicit enough on how to turn those on.

© 2021 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.