How to reduce latency on a production cluster


#1

Hi,

Here is an example of latency on our production cluster.

Admin> show latency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~proxy Latency~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Node                     Time   Ops/Sec   >1Ms   >8Ms   >64Ms   
           .                     Span         .      .      .       .   
192.168.0.1    07:19:21-GMT->07:19:31       0.0    0.0    0.0     0.0   
192.168.0.21   07:19:18-GMT->07:19:28       0.0    0.0    0.0     0.0   
192.168.0.24   07:19:26-GMT->07:19:36       0.0    0.0    0.0     0.0   
192.168.0.42   07:19:24-GMT->07:19:34       0.0    0.0    0.0     0.0   
192.168.0.48   07:19:20-GMT->07:19:30       0.0    0.0    0.0     0.0   
Number of rows: 5

~~~~~~~~~~~~~~~~~~~~~~~~~~~~query Latency~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Node                     Time   Ops/Sec   >1Ms   >8Ms   >64Ms 
           .                     Span         .      .      .       . 
192.168.0.1    07:19:21-GMT->07:19:31       0.0    0.0    0.0     0.0 
192.168.0.21   07:19:18-GMT->07:19:28       0.0    0.0    0.0     0.0 
192.168.0.24   07:19:26-GMT->07:19:36       0.0    0.0    0.0     0.0 
192.168.0.42   07:19:24-GMT->07:19:34       0.0    0.0    0.0     0.0 
192.168.0.48   07:19:20-GMT->07:19:30       0.0    0.0    0.0     0.0 
Number of rows: 5



~~~~~~~~~~~~~~~~~~~~~~~~~~~~reads Latency~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Node                     Time   Ops/Sec   >1Ms   >8Ms   >64Ms 
           .                     Span         .      .      .       . 
192.168.0.1    07:19:21-GMT->07:19:31   71193.2   0.37   0.01     0.0 
192.168.0.21   07:19:18-GMT->07:19:28   51730.4   0.08    0.0     0.0 
192.168.0.24   07:19:26-GMT->07:19:36   71845.3   0.14    0.0     0.0 
192.168.0.42   07:19:24-GMT->07:19:34   55501.6   0.15    0.0     0.0 
192.168.0.48   07:19:20-GMT->07:19:30   71567.3   0.16    0.0     0.0 
Number of rows: 5



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~udf Latency~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Node                     Time   Ops/Sec   >1Ms   >8Ms   >64Ms 
           .                     Span         .      .      .       . 
192.168.0.1    07:19:21-GMT->07:19:31     932.5    0.0    0.0     0.0 
192.168.0.21   07:19:18-GMT->07:19:28     924.2    0.0    0.0     0.0 
192.168.0.24   07:19:26-GMT->07:19:36     973.1   0.01    0.0     0.0 
192.168.0.42   07:19:24-GMT->07:19:34     956.7    0.0    0.0     0.0 
192.168.0.48   07:19:20-GMT->07:19:30     955.8    0.0    0.0     0.0 
Number of rows: 5



~~~~~~~~~~~~~~~~~~~~~~~~~writes_master Latency~~~~~~~~~~~~~~~~~~~~~~~~~
        Node                     Time   Ops/Sec    >1Ms    >8Ms   >64Ms 
           .                     Span         .       .       .       . 
192.168.0.1    07:19:21-GMT->07:19:31      19.8   100.0   100.0   100.0 
192.168.0.21   07:19:18-GMT->07:19:28      41.2   100.0   100.0   100.0 
192.168.0.24   07:19:26-GMT->07:19:36       5.4   100.0   100.0   100.0 
192.168.0.42   07:19:24-GMT->07:19:34       0.0     0.0     0.0     0.0 
192.168.0.48   07:19:20-GMT->07:19:30      60.9   100.0   100.0   100.0 
Number of rows: 5



~~~~~~~~~~~~~~~~~~~~~~~~~writes_reply Latency~~~~~~~~~~~~~~~~~~~~~~~~
        Node                     Time   Ops/Sec   >1Ms   >8Ms   >64Ms 
           .                     Span         .      .      .       . 
192.168.0.1    07:19:21-GMT->07:19:31      68.9   3.05    2.9    2.32 
192.168.0.21   07:19:18-GMT->07:19:28      74.5   0.94   0.81    0.81 
192.168.0.24   07:19:26-GMT->07:19:36      79.1   1.14   0.63    0.63 
192.168.0.42   07:19:24-GMT->07:19:34      67.9   0.15   0.15    0.15 
192.168.0.48   07:19:20-GMT->07:19:30      71.7   1.12   0.98    0.98
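
Output like the above can be scanned mechanically for rows where a large share of operations takes longer than 64 ms. What follows is only a rough sketch, assuming the text layout is exactly the one pasted here. The function name `slow_rows`, the 1% threshold, and the shortened `SAMPLE` text are illustrative assumptions, not part of any Aerospike tool.

```python
import re

# One data row, e.g.
# "192.168.0.1    07:19:21-GMT->07:19:31      19.8   100.0   100.0   100.0"
ROW = re.compile(
    r"^(\S+)\s+\S+->\S+\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)
# One histogram banner, e.g. "~~~~udf Latency~~~~"
BANNER = re.compile(r"^~+\s*(\w+) Latency\s*~+$")


def slow_rows(text, max_pct_over_64ms=1.0):
    """Return (histogram, node, ops_per_sec, pct_over_64ms) for each row
    whose >64Ms column exceeds the (assumed) threshold."""
    flagged = []
    histogram = None
    for line in text.splitlines():
        line = line.strip()
        banner = BANNER.match(line)
        if banner:
            histogram = banner.group(1)
            continue
        row = ROW.match(line)
        if row and histogram:
            node = row.group(1)
            ops, _gt1, _gt8, gt64 = (float(v) for v in row.groups()[1:])
            if gt64 > max_pct_over_64ms:
                flagged.append((histogram, node, ops, gt64))
    return flagged


# Shortened excerpt of the output above, used only to exercise the parser.
SAMPLE = """\
~~~~~udf Latency~~~~~
192.168.0.1    07:19:21-GMT->07:19:31     932.5    0.0    0.0     0.0
~~~~~writes_master Latency~~~~~
192.168.0.1    07:19:21-GMT->07:19:31      19.8   100.0   100.0   100.0
192.168.0.42   07:19:24-GMT->07:19:34       0.0     0.0     0.0     0.0
"""

if __name__ == "__main__":
    for hist, node, ops, pct in slow_rows(SAMPLE):
        print(f"{hist:15s} {node:15s} ops/sec={ops:8.1f} >64ms={pct:6.2f}%")
```

Header rows, the "Number of rows" lines, and idle nodes (0.0 ops/sec) are skipped or left unflagged, so only rows with real slow traffic are reported.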

The problematic one is the UDF latency.

When you look at our latency stats, do you think:

  • we have to increase or decrease the number of nodes?
  • we have a config problem?
  • we have to try a custom build to change the Lua GC?
  • you need more information?

Thanks for any hints.

Bertrand


#2

I finally found the root cause: too much load on the servers from processes outside Aerospike. I now have three dedicated servers instead of five shared ones, and no more latency problems.
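
For anyone hitting the same symptom, host-level load can be checked before touching the cluster config. This is a minimal sketch, assuming a Unix host. The helper `is_overloaded` and the budget of 1.0 runnable task per CPU are illustrative assumptions, not an Aerospike recommendation.

```python
import os


def is_overloaded(load_1min, cpu_count, max_load_per_cpu=1.0):
    """True when the 1-minute load average exceeds the per-CPU budget."""
    return load_1min > cpu_count * max_load_per_cpu


if __name__ == "__main__":
    # os.getloadavg() is available on Unix only.
    load_1min, _load_5min, _load_15min = os.getloadavg()
    cpus = os.cpu_count() or 1
    state = "overloaded" if is_overloaded(load_1min, cpus) else "ok"
    print(f"load(1m)={load_1min:.2f} cpus={cpus} -> {state}")
```

A high load average on a node that Aerospike itself is not keeping busy points at other processes sharing the machine.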

Bertrand


#3