aerospike_query_foreach() caused unusual latency and CPU

from old forum Post by Hanson » Fri Aug 08, 2014 2:43 am

Aerospike 3.3.12 with In-Memory + Disk Persistence. Running local C client with 4 threads, with/without LDT Bin (using lmap for sub-records with 1KB size of bytes).

Traffic: Write at 10K TPS

Without LDT Bin, it’s normal. With LDT Bin: Before aerospike_query_foreach() invoked, the average Write latency is 0.40ms. After aerospike_query_foreach() invoked, the average Write latency is 0.50ms, and keeps there even no more aerospike_query_foreach() invoked . It never drops back to 0.40ms. The aerospike_query_foreach() was completed very fast (in 10ms, with 1100 records retuned by callback).

Here is the piece of C client code:

    …
    as_query query;
    as_query_init(&query, db_name, table_name);
    
    as_query_where_inita(&query, 1);
    as_query_where(&query, "bucket", integer_equals(10888));
    
    as_error err;
    if (aerospike_query_foreach(connection, &err, NULL, &query, query_callback, NULL) != AEROSPIKE_OK) {
        printf("aerospike_query_foreach() failed: %d - %s\n", err.code, err.message);
    }
    …

I also observed that the CPU usage of Aerospike server process asd was unusual: “top –H” showed the CPU usage for one of the asd thread jumped from 86.6% to 96.8% (~10% higher), and keeps there.

Before aerospike_query_foreach() invoked:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15729 root      20   0 52.6g  47g 1724 R 87.5 67.7 309:04.52 asd    
15726 root      20   0 52.6g  47g 1724 R 86.6 67.7 311:28.27 asd    
15731 root      20   0 52.6g  47g 1724 R 83.9 67.7 310:52.26 asd    
15730 root      20   0 52.6g  47g 1724 R 83.6 67.7 309:47.62 asd    
16840 ainet     20   0  357m 1988 1368 S 15.3  0.0   0:01.94 dbtest 
16841 ainet     20   0  357m 1988 1368 S 15.3  0.0   0:01.93 dbtest 
16842 ainet     20   0  357m 1988 1368 S 15.3  0.0   0:01.94 dbtest 
16843 ainet     20   0  357m 1988 1368 S 15.3  0.0   0:01.94 dbtest 
15604 root      20   0 52.6g  47g 1724 S  5.3 67.7  24:29.59 asd    
15601 root      20   0 52.6g  47g 1724 S  2.0 67.7  10:32.64 asd    
15725 root      20   0 52.6g  47g 1724 S  0.7 67.7  19:01.29 asd   

After aerospike_query_foreach() invoked:

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
15726 root      20   0 52.0g  47g 1724 R 96.8 66.8 310:40.19 asd    
15729 root      20   0 52.0g  47g 1724 R 62.3 66.8 308:29.90 asd    
15731 root      20   0 52.0g  47g 1724 R 62.0 66.8 310:18.01 asd    
15730 root      20   0 52.0g  47g 1724 S 61.0 66.8 309:13.39 asd    
16805 ainet     20   0  791m  14m 1488 R 14.6  0.0   0:10.81 dbtest 
16806 ainet     20   0  791m  14m 1488 S 14.6  0.0   0:10.83 dbtest 
16808 ainet     20   0  791m  14m 1488 S 14.6  0.0   0:10.82 dbtest 
16807 ainet     20   0  791m  14m 1488 S 14.3  0.0   0:10.81 dbtest 
15604 root      20   0 52.0g  47g 1724 S  4.3 66.8  24:27.68 asd    
15601 root      20   0 52.0g  47g 1724 S  1.7 66.8  10:31.79 asd    
15661 root      20   0 52.0g  47g 1724 S  1.7 66.8   0:09.80 asd

Post by devops01 » Fri Aug 08, 2014 1:43 pm

Would you be able to help clarify the following:

(1) Clarification of “With LDT Bin” Is the database preloaded with records, all of which have 1 LDT bin?

(2) Clarification of the “Write at 10K TPS” What type of write? Does it affect the LDT bins? Does it affect the bin “bucket”?

(3) Performance Clarification: Can you measure the query and write performance from the server side (log latency on query) Can you measure write performance as well, in either of the cases?


Post by Hanson » Sun Aug 10, 2014 2:23 am

Lucien, (1) Clarification of “With LDT Bin” Is the database preloaded with records, all of which have 1 LDT bin?

I have tested with cases: a) Begin with empty table b) Begin with table with 35GB (30M records) preloaded. Both have the same results.

I have also tested with cases: a) Record with 1 LDT Bin only b) Record with 1 LDT Bin + 1 int64 Bin Both have the same results.

(2) Clarification of the “Write at 10K TPS” What type of write? Does it affect the LDT bins? Does it affect the bin “bucket”?

The Write was done as 2 cases: a) Via aerospike_lmap_put() for sub-record + aerospike_key_put() only (1 LDT Bin only) b) Via aerospike_lmap_put() for sub-record + aerospike_key_put() for int64 Bin “bucket” in root-record (1 LDT Bin + 1 int64 Bin) Both have the same results.

(3) Performance Clarification: Can you measure the query and write performance from the server side (log latency on query) Can you measure write performance as well, in either of the cases?

Since the average latency is very small (0.4~0.5ms), asmonitor showed most as “0.00”:

Monitor> latency
        ====writes_master====
                                      timespan   ops/sec   >1ms   >8ms   >64ms
10.252.167.143:3000    08:50:41-GMT->08:50:51   10379.1   0.00   0.00    0.00
 
        ====writes_reply====
                                      timespan   ops/sec   >1ms   >8ms   >64ms
10.252.167.143:3000    08:50:41-GMT->08:50:51   10379.1   0.00   0.00    0.00
 
        ====reads====
                                      timespan   ops/sec   >1ms   >8ms   >64ms
10.252.167.143:3000    08:50:41-GMT->08:50:51       0.0   0.00   0.00    0.00
 
        ====udf====
                                      timespan   ops/sec   >1ms   >8ms   >64ms
10.252.167.143:3000    08:50:41-GMT->08:50:51   10379.2   0.00   0.00    0.00
 
        ====proxy====
                                      timespan   ops/sec   >1ms   >8ms   >64ms
10.252.167.143:3000    08:50:41-GMT->08:50:51       0.0   0.00   0.00    0.00
 
        ====query====
                                      timespan   ops/sec   >1ms   >8ms   >64ms
10.252.167.143:3000    08:50:41-GMT->08:50:51        0.1 100.00   0.00    0.00


Postby Hanson » Sun Aug 10, 2014 2:26 am

One correction: “a) Via aerospike_lmap_put() for sub-record + aerospike_key_put() only (1 LDT Bin only)” Shall be: “a) Via aerospike_lmap_put() for sub-record only (1 LDT Bin only)”


Post by Hanson » Mon Aug 11, 2014 3:40 am

After changed /etc/aerospike/aerospike.conf with parameter: service-threads 4 to: service-threads 5

The latency and CPU usage become normal.

My C client has 4 threads, seems there is relationship: service-threads >= [#threads in Client] + 1

I noticed that the number of sockets to port 3000 on Client side increased 1 after aerospike_query_foreach() invoked. And each thread has its own socket. Seems each socket on server side needs dedicate service thread to handle. If no enough service-threads, the latency and CPU usage will become unusual)

Before aerospike_query_foreach() invoked, there are 5 sockets to port 3000:

[root@localhost aerospike]# netstat -n|grep 3000
tcp        0    113 127.0.0.1:34402             127.0.0.1:3000              ESTABLISHED 
tcp        0      0 127.0.0.1:34398             127.0.0.1:3000              TIME_WAIT   
tcp        0      0 127.0.0.1:3000              127.0.0.1:34404             ESTABLISHED 
tcp        0      0 127.0.0.1:3000              127.0.0.1:34405             ESTABLISHED 
tcp        0      0 127.0.0.1:3000              127.0.0.1:34400             ESTABLISHED 
tcp        0      0 127.0.0.1:34399             127.0.0.1:3000              TIME_WAIT   
tcp        0      0 127.0.0.1:34400             127.0.0.1:3000              ESTABLISHED 
tcp        0   1117 127.0.0.1:34405             127.0.0.1:3000              ESTABLISHED 
tcp        0   1117 127.0.0.1:34403             127.0.0.1:3000              ESTABLISHED 
tcp        0      0 127.0.0.1:3000              127.0.0.1:34403             ESTABLISHED 
tcp        0      0 127.0.0.1:3000              127.0.0.1:34402             ESTABLISHED 
tcp        0      0 127.0.0.1:34401             127.0.0.1:3000              TIME_WAIT   
tcp        0   1117 127.0.0.1:34404             127.0.0.1:3000              ESTABLISHED

After aerospike_query_foreach() invoked, there are 6 sockets to port 3000:

[root@localhost aerospike]# netstat -n|grep 3000
tcp        0      0 127.0.0.1:34402             127.0.0.1:3000              ESTABLISHED 
tcp        0      0 127.0.0.1:34398             127.0.0.1:3000              TIME_WAIT   
tcp        0      0 127.0.0.1:3000              127.0.0.1:34404             ESTABLISHED 
tcp        0      0 127.0.0.1:3000              127.0.0.1:34406             ESTABLISHED 
tcp        0      0 127.0.0.1:3000              127.0.0.1:34405             ESTABLISHED 
tcp        0   1117 127.0.0.1:34406             127.0.0.1:3000              ESTABLISHED 
tcp        0      0 127.0.0.1:3000              127.0.0.1:34400             ESTABLISHED 
tcp        0      0 127.0.0.1:34399             127.0.0.1:3000              TIME_WAIT   
tcp        0      0 127.0.0.1:34400             127.0.0.1:3000              ESTABLISHED 
tcp        0   1117 127.0.0.1:34405             127.0.0.1:3000              ESTABLISHED 
tcp        0   1117 127.0.0.1:34403             127.0.0.1:3000              ESTABLISHED 
tcp        0      0 127.0.0.1:3000              127.0.0.1:34403             ESTABLISHED 
tcp        0     30 127.0.0.1:3000              127.0.0.1:34402             ESTABLISHED 
tcp        0      0 127.0.0.1:34401             127.0.0.1:3000              TIME_WAIT   
tcp        0   1117 127.0.0.1:34404             127.0.0.1:3000              ESTABLISHED

Postby devops01 » Wed Aug 13, 2014 9:04 am

“For data-in-memory configuration, the best service-thread tuning should be the number of non-hyperthreaded cores on the machine that Aerospike is running. These threads will handle primary-key operations”

“However, it seems you are making secondary index queries. In this case, a different set of thread tuning is available”.

(Secondary Index thread tuning, types of query etc)

“We will be coming up with a performance tuning guide in the near future”.

Some useful info at : http://www.aerospike.com/docs/operations/manage/queries/