FAQ - batch-index tuning parameters


FAQ - How to tune batch parameters

Context

This article describes how to tune batch parameters to avoid CPU spikes.

For a basic understanding of batch operations, refer to the batch guide.

Method

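Here are the 4 main parameters for tuning batch index transactions on the server side:

  • batch-index-threads
  • batch-max-buffers-per-queue
  • batch-max-unused-buffers
  • batch-max-requests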

The default values should be adequate for common workloads, but more demanding workloads may require further tuning of those parameters to avoid situations where all batch index queues are full or where intermittent CPU spikes occur. It is also important to understand the memory impact of those parameters.

For example, with batch-index-threads set to 4 and batch-max-buffers-per-queue set to 512, the total memory used by batch buffers can reach 4 * 512 * 128 KiB = 256 MiB. With batch-index-threads set to 8 and batch-max-buffers-per-queue set to 1024, it can reach 8 * 1024 * 128 KiB = 1 GiB.
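The arithmetic above can be checked with a short Python sketch. The function name is hypothetical, and the 128 KiB buffer size is taken from the figures in this article rather than read from the server.

```python
# Size of one standard batch buffer in KiB, as used in the examples above.
BUFFER_SIZE_KIB = 128


def batch_buffer_memory_mib(index_threads: int, max_buffers_per_queue: int) -> float:
    """Upper bound, in MiB, on memory held by standard batch buffers.

    Hypothetical helper: multiplies the number of batch index threads
    (one queue per thread) by the per-queue buffer limit and the buffer size.
    """
    total_kib = index_threads * max_buffers_per_queue * BUFFER_SIZE_KIB
    return total_kib / 1024


print(batch_buffer_memory_mib(4, 512))   # 256.0 MiB
print(batch_buffer_memory_mib(8, 1024))  # 1024.0 MiB, i.e. 1 GiB
```

Huge buffers (larger than 128 KiB) are not covered by this estimate, so actual usage can be higher when large records are read in batches.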

To monitor the statistics, you can run the following command in asadm:

Admin> show statistics like batch_index
~~~~~~~~~~~~~~~~~~~Service Statistics~~~~~~~~~~~~~~~~~~
NODE                         :   host.aerospike.com:3000
batch_index_complete         :   3534847
batch_index_created_buffers  :   62
batch_index_destroyed_buffers:   0
batch_index_error            :   46
batch_index_huge_buffers     :   0
batch_index_initiate         :   3534911
batch_index_queue            :   5:5,4:4,4:4,5:5
batch_index_timeout          :   0
batch_index_unused_buffers   :   44

In the above example, 62 buffers have been created (batch_index_created_buffers) and none have been destroyed (batch_index_destroyed_buffers). There are 44 unused buffers (batch_index_unused_buffers), and batch_index_queue shows 5 + 4 + 4 + 5 = 18 buffers in use across the 4 batch index threads, which accounts for all 62 created buffers (44 + 18).

~~~~~~~~~~~~~~~~~~~Service Statistics~~~~~~~~~~~~~~~~~~
NODE                         :   host.aerospike.com:3000
batch_index_complete         :   3520267
batch_index_created_buffers  :   62
batch_index_destroyed_buffers:   0
batch_index_error            :   44
batch_index_huge_buffers     :   0
batch_index_initiate         :   3520335
batch_index_queue            :   12:16,6:5,0:0,6:5
batch_index_timeout          :   0
batch_index_unused_buffers   :   33

In this example, there are 33 unused buffers, while batch_index_queue shows 16 + 5 + 0 + 5 = 26 buffers in use.
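When checking many nodes, the buffer counts can be summed from the batch_index_queue value with a small script. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes that each comma-separated entry has the form requests:buffers for one batch index queue, with the second number being the buffer count used in the examples above.

```python
def parse_batch_index_queue(value: str) -> list[tuple[int, int]]:
    """Split a batch_index_queue value into (requests, buffers) pairs.

    Assumption: each comma-separated entry is "<requests>:<buffers>"
    for one batch index queue.
    """
    pairs = []
    for entry in value.split(","):
        requests, buffers = entry.strip().split(":")
        pairs.append((int(requests), int(buffers)))
    return pairs


def buffers_in_use(value: str) -> int:
    """Total number of buffers currently held across all queues."""
    return sum(buffers for _, buffers in parse_batch_index_queue(value))


print(buffers_in_use("5:5,4:4,4:4,5:5"))    # 18
print(buffers_in_use("12:16,6:5,0:0,6:5"))  # 26
```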

To determine the current settings:

Admin> show config like batch
~~~~~~~~~~~~~~~~Service Configuration~~~~~~~~~~~~~~~~
NODE                       :   host.aerospike.com:3000   
batch-index-threads        :   4                        
batch-max-buffers-per-queue:   255                      
batch-max-requests         :   5000                     
batch-max-unused-buffers   :   256                      
batch-priority             :   200                      
batch-threads              :   4                        
query-batch-size           :   100

Here is another example:

batch_index_complete         :   964670
batch_index_created_buffers  :   165620
batch_index_destroyed_buffers:   165324
batch_index_error            :   17168
batch_index_huge_buffers     :   154797
batch_index_initiate         :   1020295
batch_index_queue            :   9:23,4:33,11:33,7:6
batch_index_timeout          :   38426
batch_index_unused_buffers   :   191

In this case, there are 191 unused buffers and the number of buffers currently allocated is 165620 - 165324 = 296. The default batch-max-unused-buffers is 256, so buffers are being created that will later be destroyed instead of being returned to the pool.

There are also buffers larger than 128 KiB being created to accommodate records larger than 128 KiB. Those are tracked by the batch_index_huge_buffers statistic. Huge buffers are destroyed once the batch completes and are not returned to the buffer pool.

By default, the buffer pool can hold a maximum of 256 buffers of 128 KiB each. Buffers are taken from the pool when available and returned to it on completion, provided the pool size is still below the configured maximum.

If the batch_index_created_buffers and batch_index_destroyed_buffers statistics are incrementing quickly, it is an indication of high garbage collection activity (buffers being allocated and freed at a high rate), which can lead to CPU spikes. Increasing batch-max-unused-buffers may help.
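To judge whether these counters are "incrementing fast", two samples taken some seconds apart can be compared. The following Python sketch assumes the statistics have already been collected into dictionaries keyed by statistic name; the function names and the earlier sample values are hypothetical.

```python
def live_buffers(stats: dict) -> int:
    """Buffers currently allocated: created minus destroyed."""
    return stats["batch_index_created_buffers"] - stats["batch_index_destroyed_buffers"]


def buffer_churn_per_second(previous: dict, current: dict, interval_seconds: float) -> dict:
    """Rate of buffer creation and destruction between two samples."""
    rates = {}
    for name in ("batch_index_created_buffers", "batch_index_destroyed_buffers"):
        rates[name] = (current[name] - previous[name]) / interval_seconds
    return rates


# Values from the example above.
current = {
    "batch_index_created_buffers": 165620,
    "batch_index_destroyed_buffers": 165324,
}
# Hypothetical sample taken 60 seconds earlier, for illustration only.
previous = {
    "batch_index_created_buffers": 165020,
    "batch_index_destroyed_buffers": 164724,
}

print(live_buffers(current))  # 296
print(buffer_churn_per_second(previous, current, 60))
```

What counts as a high rate depends on the workload. A sustained non-zero destruction rate while batch_index_huge_buffers stays flat suggests that the pool limit (batch-max-unused-buffers) is the cause.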

If batch_index_huge_buffers is also large, then it may be advisable to avoid using batch. Large records already saturate socket data buffers in single record mode, so batching them together does not provide any benefit and results in higher memory usage on the server.

To experiment with the effect of huge buffers on the batch index protocol, the following Java benchmark workload can be run:

 ./run_benchmarks -h 127.0.0.1 -p 3000 -n test -k 10000000 -b 1 -o B:180000 -w RU,99 -g 5000  -T 50 -z 80 -B 10 -t 500000;

After running it a few times, the statistics would look like this:

batch_index_complete         :   258277
batch_index_created_buffers  :   11906
batch_index_destroyed_buffers:   11823
batch_index_error            :   0
batch_index_huge_buffers     :   11823
batch_index_initiate         :   258277
batch_index_queue            :   0:0,0:0,0:0,0:0
batch_index_timeout          :   0
batch_index_unused_buffers   :   83

In this case, batch_index_huge_buffers and batch_index_destroyed_buffers are the same. This indicates that the buffers being destroyed are huge buffers, which are created for the batch and destroyed afterwards. Huge buffers do not go back to the buffer pool.
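One way to quantify this is the share of destroyed buffers that were huge buffers. The ratio below is not a server statistic. It is a hypothetical helper computed from the two counters, as a rough sketch for comparing nodes or time periods.

```python
def huge_buffer_share(stats: dict) -> float:
    """Fraction of destroyed buffers that were huge buffers.

    Hypothetical metric: a value close to 1.0 suggests that destruction is
    driven by huge buffers rather than by the unused-buffer pool limit.
    Returns 0.0 when no buffer has been destroyed yet.
    """
    destroyed = stats["batch_index_destroyed_buffers"]
    if destroyed == 0:
        return 0.0
    return stats["batch_index_huge_buffers"] / destroyed


# Values from the benchmark run above.
benchmark = {
    "batch_index_destroyed_buffers": 11823,
    "batch_index_huge_buffers": 11823,
}
print(huge_buffer_share(benchmark))  # 1.0
```

A share well below 1.0 would instead point at standard buffers being destroyed because the pool is full, which is the case where raising batch-max-unused-buffers may help.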

Notes

  • This article applies to the new batch index protocol. The old batch direct protocol has different configuration options.

  • For server versions 4.1.0.1 and above, a slow client or a long-running batch no longer slows down the other batch transactions queued on the same batch-index thread. The batch_index_delay statistic is incremented every time such a slow batch transaction is encountered, and a warning message is logged when the delay exceeds the allowed threshold (twice the client total timeout, or 30 seconds if no timeout is set on the client). In older versions, the batch socket send timeout was hard-coded to 10 seconds, which meant that a slow client or a huge batch could bottleneck an entire batch-index thread.

  • For larger batch transactions, the batch-max-requests configuration parameter can be increased. Error code 151 will be received on the client when a batch transaction on a node is greater than the configured value. On the server side, the following WARNING line will be logged:

WARNING (batch): (batch.c:832) Batch request size 6000 exceeds max 5000

Allowing more reads per batch request may impact overall system performance. Monitor and adjust gradually.
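As an alternative to raising batch-max-requests, the application can split a large key list into smaller batches before sending them. The Python sketch below is generic and does not call any client library; the function name and key format are hypothetical. Because the limit applies per node, chunking the whole key list to the configured value is a conservative choice, since each node will receive at most that many keys from a single batch.

```python
from typing import Iterator, Sequence


def chunk_keys(keys: Sequence, max_per_batch: int) -> Iterator[Sequence]:
    """Yield consecutive slices of keys, each at most max_per_batch long."""
    if max_per_batch <= 0:
        raise ValueError("max_per_batch must be positive")
    for start in range(0, len(keys), max_per_batch):
        yield keys[start:start + max_per_batch]


# 6000 hypothetical keys split against a limit of 5000, as in the log line above.
keys = [f"key-{i}" for i in range(6000)]
batches = list(chunk_keys(keys, 5000))
print([len(batch) for batch in batches])  # [5000, 1000]
```

Each resulting batch would then be passed to the client's batch read call separately.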

Keywords

BATCH TUNING BUFFERS CPU SPIKES

Timestamp

08/16/2018

