FAQ - How to tune batch parameters
This article describes how to tune batch parameters to avoid CPU spikes.
For a basic understanding of batch transactions, refer to the batch guide.
Here are the 4 main parameters for tuning batch index transactions on the server side:

- batch-index-threads
- batch-max-buffers-per-queue
- batch-max-unused-buffers
- batch-max-requests

The default values should be adequate for common workloads, but it may be necessary to further tune those parameters for specific, more demanding workloads in order to avoid situations where all batch index queues fill up, or intermittent CPU spikes. It is also important to understand the memory impact of those parameters.
For example, with 4 index-threads and 512 buffers-per-queue, the total buffer memory used by batch will be 4 * 512 * 128 KiB = 256 MiB. With 8 index-threads and 1024 buffers-per-queue, the total buffer memory used by batch will be 8 * 1024 * 128 KiB = 1 GiB.
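The memory calculation above can be sketched as a small helper. The 128 KiB fixed buffer size comes from the article; the class and method names are illustrative only.

```java
// Sketch: estimate server-side batch buffer memory for given settings,
// using the fixed 128 KiB buffer size described above.
public class BatchBufferMemory {
    static final long BUFFER_SIZE_KIB = 128;

    // Returns the total buffer memory in KiB for the given configuration.
    static long totalBufferKiB(int indexThreads, int buffersPerQueue) {
        return (long) indexThreads * buffersPerQueue * BUFFER_SIZE_KIB;
    }

    public static void main(String[] args) {
        // 4 threads x 512 buffers x 128 KiB = 262144 KiB = 256 MiB
        System.out.println(totalBufferKiB(4, 512) / 1024 + " MiB");
        // 8 threads x 1024 buffers x 128 KiB = 1048576 KiB = 1 GiB
        System.out.println(totalBufferKiB(8, 1024) / (1024 * 1024) + " GiB");
    }
}
```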
To monitor the statistics, you can run the following command in asadm:
Admin> show statistics like batch_index
~~~~~~~~~~~~~~~~~~~Service Statistics~~~~~~~~~~~~~~~~~~
NODE                         :   host.aerospike.com:3000
batch_index_complete         :   3534847
batch_index_created_buffers  :   62
batch_index_destroyed_buffers:   0
batch_index_error            :   46
batch_index_huge_buffers     :   0
batch_index_initiate         :   3534911
batch_index_queue            :   5:5,4:4,4:4,5:5
batch_index_timeout          :   0
batch_index_unused_buffers   :   44
In the above example, 62 buffers were created (batch_index_created_buffers) and none of them were destroyed (batch_index_destroyed_buffers). There are 44 unused buffers (batch_index_unused_buffers), and the batch_index_queue statistic shows 5+4+4+5 = 18 buffers in use by the 4 index threads.
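The batch_index_queue statistic can be parsed programmatically, for example when scripting monitoring. The sketch below assumes the format shown in the outputs above (comma-separated "requests:buffers" pairs, one per index thread) and sums the in-use buffers; the class and method names are illustrative.

```java
import java.util.Arrays;

// Sketch: sum the in-use buffers reported by the batch_index_queue
// statistic, assuming "requests:buffers" pairs per index thread.
public class BatchQueueStat {
    static int buffersInUse(String batchIndexQueue) {
        return Arrays.stream(batchIndexQueue.split(","))
                     .mapToInt(pair -> Integer.parseInt(pair.split(":")[1]))
                     .sum();
    }

    public static void main(String[] args) {
        System.out.println(buffersInUse("5:5,4:4,4:4,5:5")); // 18
    }
}
```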
~~~~~~~~~~~~~~~~~~~Service Statistics~~~~~~~~~~~~~~~~~~
NODE                         :   host.aerospike.com:3000
batch_index_complete         :   3520267
batch_index_created_buffers  :   62
batch_index_destroyed_buffers:   0
batch_index_error            :   44
batch_index_huge_buffers     :   0
batch_index_initiate         :   3520335
batch_index_queue            :   12:16,6:5,0:0,6:5
batch_index_timeout          :   0
batch_index_unused_buffers   :   33
In this example, there are 33 unused buffers, while the batch_index_queue statistic shows 16+5+0+5 = 26 buffers in use.
To determine the current settings:
Admin> show config like batch
~~~~~~~~~~~~~~~~Service Configuration~~~~~~~~~~~~~~~~
NODE                       :   host.aerospike.com:3000
batch-index-threads        :   4
batch-max-buffers-per-queue:   255
batch-max-requests         :   5000
batch-max-unused-buffers   :   256
batch-priority             :   200
batch-threads              :   4
query-batch-size           :   100
Here is another example:
batch_index_complete         :   964670
batch_index_created_buffers  :   165620
batch_index_destroyed_buffers:   165324
batch_index_error            :   17168
batch_index_huge_buffers     :   154797
batch_index_initiate         :   1020295
batch_index_queue            :   9:23,4:33,11:33,7:6
batch_index_timeout          :   38426
batch_index_unused_buffers   :   191
In this case, there are 191 unused buffers, and the number of currently active buffers is (165620 - 165324) = 296. The default batch-max-unused-buffers is 256, so buffers are being created that will later be destroyed. There are also buffers larger than 128 KiB being created to accommodate records larger than 128 KiB; those are tracked by the batch_index_huge_buffers statistic. Huge buffers are destroyed upon completion and are not returned to the buffer pool. By default, the buffer pool can contain a maximum of 256 buffers of 128 KiB each. Buffers are taken from the pool when available and returned to it upon completion, as long as the pool size is still below the configured maximum. If the batch_index_destroyed_buffers statistic is incrementing fast, it is an indication of high garbage collection activity, which can potentially lead to CPU spikes. Tuning batch-max-unused-buffers to a higher value may help. If batch_index_huge_buffers is also large, it may be advisable to avoid using batch for those records: large records already saturate socket data buffers in single record mode, so batching them together does not provide any benefit and results in higher memory usage on the server.
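The two signals discussed above can be computed directly from the statistics. This sketch uses the values from the example output; the 128 KiB threshold comes from the article, and the class and method names are illustrative only.

```java
// Sketch: derive the tuning signals described above from raw statistics.
public class BatchBufferHealth {
    // Buffers currently in flight: created minus destroyed.
    static long activeBuffers(long created, long destroyed) {
        return created - destroyed;
    }

    // A record larger than one 128 KiB buffer forces a "huge" buffer,
    // which is freed after use instead of being returned to the pool.
    static boolean needsHugeBuffer(long recordBytes) {
        return recordBytes > 128 * 1024;
    }

    public static void main(String[] args) {
        // From the example: 165620 created, 165324 destroyed -> 296 active.
        System.out.println(activeBuffers(165620, 165324)); // 296
        // A 180000-byte record exceeds the 128 KiB buffer size.
        System.out.println(needsHugeBuffer(180_000));      // true
    }
}
```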
To experiment with the effect of huge buffers on the batch index protocol, the following Java benchmark workload can be run:
./run_benchmarks -h 127.0.0.1 -p 3000 -n test -k 10000000 -b 1 -o B:180000 -w RU,99 -g 5000 -T 50 -z 80 -B 10 -t 500000;
After running it a few times, the statistics would look like this:
batch_index_complete         :   258277
batch_index_created_buffers  :   11906
batch_index_destroyed_buffers:   11823
batch_index_error            :   0
batch_index_huge_buffers     :   11823
batch_index_initiate         :   258277
batch_index_queue            :   0:0,0:0,0:0,0:0
batch_index_timeout          :   0
batch_index_unused_buffers   :   83
This article applies to the new batch index protocol. The old batch direct protocol has different configuration options.
For server versions 126.96.36.199 and above, a slow client or a long-running batch will no longer slow down the other batch transactions queued up on the impacted batch index thread. The batch_index_delay statistic is incremented every time such a slow batch transaction is encountered, and a warning message is logged when the delay is above the allowed threshold (either twice the client total timeout, or 30 seconds if the timeout is not set on the client). For older versions, the batch socket send timeout was hard-coded to 10 seconds, which meant that a slow client or a huge batch could bottleneck an entire batch index thread.
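The warning threshold rule above can be expressed as a small helper, assuming the stated rule (twice the client total timeout, falling back to 30 seconds when no timeout is set); the names are illustrative, not the server's source.

```java
// Sketch: the delay threshold after which the server logs a warning,
// per the rule described above.
public class BatchDelayThreshold {
    static final long DEFAULT_THRESHOLD_MS = 30_000; // no client timeout set

    static long warnThresholdMs(long clientTotalTimeoutMs) {
        return clientTotalTimeoutMs > 0
                ? 2 * clientTotalTimeoutMs
                : DEFAULT_THRESHOLD_MS;
    }

    public static void main(String[] args) {
        System.out.println(warnThresholdMs(5000)); // 10000
        System.out.println(warnThresholdMs(0));    // 30000
    }
}
```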
For larger batch transactions, the batch-max-requests configuration parameter can be increased. Error code 151 is returned to the client when the batch transaction sent to a node contains more requests than the configured value. On the server side, the following WARNING line will be logged:
WARNING (batch): (batch.c:832) Batch request size 6000 exceeds max 5000
Allowing more reads per batch request may impact overall system performance. Monitor and adjust gradually.
Main configuration parameters related to batch:

- batch-index-threads
- batch-max-buffers-per-queue
- batch-max-requests
- batch-max-unused-buffers

Main statistics related to batch:

- batch_index_initiate
- batch_index_complete
- batch_index_error
- batch_index_timeout
- batch_index_delay
- batch_index_queue
- batch_index_created_buffers
- batch_index_destroyed_buffers
- batch_index_unused_buffers
- batch_index_huge_buffers