Batch queue full error
The following warning is seen in the Aerospike server logs:
WARNING (batch): (batch.c::755) Failed to find active batch queue that is not full
This warning is logged when batch-max-buffers-per-queue is exceeded for all batch-index-threads on the node.
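To gauge how often the warning occurs before tuning anything, the log can be searched for the message text. The following is a minimal sketch: the log path is an assumption (the real location depends on the logging section of aerospike.conf), and count_batch_queue_full is a hypothetical helper name, not an Aerospike tool.

```shell
#!/bin/sh
# Count "batch queue full" warnings in an Aerospike log file.
# count_batch_queue_full is a hypothetical helper, not part of Aerospike.
count_batch_queue_full() {
    # -c prints the number of matching lines, -F matches a fixed string.
    # grep exits non-zero when nothing matches, so "|| true" keeps the
    # helper usable in scripts that run with "set -e".
    grep -cF "Failed to find active batch queue that is not full" "$1" || true
}

# Assumed log location; adjust it to match your logging configuration.
LOG_FILE="${LOG_FILE:-/var/log/aerospike/aerospike.log}"
if [ -r "$LOG_FILE" ]; then
    count_batch_queue_full "$LOG_FILE"
fi
```

A count that keeps growing under normal load suggests the queues are undersized for the workload, rather than a one-off burst.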
Some parameters can be tuned to accommodate batch transactions, but those should always be changed carefully, while measuring the impact on the performance for the rest of the system.
batch-index-threads can be increased. For example:
asadm -e 'asinfo -v "set-config:context=service;batch-index-threads=8"'
In version 3.12 and above, this parameter defaults to the number of CPU cores available. For releases prior to 3.12, batch-index-threads defaults to 4.
Memory usage will increase. For example, top output for the asd process when batch-index-threads was 4:
18055 root 20 0 3581m 122m 3912 S 2.0 0.4 0:03.79 asd
After batch-index-threads was set to 8:
18055 root 20 0 3625m 128m 3928 S 0.0 0.4 0:06.08 asd
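To compare memory before and after the change without reading top interactively, the resident size can be read from /proc. This is a sketch under assumptions: it relies on the Linux /proc/<pid>/status format, it uses pgrep to locate the asd process, and vmrss_kib is a hypothetical helper name.

```shell
#!/bin/sh
# Print the resident memory (VmRSS, in KiB) recorded in a Linux
# /proc/<pid>/status file. vmrss_kib is a hypothetical helper name.
vmrss_kib() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Locate the oldest process named exactly "asd" (assumes pgrep is installed).
ASD_PID="$(pgrep -o -x asd 2>/dev/null || true)"
if [ -n "$ASD_PID" ]; then
    vmrss_kib "/proc/$ASD_PID/status"
fi
```

Running it once before and once after the set-config command gives two numbers that can be subtracted directly.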
batch-max-buffers-per-queue can also be increased:
asadm -e 'asinfo -v "set-config:context=service;batch-max-buffers-per-queue=1024"'
Run the following command to verify the changes:
asadm -e "show config like batch"
~~~~~~~~~~~~~~~~~~~~~~~~~~Service Configuration~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                       :   192.168.100.192:3000   192.168.100.207:3000
batch-index-threads        :   8                      8
batch-max-buffers-per-queue:   1024                   1024
batch-max-requests         :   5000                   5000
batch-max-unused-buffers   :   256                    256
batch-priority             :   200                    200
batch-threads              :   4                      4
query-batch-size           :   100                    100
Finally, update aerospike.conf so that the changes are permanent and do not revert the next time the service is restarted.
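As an illustration, the two dynamic changes above would correspond to entries like the following in the service context of aerospike.conf. This is a sketch only: the values are the ones used in this article's examples, not recommendations, and the rest of the service block is left as it already is on your node.

```
service {
    # ... existing service settings stay unchanged ...
    batch-index-threads 8
    batch-max-buffers-per-queue 1024
}
```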
The batch-max-unused-buffers parameter controls when unused 128 KiB buffers are garbage collected; the default is 256. It should be tuned to a level where normal load does not constantly trigger garbage collection.
For server versions 220.127.116.11 and above, a slow client or a long-running batch will not slow down the other batch transactions queued up on the affected batch-index thread. The statistic batch_index_delay is incremented every time such a slow batch transaction is encountered, and a warning message is logged when the delay exceeds the allowed threshold (either twice the client total timeout, or 30 seconds if no timeout is set on the client). For older versions, the batch socket send timeout was hard-coded to 10 seconds, which meant that a slow client or a huge batch could bottleneck an entire batch-index thread.
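The batch_index_delay statistic can be watched from the command line. The sketch below assumes asinfo is installed and can reach the local node, and that the statistics output is a single line of semicolon-separated name=value pairs; get_stat is a hypothetical helper name.

```shell
#!/bin/sh
# Extract one statistic from semicolon-separated "name=value" text on stdin.
# get_stat is a hypothetical helper, not part of the Aerospike tools.
get_stat() {
    # Put each pair on its own line, then print the value whose name matches.
    tr ';' '\n' | awk -F= -v name="$1" '$1 == name { print $2 }'
}

# Query the local node only when asinfo is available.
if command -v asinfo >/dev/null 2>&1; then
    asinfo -v "statistics" | get_stat batch_index_delay
fi
```

Sampling the value at intervals and comparing successive readings shows whether slow batch transactions are still being encountered.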
Configuration parameters related to batch:
- Statistics related to batch:
BATCH INDEX FULL