Batch full error


Batch queue full error

Problem Description

The following warning is seen in the Aerospike server logs:

WARNING (batch): (batch.c::755) Failed to find active batch queue that is not full

Explanation

This warning is logged when batch-max-buffers-per-queue is exceeded on the queues of all batch-index-threads on the node.
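To gauge how often the condition occurs before tuning anything, the warning can be counted in the server log. The following is a minimal sketch; the function name is made up for illustration, and the log file location depends on the logging configuration of the node.

```shell
# Count occurrences of the batch-queue-full warning in a log file.
# $1: path to an Aerospike log file (supplied by the caller).
# Prints the number of matching lines; like grep itself, the exit
# status is 1 when there are no matches.
count_batch_full_warnings() {
    grep -c -F "Failed to find active batch queue that is not full" "$1"
}
```

For example, count_batch_full_warnings /var/log/aerospike/aerospike.log, assuming the log is written to that path.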

Solution

Some parameters can be tuned to accommodate batch transactions, but they should always be changed carefully, while measuring the performance impact on the rest of the system.

batch-index-threads can be increased. For example:

asadm -e 'asinfo -v "set-config:context=service;batch-index-threads=8"'

In version 3.12 and above, this parameter defaults to the number of available CPU cores. For releases prior to 3.12, batch-index-threads defaults to 4.

Memory usage will increase. For example, when batch-index-threads was 4:

18055 root      20   0 3581m 122m 3912 S  2.0  0.4   0:03.79 asd

After batch-index-threads was set to 8:

18055 root      20   0 3625m 128m 3928 S  0.0  0.4   0:06.08 asd

batch-max-buffers-per-queue can also be increased:

asadm -e 'asinfo -v "set-config:context=service;batch-max-buffers-per-queue=1024"'
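Raising either parameter also raises the worst-case memory that batch response buffers can occupy under load. A rough ceiling can be estimated from the 128 KiB buffer size mentioned in the Notes. This is a back-of-the-envelope sketch only: it assumes one queue per batch-index thread and that every buffer is exactly 128 KiB, neither of which is stated in this article, and the function name is hypothetical.

```shell
# Rough upper bound, in MiB, on memory held by batch response buffers.
# Assumptions (not from the article): one queue per batch-index thread,
# and every buffer is exactly 128 KiB.
# $1: batch-index-threads   $2: batch-max-buffers-per-queue
batch_buffer_ceiling_mib() {
    echo $(( $1 * $2 * 128 / 1024 ))
}
```

With the values used in this article, batch_buffer_ceiling_mib 8 1024 prints 1024, that is, roughly 1 GiB if every queue were completely full.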

Run the following command to verify the changes:

asadm -e "show config like batch"

For example:

~~~~~~~~~~~~~~~~~~~~~~~~~~Service Configuration~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                       :   192.168.100.192:3000   192.168.100.207:3000   
batch-index-threads        :   8                      8                     
batch-max-buffers-per-queue:   1024                   1024                   
batch-max-requests         :   5000                   5000                   
batch-max-unused-buffers   :   256                    256                    
batch-priority             :   200                    200                    
batch-threads              :   4                      4                      
query-batch-size           :   100                    100          

Finally, update aerospike.conf so that the changes are permanent and are not reverted the next time the service is restarted.
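As a sketch of what the persisted settings might look like, the two parameters go in the service context of aerospike.conf, matching the context=service used in the dynamic commands. The values are the ones from the examples in this article, not recommendations, and the rest of the service context is omitted here.

```
service {
    # ... existing service settings remain unchanged ...
    batch-index-threads 8
    batch-max-buffers-per-queue 1024
}
```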

Notes

  • The batch-max-unused-buffers parameter controls when unused 128 KiB buffers are garbage collected; its default is 256. It should be tuned to a level where normal load does not constantly trigger garbage collection.

  • For server versions 4.1.0.1 and above, a slow client or a long-running batch does not slow down the other batch transactions queued on the same batch-index thread. The batch_index_delay statistic is incremented every time such a slow batch transaction is encountered, and a warning is logged when the delay exceeds the allowed threshold (twice the client total timeout, or 30 seconds if no timeout is set on the client). In older versions, the batch socket send timeout was hard-coded to 10 seconds, which meant that a slow client or a huge batch could bottleneck an entire batch-index thread.

  • Configuration parameters related to batch:

http://www.aerospike.com/docs/reference/configuration/#batch-max-buffer-per-queue

http://www.aerospike.com/docs/reference/configuration/#batch-index-threads

http://www.aerospike.com/docs/reference/metrics/#batch_index_unused_buffers

  • Statistics related to batch:

http://www.aerospike.com/docs/reference/metrics/#batch_index_queue

http://www.aerospike.com/docs/reference/metrics/#batch_index_complete
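The batch statistics named above can be watched while tuning. The sketch below filters a semicolon-separated name=value string, the form in which asinfo returns statistics, down to the batch_index_* entries. The function name is hypothetical, and the exact set of statistics present depends on the server version.

```shell
# Read an asinfo-style "name=value;name=value;..." string on stdin and
# print only the batch_index_* statistics, one per line.
filter_batch_index_stats() {
    tr ';' '\n' | grep '^batch_index_'
}
```

For example, asinfo -v "statistics" | filter_batch_index_stats, run on the node being investigated.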

Keywords

BATCH INDEX FULL

Timestamp

05/10/2018