FAQ - Differences between getting single record versus a batch
What are the performance considerations/differences between single record transactions versus batch transactions?
Batch transactions use the batch index protocol by default (as of server version 3.6).
1. A batch request is issued from the client.
2. The Aerospike client library splits the request across the different nodes (based on the partition map).
3. Within each node, the batch transaction is split into single record requests that are distributed across the transaction threads.
4. Individual responses are consolidated into 128KB batch index response buffers. (See note 6 below.)
5. Response buffers are placed on the batch response thread queue so that the transaction thread does not block.
6. Batch response threads return these response buffers to the client.
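Steps 1 and 2 can be sketched as a client-side grouping of keys by the node that owns each key's partition. This is a minimal Python model, not the Aerospike client's code: the names partition_id and split_batch and the round-robin partition map are hypothetical, and SHA-1 stands in for the RIPEMD-160 key digest Aerospike actually uses, so the partition ids only approximate the real mapping.

```python
import hashlib
from collections import defaultdict

N_PARTITIONS = 4096  # Aerospike splits each namespace into 4096 partitions

def partition_id(key: str) -> int:
    # Assumption: SHA-1 stands in for Aerospike's RIPEMD-160 digest;
    # the low 12 bits of the first two digest bytes select the partition.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return (digest[0] | (digest[1] << 8)) & (N_PARTITIONS - 1)

def split_batch(keys, partition_map):
    """Step 2: group batch keys by the node that owns each key's partition."""
    per_node = defaultdict(list)
    for key in keys:
        per_node[partition_map[partition_id(key)]].append(key)
    return dict(per_node)

# Hypothetical partition map for a 3-node cluster: partition i lives on node i % 3.
partition_map = {pid: f"node{pid % 3}" for pid in range(N_PARTITIONS)}

keys = [f"user:{i}" for i in range(10)]
sub_batches = split_batch(keys, partition_map)
for node, node_keys in sorted(sub_batches.items()):
    print(node, node_keys)
```

Each per-node sub-batch is then sent as one request to that node, which is where step 3 begins on the server side.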
The difference between a single record transaction and a batch lies in steps 4 to 6, where the batch index worker thread places the result of each individual operation into a batch index response buffer. Batch requests can increase the latency of some requests, because the client normally waits until all keys have been retrieved from the server nodes before returning control to the caller. Batch requests also add memory overhead: an unexpectedly large batch request can cause excessive memory consumption, which is why there is a maximum number of keys allowed per node.
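Steps 4 to 6 can be modeled as packing record responses into fixed 128KB buffers that a separate response thread drains from a queue, so the producing thread never waits on the network. The sketch uses Python's queue and threading modules; consolidate, response_thread and the 100-byte records are hypothetical, and letting an oversized response sit alone in its own buffer is a simplification (the server's handling of records over 128KB is covered separately).

```python
import queue
import threading

BUFFER_SIZE = 128 * 1024  # batch index response buffer size (step 4)

def consolidate(responses, buffer_size=BUFFER_SIZE):
    """Step 4: pack individual record responses into ~128KB buffers."""
    buffers, current = [], bytearray()
    for resp in responses:
        # Flush the current buffer when the next response would overflow it.
        # Simplification: a response larger than buffer_size lands alone in an oversized buffer.
        if current and len(current) + len(resp) > buffer_size:
            buffers.append(bytes(current))
            current = bytearray()
        current.extend(resp)
    if current:
        buffers.append(bytes(current))
    return buffers

response_queue = queue.Queue()  # step 5: unbounded, so put() never blocks the producer
sent = []                       # stands in for the client connection

def response_thread():
    """Step 6: drain buffers from the queue and 'send' them to the client."""
    while True:
        buf = response_queue.get()
        if buf is None:  # sentinel: the batch is complete
            break
        sent.append(buf)

worker = threading.Thread(target=response_thread)
worker.start()

records = [b"x" * 100 for _ in range(3000)]  # hypothetical 100-byte record responses
for buf in consolidate(records):
    response_queue.put(buf)
response_queue.put(None)
worker.join()

print([len(b) for b in sent])  # [131000, 131000, 38000]
```

With 3,000 responses of 100 bytes, 1,310 of them fit in each 131,072-byte buffer, so three buffers are queued and returned.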
There are non-trivial methods to tune the associated config parameters (
Aerospike always recommends benchmarking different configurations, ideally modifying one configuration parameter at a time (such as the
The batch-max-buffers-per-queue configuration parameter can help avoid full batch index queues; a full batch index queue rejects new batch requests.
The batch-max-requests configuration parameter sets the maximum number of keys allowed per node. It prevents unexpectedly large batch requests from causing excessive memory consumption.
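The two limits described above can be pictured as an admission check that a node applies to its share of a batch. This is a toy Python model, not server code: admit_batch and queue_depths are hypothetical names, the limit values are illustrative rather than recommended, and choosing the least-loaded queue before rejecting is an assumption about queue selection.

```python
def admit_batch(n_keys, queue_depths, max_requests, max_buffers_per_queue):
    """Hypothetical admission check for one node's share of a batch request.

    n_keys                -- keys sent to this node (compared with batch-max-requests)
    queue_depths          -- buffers currently waiting on each batch response queue
    max_requests          -- models batch-max-requests
    max_buffers_per_queue -- models batch-max-buffers-per-queue
    """
    if n_keys > max_requests:
        return "rejected: too many keys for this node"
    # Assumption: the least-loaded queue is chosen, and the request is
    # rejected only when every queue has reached its buffer limit.
    if min(queue_depths) >= max_buffers_per_queue:
        return "rejected: batch index queues full"
    return "accepted"

# Illustrative limits, not recommended values.
print(admit_batch(100, [10, 255], max_requests=5000, max_buffers_per_queue=255))
print(admit_batch(6000, [10, 255], max_requests=5000, max_buffers_per_queue=255))
print(admit_batch(100, [255, 255], max_requests=5000, max_buffers_per_queue=255))
```

Benchmarking one parameter at a time, as recommended above, corresponds to varying only one of these limits between runs.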
Batch size affects operation speed and efficiency.
A batch request uses a single network transaction to each server node. If the batch size is very small, separate single record transactions are likely to be faster and more efficient: a small batch results in only one or a few single record transactions on each node of the cluster (this depends on the ratio of batch size to cluster size).
Batch transactions can be beneficial when the ratio of batch size to cluster size is greater than 1, and typically above 3 to 5, so that each node processes on average a handful or more single record transactions per batch.
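The ratio rule of thumb above reduces to simple arithmetic: the average number of keys each node receives is the batch size divided by the cluster size. The helper names and the default threshold of 3 are assumptions for illustration; the right threshold for a real workload should come from benchmarking.

```python
def keys_per_node(batch_size: int, cluster_size: int) -> float:
    """Average number of keys each node receives from one batch request."""
    return batch_size / cluster_size

def prefer_batch(batch_size: int, cluster_size: int, threshold: float = 3.0) -> bool:
    """Rule of thumb: batch when each node averages more than a handful of keys
    (ratio typically above 3 to 5). The threshold is a tunable assumption."""
    return keys_per_node(batch_size, cluster_size) > threshold

# Hypothetical 8-node cluster.
for batch_size in (4, 24, 100):
    print(batch_size, keys_per_node(batch_size, 8), prefer_batch(batch_size, 8))
```

On an 8-node cluster, a 4-key batch averages 0.5 keys per node, so separate single record reads are likely the better choice, while a 100-key batch averages 12.5 keys per node.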
Batch transactions are also typically advantageous when the records being fetched are small, because batching reduces the per-record overhead on the network. As records get larger, that overhead becomes a smaller share of the total, making batch transactions potentially less attractive.
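A rough byte-count model shows why the benefit shrinks with record size. It assumes a fixed per-request overhead that separate transactions pay once per record and a batch pays once per request, and it ignores per-record framing inside the batch; the 100-byte overhead and the function names are hypothetical, and the model covers bytes on the wire only, not round-trip latency.

```python
def bytes_single(n: int, record_bytes: int, overhead: int) -> int:
    # n separate transactions, each paying the fixed per-request overhead
    return n * (overhead + record_bytes)

def bytes_batch(n: int, record_bytes: int, overhead: int) -> int:
    # one batch request pays the overhead once (per-record framing ignored)
    return overhead + n * record_bytes

def batch_saving(n: int, record_bytes: int, overhead: int) -> float:
    """Fraction of bytes saved by batching n records instead of sending n requests."""
    return 1 - bytes_batch(n, record_bytes, overhead) / bytes_single(n, record_bytes, overhead)

# Hypothetical: 100 records, 100 bytes of per-request overhead.
for size in (100, 1_000, 10_000, 100_000):
    print(size, round(batch_saving(100, size, 100), 4))
```

Under these assumptions the saving falls from roughly half with 100-byte records to about a tenth of a percent with 100KB records.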
In all cases, test the use case on a development or performance environment to confirm the best approach for the given workload.
Legacy batch direct protocol
Batch direct does not support proxying transactions during migrations. The batch index protocol uses the same transaction path as single record reads and will therefore proxy during cluster migration.
Batch direct requests run at a lower priority than single record transactions; batch index requests run at the same priority as single record transactions.
If records are larger than 128KB, read about batch_index_huge_buffers in the related knowledge-base article.