Batch Writes Thread Executor Overhead: Worth It for Large or Small Batch Sizes?

I’m analyzing the client.operate(BatchPolicy policy, List<BatchRecord> records) method in Aerospike and found that it calls BatchExecutor.execute(cluster, policy, commands, status), where a new ExecutorService is created every time. This raises concerns about performance overhead, regardless of whether virtual threads or platform threads are used.

For reference: client/src/com/aerospike/client/command/BatchExecutor.java on the master branch of the aerospike/aerospike-client-java repository on GitHub.

Key Questions:

  1. Is it intentional for a new executor to be created on every .operate call?
  2. Is this batch write approach efficient only for large batch sizes (e.g., 1000 records), or does it also make sense for smaller batches?
  3. Could this be considered an inefficiency in the current implementation?

Additional Context:

  • Batch Size Consideration: The maximum batch size I’m planning is 25, but it may often be less than 25 due to my application’s nature.
  • Batch Execution in a Loop: The batch execute command appears to be inside a loop, meaning batch writes are likely executed per node based on data distribution. Can someone clarify how this works internally?

Concern: Executor Creation & Shutdown Overhead

  • Each batch request creates a new ExecutorService using Executors.newThreadPerTaskExecutor().
  • This results in new threads being spawned for every batch operation, whether they are virtual or platform threads.
  • The executor is shut down after each batch, leading to unnecessary thread lifecycle overhead.
  • Even though virtual threads are lightweight, constant creation and destruction of executors introduces extra processing overhead.

Would love to hear insights on whether this behavior is intended, if there are optimizations to reduce thread creation overhead, and whether batch writes are even worth using for smaller batch sizes in Aerospike.

The jdk21 java client (master branch) uses the ExecutorService interface, but effectively instantiates a new virtual thread for every batch node command. Virtual threads are lightweight and are designed for this usage pattern.

  1. Yes, creating a new executor on every operate call is intentional.
  2. This virtual thread implementation works efficiently for all batch sizes.
  3. No, this is not an inefficiency.

The jdk8 java client (jdk8 branch) uses a shared thread pool of platform threads and does not instantiate an ExecutorService on every command.
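
To make the difference between the two threading models concrete, here is a minimal sketch using only the JDK (21 or later). It is not the client's actual code: the class name, the method names, and the runNodeCommand stand-in are all hypothetical, and the pool size of 4 is an arbitrary choice for the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorStylesSketch {
    // Hypothetical stand-in for one batch node command: reports the kind of thread that ran it.
    static String runNodeCommand(String node) {
        return node + ":" + (Thread.currentThread().isVirtual() ? "virtual" : "platform");
    }

    // jdk21 style: a fresh executor per call, one new virtual thread per submitted task.
    static List<String> perCallVirtual(List<String> nodes) {
        // close() at the end of try-with-resources waits for submitted tasks to finish.
        try (ExecutorService es = Executors.newThreadPerTaskExecutor(Thread.ofVirtual().factory())) {
            return submitAndCollect(es, nodes);
        }
    }

    // jdk8 style: one long-lived pool of platform threads shared by every call.
    static final ExecutorService SHARED_POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // daemon threads so this demo JVM can exit without calling shutdown()
        return t;
    });

    static List<String> sharedPlatformPool(List<String> nodes) {
        return submitAndCollect(SHARED_POOL, nodes);
    }

    // Submits one task per node, then waits for every result in submission order.
    static List<String> submitAndCollect(ExecutorService es, List<String> nodes) {
        List<Future<String>> futures = new ArrayList<>();
        for (String node : nodes) {
            futures.add(es.submit(() -> runNodeCommand(node)));
        }
        List<String> results = new ArrayList<>();
        try {
            for (Future<String> f : futures) {
                results.add(f.get());
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("nodeA", "nodeB", "nodeC");
        System.out.println(perCallVirtual(nodes));
        System.out.println(sharedPlatformPool(nodes));
    }
}
```

In the per-call variant the executor is only a thin wrapper that starts a virtual thread for each task, so creating and closing it does not involve building or tearing down a pool of platform threads.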

The batch command keys are assigned to different cluster nodes based on the key hash and the partition map. Then, a batch node command is issued for every node that has assigned keys. The jdk21 java client always issues sync batch node commands in parallel due to lightweight virtual threads. The jdk8 java client issues sync batch node commands in parallel or in sequence depending on BatchPolicy.maxConcurrentThreads. Async batch node commands are always issued in parallel in both jdk21 and jdk8 clients.

The java client automatically converts a batch node command to a single record command when that node command only contains 1 key. This improves performance for small batch sizes.
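
The routing described in the two paragraphs above can be sketched roughly as follows. This is an illustration, not the client's implementation: the class and method names are hypothetical, String.hashCode() stands in for the real key digest, and the partition map is a plain array of made-up node names indexed by partition ID (4096 partitions per namespace).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BatchRoutingSketch {
    static final int PARTITIONS = 4096;

    // Stand-in hash: the real client derives the partition from the key digest, not hashCode().
    static int partitionId(String key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // Groups keys by the node that owns each key's partition, preserving first-seen node order.
    static Map<String, List<String>> groupByNode(List<String> keys, String[] partitionMap) {
        Map<String, List<String>> byNode = new LinkedHashMap<>();
        for (String key : keys) {
            String node = partitionMap[partitionId(key)];
            byNode.computeIfAbsent(node, n -> new ArrayList<>()).add(key);
        }
        return byNode;
    }

    // A node that received exactly one key gets a single record command instead of a batch.
    static String commandTypeFor(List<String> nodeKeys) {
        return nodeKeys.size() == 1 ? "single-record" : "batch-node";
    }

    public static void main(String[] args) {
        // Hypothetical two-node cluster: even partitions on nodeA, odd partitions on nodeB.
        String[] partitionMap = new String[PARTITIONS];
        for (int i = 0; i < PARTITIONS; i++) {
            partitionMap[i] = (i % 2 == 0) ? "nodeA" : "nodeB";
        }
        List<String> keys = List.of("user:1", "user:2", "user:3", "user:4", "user:5");
        groupByNode(keys, partitionMap).forEach((node, nodeKeys) ->
            System.out.println(node + " <- " + commandTypeFor(nodeKeys) + " " + nodeKeys));
    }
}
```

With a maximum batch size of 25 spread over several nodes, each node command carries only a handful of keys, and some may carry just one, which is where the single record conversion applies.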

Got it. Currently, my application runs only on platform threads, but I’m considering using a JDK 21 client that utilises virtual threads. The reason for upgrading to JDK 21 is that I’m planning to move my application to virtual threads. While some dependencies don’t fully support virtual threads yet, my plan is to transition quickly with minimal effort once they do.

Do you foresee any issues with this, given that Aerospike commands would run on virtual threads while the application itself remains on platform threads? Just want to know if virtual threads could cause any problems here.

Your scenario should be fine. The jdk21 java client only creates virtual threads for commands that use sub-threads (batch/scan/query). These sub-threads are independent of other threads used in the application. All other commands use threads assigned to them by the caller (in your case platform threads).

If the batch operation has multiple keys:

  1. Will a failure on one key stop execution of the other keys?
  2. Will the other keys' data be persisted on the server regardless of which key failed?

  1. Determined by BatchPolicy.respondAllKeys (default is true). When it is true, every key is attempted regardless of errors on other keys.
  2. Yes, when transactions are not used. If the batch runs under a transaction, the transaction can be aborted, which rolls back all keys.