Java client configuration for high performance

This is my Java client configuration:

maxCommandsInProcess = 70
maxCommandsInQueue = 5000
maxConnsPerNode = 300 (default)
NioEventLoops with size 4

The number of nodes is 1. With this configuration I am making 20 concurrent requests, with each request doing 50,000 updates (write policy REPLACE). Throttling in the code is done the following way:

for each request (with 50,000 update commands)
  batch the update commands into batches of 900
    process these 900 update commands asynchronously (using the async methods provided by the Java client)
    wait for all 900 to complete

Note that the above logic is applied concurrently to 20 such requests.
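The batch-and-wait throttle described above can be sketched as follows. This is a minimal stand-in, not the actual application code: the Aerospike async write is simulated with a plain executor task, and the latch plays the role of the client's write listener callbacks.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BatchThrottle {
    static final int TOTAL = 50_000;   // updates per request
    static final int BATCH = 900;      // commands in flight per batch

    // Stand-in for an async client write: runs asynchronously and
    // signals the latch on completion (as a WriteListener would).
    static void asyncUpdate(ExecutorService pool, CountDownLatch done) {
        pool.execute(done::countDown);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        int submitted = 0;
        while (submitted < TOTAL) {
            int batch = Math.min(BATCH, TOTAL - submitted);
            CountDownLatch done = new CountDownLatch(batch);
            for (int i = 0; i < batch; i++) {
                asyncUpdate(pool, done);
            }
            done.await();              // wait for the whole batch before starting the next
            submitted += batch;
        }
        pool.shutdown();
        System.out.println("completed " + submitted);
    }
}
```

Note the inherent stall in this scheme: every batch waits for its slowest command, so the event loops sit partially idle at the end of each batch of 900.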

This process takes about 55 seconds for each request (50,000 updates) to complete. Is there anything I can do better (in terms of client library configuration, concurrency, etc.), with everything else remaining the same?

Yes, apply the same async methodology used in the benchmarks program.

  1. Create a new variable asyncMaxCommands that is the total number of commands you want to run concurrently across all event loops. Set asyncMaxCommands = 100.

  2. Create asyncMaxCommands seed command instances and distribute them across the event loops (100 / 4 = 25 commands on each event loop).

  3. In each command callback, run exactly one new command (instance can be reused) for each command that completed. This guarantees a stable number of concurrent commands on each event loop.

  4. Stop running new commands when all 50,000 update commands have been submitted.
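The steps above can be sketched with stdlib stand-ins (the real benchmarks program issues Aerospike commands on its event loops; here a fixed thread pool simulates the event loops and a trivial task simulates one async write):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class SeedCommands {
    static final int TOTAL = 50_000;           // total update commands
    static final int ASYNC_MAX_COMMANDS = 100; // concurrent commands across all loops

    static final ExecutorService loops = Executors.newFixedThreadPool(4); // 4 "event loops"
    static final AtomicInteger started = new AtomicInteger();
    static final AtomicInteger completed = new AtomicInteger();
    static final CountDownLatch allDone = new CountDownLatch(TOTAL);

    // Launch one command; its completion callback launches exactly one
    // replacement, keeping a stable number of commands in flight.
    static void startNext() {
        if (started.incrementAndGet() > TOTAL) {
            return;                  // all commands submitted; let this chain end
        }
        loops.execute(() -> {
            completed.incrementAndGet();
            allDone.countDown();
            startNext();             // callback chains the next command
        });
    }

    public static void main(String[] args) throws InterruptedException {
        // Seed asyncMaxCommands command chains (100 / 4 loops = ~25 per loop).
        for (int i = 0; i < ASYNC_MAX_COMMANDS; i++) {
            startNext();
        }
        allDone.await();
        loops.shutdown();
        System.out.println("completed " + completed.get());
    }
}
```

Because each completion triggers exactly one new command, the in-flight count never exceeds asyncMaxCommands and the event loops stay saturated until the final commands drain.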

Further details are available in the benchmarks source code.

I tried it out the way you described, and took a look at the way it is done there. 1 million updates (20 concurrent requests × 50,000 records each) take 103 seconds, so it's roughly 5 seconds per 50,000.

BUT, the problem with this approach is that it does not handle concurrency. It picks each command from a queue and completes it, so processing of the 20th batch starts only after about 95 seconds. This is not desirable in my use case, as all 20 requests (each with 50,000 update commands) arrive at the same time and must be handled concurrently.

It’s not necessary to use multiple batches. Just use a single batch of 1 million with the same algorithm and adjust event loop count and asyncMaxCommands to match hardware capabilities for max throughput.

The alternative approach is to use the client’s internal async throttle configuration (EventPolicy) to manage command flow. Define event loop count and then maxCommandsInProcess and maxCommandsInQueue for each event loop. Determining these values will involve trial and error via iterative benchmarks. A reasonable starting point is:

event loop count = number of cpu cores on your machine

maxCommandsInProcess = 50

maxCommandsInQueue = 10000

When the async queue count reaches maxCommandsInQueue, subsequent commands will be rejected with an AsyncQueueFull exception (ResultCode.ASYNC_QUEUE_FULL). If this exception is received, your application should be able to back off issuing new commands until the async queue shrinks. The async queue count can be monitored with eventLoop.getQueueSize().
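A configuration sketch along these lines, using the Aerospike Java client's EventPolicy, NioEventLoops, and ClientPolicy classes (the host and port are placeholders):

```java
// Configure the client's internal async throttle via EventPolicy.
// Both limits are per event loop, not totals across all loops.
EventPolicy eventPolicy = new EventPolicy();
eventPolicy.maxCommandsInProcess = 50;    // commands executing concurrently per loop
eventPolicy.maxCommandsInQueue = 10000;   // commands allowed to wait per loop

// Starting point: one event loop per CPU core.
int loopCount = Runtime.getRuntime().availableProcessors();
EventLoops eventLoops = new NioEventLoops(eventPolicy, loopCount);

ClientPolicy clientPolicy = new ClientPolicy();
clientPolicy.eventLoops = eventLoops;

AerospikeClient client = new AerospikeClient(clientPolicy, "127.0.0.1", 3000);
```

With maxCommandsInProcess set, commands beyond the limit are queued by the client itself, so the application no longer needs its own batch-and-wait throttle.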

The backoff functionality can be tricky to implement because it must be done in a non-event-loop thread, and further commands must either be delayed or rejected by your application.
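One way to sketch the backoff loop with stdlib stand-ins: a bounded executor queue plays the role of maxCommandsInQueue, and RejectedExecutionException plays the role of the client's AsyncQueueFull exception. The submitting thread (not an event-loop thread) sleeps and retries on rejection.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BackoffSubmit {
    public static void main(String[] args) throws InterruptedException {
        // Bounded queue stands in for maxCommandsInQueue; rejection
        // stands in for AsyncQueueFull.
        ThreadPoolExecutor loop = new ThreadPoolExecutor(
            1, 1, 0, TimeUnit.SECONDS, new ArrayBlockingQueue<>(10));

        AtomicInteger done = new AtomicInteger();
        int total = 200;

        for (int i = 0; i < total; i++) {
            while (true) {
                try {
                    loop.execute(done::incrementAndGet);
                    break;                      // accepted; move to next command
                } catch (RejectedExecutionException queueFull) {
                    Thread.sleep(1);            // back off until the queue drains
                }
            }
        }
        loop.shutdown();
        loop.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("completed " + done.get());
    }
}
```

In a real application a fixed sleep is crude; checking eventLoop.getQueueSize() or using an exponential backoff would reduce wasted wakeups.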

A simpler (but not recommended) solution is to set maxCommandsInQueue = 0 (no maximum), but you run the risk of running out of memory in extreme circumstances.


But my use case requires me to use them, as I will get the files in batches. You can think of it as 20 HTTP requests, with each request sending me a file containing 50,000 update commands. At any point in time I could get anywhere from 0 to 20 requests, and I cannot wait for all 20 before starting processing.

Also, is 103 seconds for 1 million updates (serial, no batching) a decent number?

9708 tps seems low, but it depends on the average data size in each transaction. I get more than double that rate on my laptop for transactions < 100 bytes. Large machines can achieve > 1 million tps.

All systems have a maximum throughput rate derived from various attributes (cpu type, cpu count, network bandwidth, memory size, storage type, client/server configuration, …). If the rate of new transactions consistently exceeds the maximum throughput rate, then you will need to add a throttle to limit transaction flow.

I suggest opening a case with Enterprise support on performance tuning to improve your throughput. The more throughput your system supports, the more transactions can be run in parallel.

The data size is less than 20 bytes. I think I am doing something fundamentally wrong.
