Aerospike Java client shows high read times when an EC2 instance is added

Hello, I am debugging a scenario where we see a high number of Aerospike Java client timeout exceptions.
The client is a Spring Boot REST webservice running on EC2 instances (EBS-backed). The exceptions appear mostly around the time an EC2 instance is added to the cluster: read latency is very high, so more reads exceed the timeout threshold and this exception is thrown.

I wanted to check in this forum if the following are possible causes:

  1. Does the Java client have a cache warming phase? I think this is unlikely, but wanted to check.

  2. The REST service was originally doing both the writes and the reads. To reduce its load, we have moved the writes to a Spark job on AWS EMR that writes to Aerospike. I started seeing this issue after moving the writes to the EMR Spark cluster. Could read latency be affected when a large dataset is written to Aerospike outside the Java client?

Any suggestions on how to tackle this would be appreciated. The namespace configuration is:

namespace t1 {
        replication-factor 2
        memory-size 25G
        high-water-memory-pct 70
        high-water-disk-pct 60
        default-ttl 4d 
        single-bin true
        partition-tree-sprigs 4096
        storage-engine memory
}

Here is the full stacktrace.

com.aerospike.client.AerospikeException$Timeout: Client timeout: timeout=30 iterations=1 lastNode=BB90B6F2699290E 10.1.99.205 3000
at com.aerospike.client.async.NettyCommand.totalTimeout(NettyCommand.java:513)
at com.aerospike.client.async.NettyCommand.timeout(NettyCommand.java:476)
at com.aerospike.client.async.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:146)
at com.aerospike.client.async.HashedWheelTimer$HashedWheelTimeout.access$700(HashedWheelTimer.java:125)
at com.aerospike.client.async.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:186)
at com.aerospike.client.async.HashedWheelTimer.run(HashedWheelTimer.java:118)
at com.aerospike.client.async.ScheduleTask.run(ScheduleTask.java:40)
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
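
For reference, here is a simplified, synchronous sketch of the kind of read the service performs (the trace above shows the async Netty path in production, and the key is just a placeholder), with the 30 ms total timeout that appears in the exception:

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.Policy;

public class ReadSketch {
    public static void main(String[] args) {
        // Seed node taken from the trace; the client discovers the rest of the cluster.
        AerospikeClient client = new AerospikeClient("10.1.99.205", 3000);
        try {
            // 30 ms total budget per read, matching timeout=30 in the exception.
            Policy readPolicy = new Policy();
            readPolicy.totalTimeout = 30;

            // Namespace t1 is single-bin; the set name and user key are placeholders.
            Key key = new Key("t1", null, "example-user-key");
            Record record = client.get(readPolicy, key);
            System.out.println(record);
        } finally {
            client.close();
        }
    }
}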

Thanks for reading.

Here is some feedback on those questions:

1- The Java client does not have a warming phase, so your guess is correct. Having said that, when a node is added to a cluster, the first few transactions to it require new connections to be created, which adds to the total time those transactions take. Recent versions of the Java client have an extra policy parameter for the connect timeout that excludes this extra time from the total timeout.
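
For example, on recent client versions the relevant fields on the transaction policy look roughly like this (check the release notes of the client version you are on for the exact field name and when it was introduced; the values here are illustrative only):

import com.aerospike.client.policy.Policy;

public class TimeoutPolicySketch {
    static Policy readPolicy() {
        Policy p = new Policy();
        p.totalTimeout = 30;     // total transaction budget in ms (matches timeout=30 in the trace)
        p.socketTimeout = 30;    // per-attempt socket timeout in ms
        p.connectTimeout = 300;  // separate budget for creating a brand-new connection
                                 // (e.g. to a freshly added node); this time is kept
                                 // outside socketTimeout/totalTimeout, so connection
                                 // setup does not trip the 30 ms transaction timeout
        return p;
    }
}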

2- That is a pretty general question. Looking at the server logs (specifically at the benchmark histograms for reads and writes) could provide some answers. But, in theory, a heavier write workload (or simply a differently shaped one: a different number of connections, different record sizes, and so on) could slow down the server nodes and potentially impact read latencies.
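
If you want a quick look without digging through the server logs, something like the following dumps the latency buckets reported by each node (this assumes a server version that supports the "latencies:" info command; older servers use "latency:" instead):

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Info;
import com.aerospike.client.cluster.Node;
import com.aerospike.client.policy.InfoPolicy;

public class LatencySnapshot {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("10.1.99.205", 3000);
        try {
            InfoPolicy infoPolicy = new InfoPolicy();
            // Ask every node in the cluster for its current latency histogram summary.
            for (Node node : client.getNodes()) {
                String latencies = Info.request(infoPolicy, node, "latencies:");
                System.out.println(node.getName() + " -> " + latencies);
            }
        } finally {
            client.close();
        }
    }
}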


Thanks for the responses.

The high number of Aerospike timeouts was linked to higher JVM memory settings. The EC2 instance type is m5a.2xlarge (8 vCPU, 32 GB RAM). The JVM memory allocation was -Xms16g -Xmx16g -XX:MaxPermSize=1g. When this was increased to -Xms23g -Xmx23g -XX:MaxPermSize=1g, there was a higher number of AerospikeException$Timeout errors.

I am not yet sure why giving the JVM more RAM has caused these errors. Any notes or observations from anyone who has seen a similar problem would be helpful.
