AerospikeException Timeout

index
java
query

#1

I have an application that users a lot of batch reads in Aerospike. After some time the application is running, I get a lot these errors:

play.api.Application$$anon$1: Execution exception[[AerospikeException: Error Code 9: Timeout]]
	at play.api.Application$class.handleError(Application.scala:296) ~[com.typesafe.play.play_2.11-2.3.7.jar:2.3.7]
	at play.api.DefaultApplication.handleError(Application.scala:402) [com.typesafe.play.play_2.11-2.3.7.jar:2.3.7]
	at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$4.apply(PlayDefaultUpstreamHandler.scala:320) [com.typesafe.play.play_2.11-2.3.7.jar:2.3.7]
	at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$4.apply(PlayDefaultUpstreamHandler.scala:320) [com.typesafe.play.play_2.11-2.3.7.jar:2.3.7]
	at scala.Option.map(Option.scala:146) [org.scala-lang.scala-library-2.11.7.jar:na]
	at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3.applyOrElse(PlayDefaultUpstreamHandler.scala:320) [com.typesafe.play.play_2.11-2.3.7.jar:2.3.7]
	at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3.applyOrElse(PlayDefaultUpstreamHandler.scala:316) [com.typesafe.play.play_2.11-2.3.7.jar:2.3.7]
	at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:344) [org.scala-lang.scala-library-2.11.7.jar:na]
	at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:343) [org.scala-lang.scala-library-2.11.7.jar:na]
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) [org.scala-lang.scala-library-2.11.7.jar:na]
	at play.api.libs.iteratee.Execution$trampoline$.execute(Execution.scala:46) [com.typesafe.play.play-iteratees_2.11-2.3.7.jar:2.3.7]
	at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) [org.scala-lang.scala-library-2.11.7.jar:na]
	at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) [org.scala-lang.scala-library-2.11.7.jar:na]
	at scala.concurrent.Promise$class.complete(Promise.scala:55) [org.scala-lang.scala-library-2.11.7.jar:na]
	at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153) [org.scala-lang.scala-library-2.11.7.jar:na]
	at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:249) [org.scala-lang.scala-library-2.11.7.jar:na]
	at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:249) [org.scala-lang.scala-library-2.11.7.jar:na]
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) [org.scala-lang.scala-library-2.11.7.jar:na]
	at scala.concurrent.forkjoin.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1361) [org.scala-lang.scala-library-2.11.7.jar:na]
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [org.scala-lang.scala-library-2.11.7.jar:na]
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [org.scala-lang.scala-library-2.11.7.jar:na]
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [org.scala-lang.scala-library-2.11.7.jar:na]
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [org.scala-lang.scala-library-2.11.7.jar:na]
Caused by: com.aerospike.client.AerospikeException: Error Code 9: Timeout
	at com.aerospike.client.command.MultiCommand.parseGroup(MultiCommand.java:96) ~[com.aerospike.aerospike-client-3.2.2.jar:na]
	at com.aerospike.client.command.MultiCommand.parseResult(MultiCommand.java:71) ~[com.aerospike.aerospike-client-3.2.2.jar:na]
	at com.aerospike.client.command.SyncCommand.execute(SyncCommand.java:57) ~[com.aerospike.aerospike-client-3.2.2.jar:na]
	at com.aerospike.client.command.Executor$ExecutorThread.run(Executor.java:127) ~[com.aerospike.aerospike-client-3.2.2.jar:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_66-internal]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_66-internal]
	at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_66-internal] 

I have a 5 nodes cluster, all of them are 2 CPUs, 3.5GB RAM each. Bellow there is a graph of the moment when the reads per second drops, which is exactly when the error starts happening.