An issue has been observed with the async Java client: when a single node in the cluster fails, the whole cluster stops serving reads and writes for a period of time. The size of the cluster does not matter.
The issue was observed with the async client in the Java benchmark tool. The suspicion is that it would be mitigated by specifying a timeout in the benchmark, say somewhere between 100 ms and 1 second. The length of the outage should be related to the timeout value you set. Aerospike agrees that the client should behave better in this situation and has added a work item to enhance the async client's handling of node failure when no timeout, or a long timeout, is set.
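The effect of a timeout can be illustrated with a small, self-contained Java sketch. It does not use the Aerospike client. Instead it models a command sent to a failed node as a CompletableFuture that never completes, and uses CompletableFuture.orTimeout (Java 9+) as a stand-in for a per-command timeout. The class and method names (TimeoutSketch, sendToDeadNode, stallMillis) are hypothetical, and this is not a description of the client's internals. Without a timeout the caller would wait indefinitely; with one, it regains control after roughly the timeout value, which is consistent with the expectation that the outage interval tracks the timeout setting.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutSketch {
    // Hypothetical stand-in for an async command sent to a node that has failed:
    // nothing will ever complete this future.
    static CompletableFuture<String> sendToDeadNode() {
        return new CompletableFuture<>();
    }

    // Issues one command to the "dead node" with a per-command timeout and
    // returns how many milliseconds passed before the caller got control back.
    static long stallMillis(long timeoutMs) {
        long start = System.nanoTime();
        CompletableFuture<String> cmd =
            sendToDeadNode().orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
        try {
            cmd.join();
        } catch (CompletionException e) {
            // orTimeout completes the future with a TimeoutException,
            // which join() wraps in a CompletionException.
            if (!(e.getCause() instanceof TimeoutException)) {
                throw e;
            }
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Without orTimeout, cmd.join() above would block forever.
        for (long timeoutMs : new long[] {100, 500, 1000}) {
            System.out.println("timeout " + timeoutMs + " ms -> stalled about "
                + stallMillis(timeoutMs) + " ms");
        }
    }
}
```

Each printed stall should be at or slightly above its timeout, so a shorter timeout means a shorter period during which the caller is stuck on the failed node.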
./run_benchmarks -h 127.0.0.1 -p 3000 -n test -k 100000 -S 1 -o S:50 -T 1000 -w RU,50 -z 1 -async -asyncMaxCommands 1000 -asyncSelectorThreads 8

Options:
-h : host
-p : port
-n : namespace
-k : number of keys
-S : start key
-o : object spec
-T : timeout (ms)
-w : workload
-z : threads
-a / -async : async mode
-C / -asyncMaxCommands : maximum concurrent async commands
-W / -asyncSelectorThreads : async selector threads
So the command above runs the benchmark against host 127.0.0.1, port 3000, namespace test, using 100000 keys starting at key 1, with string values of size 50, a 1000 ms read and write timeout, a read/update workload with 50 percent reads, 1 client thread in async mode, at most 1000 concurrent async commands, and 8 async selector threads.
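Because the suggested mitigation is to rerun with a timeout somewhere between 100 ms and 1 second, a small Java helper can generate the example command for several -T values. This is a sketch: the class name BenchmarkSweep, the method benchmarkArgs, and the particular timeout values are illustrative choices. Every option and value other than -T is copied unchanged from the command above.

```java
import java.util.List;

public class BenchmarkSweep {
    // Builds the argument list for ./run_benchmarks using only the options from
    // the example command; only the -T (timeout, ms) value varies.
    static List<String> benchmarkArgs(int timeoutMs) {
        return List.of(
            "./run_benchmarks",
            "-h", "127.0.0.1",
            "-p", "3000",
            "-n", "test",
            "-k", "100000",
            "-S", "1",
            "-o", "S:50",
            "-T", Integer.toString(timeoutMs),
            "-w", "RU,50",
            "-z", "1",
            "-async",
            "-asyncMaxCommands", "1000",
            "-asyncSelectorThreads", "8");
    }

    public static void main(String[] argv) {
        // Illustrative sweep across the 100 ms to 1 s range suggested above.
        for (int timeoutMs : new int[] {100, 250, 500, 1000}) {
            System.out.println(String.join(" ", benchmarkArgs(timeoutMs)));
        }
    }
}
```

To launch a run instead of printing it, the list can be passed to new ProcessBuilder(BenchmarkSweep.benchmarkArgs(500)).inheritIO().start() from the benchmark directory, then a node can be stopped during the run to compare how long traffic stalls at each timeout.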