AerospikeException$Timeout while node is stopped

Sini · January 14, 2015, 4:10pm

I run an Aerospike v3.4 cluster with two nodes. The cluster contains only 19 records (this is just a test instance). Using the Java Client API, I wrote a small application that periodically queries all of this data:

Statement statement = new Statement();
statement.setNamespace("testNs");
statement.setSetName("testSet");
RecordSet rs = client.query(null, statement);
while (rs.next()) {} ...

This usually works fine, but if I stop one of the nodes while the periodic query is running, I get an exception:

Exception in thread "main" com.aerospike.client.AerospikeException$Timeout: Client timeout: timeout=30000 iterations=26 failedNodes=0 failedConns=25
    	at com.aerospike.client.command.SyncCommand.execute(SyncCommand.java:131)
    	at com.aerospike.client.query.QueryExecutor$QueryThread.run(QueryExecutor.java:134)

I tried to prevent this exception by setting a higher timeout, more retries and so on in the policy, but I could not find settings that avoid the exception and simply return the query results (maybe much more slowly, but that would still be OK).

Is it possible to set the policy so that the queries survive a stopped or failed node?

The cluster is configured with mesh topology and in-memory storage only.

Sini · January 14, 2015, 4:12pm

I have also tried to run the query with the aql command line client. In that case I got this error message:

Error: (11) AEROSPIKE_ERR_CLUSTER

Sini · January 16, 2015, 3:32pm

I ran some tests and, as far as I can see, the query operation blocks until the policy’s timeout expires or all the retries fail. If a node is down because of maintenance or an outage, this is more likely to happen, and the AerospikeException is thrown.

The only solution I see now is to cancel the application’s operation and simply retry the query.

Is there any other way to work around this behaviour and get the correct result from the query/rs.get calls?

Brian · January 19, 2015, 10:35pm

If a node goes down while the query is running, the query will fail because there is no retry on queries by default. The client will eject the downed node from its map within a second. After that, queries should work with the remaining nodes.
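
The approach discussed in this thread (re-run the whole query after the client has dropped the stopped node) could be sketched roughly as below. This is only an illustration, not part of the Aerospike client: QueryRetry, withRetry, maxAttempts and sleepMillis are hypothetical names, and the helper is deliberately generic so it does not depend on any client classes. It assumes the operation is safe to repeat from scratch, which should hold for a read-only query.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of the Aerospike client API):
 * re-runs a whole operation when it fails, pausing between attempts.
 */
final class QueryRetry {
    private QueryRetry() {}

    static <T> T withRetry(Callable<T> operation, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Each attempt starts from scratch; partial results are discarded.
                return operation.call();
            } catch (InterruptedException e) {
                // Do not retry when the thread itself was interrupted.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                // Assumption: in real code you would catch only AerospikeException here.
                last = e;
                if (attempt < maxAttempts) {
                    // Give the client time to remove the stopped node from its map.
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }
}
```

To use it with the query from the first post, the Callable passed to withRetry would call client.query(null, statement), copy the records into a new list while rs.next() returns true, close the RecordSet, and return the list. A call such as QueryRetry.withRetry(task, 3, 1000) then makes up to three attempts with a one-second pause between them, in line with the "within a second" figure above. Both numbers are guesses to tune, not recommended values.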
