This article covers the timeout and retry APIs available in Aerospike clients.
It also describes the fields reported when the Java client times out on a transaction.
Configurations available from Client side
The client’s policy configuration options related to timeouts and retries were updated in version 4.0.0 (Java), with further iterations in subsequent versions. This document describes the state as of version 4.1.1 of the Java client, which should be consistent with the C (4.3.1) and C# (3.5.2) clients released in late 2017.
Note: For TLS-enabled clusters, Java sync client versions 4.1.7 and above honor the socket timeout. Older versions do not support specifying a socket timeout and will hang if they fail to establish a handshake with a cluster node.
For the latest Java client versions:
- This is the socket idle timeout. It controls how long a gap can occur between activity on the socket.
- Because only idle gaps are measured, a socket can stay active for a very long time as long as each gap remains just below this value.
- This configuration applies to both the Sync and Async Java clients.
- Retries will be applied as long as totalTimeout or maxRetries have not been reached.
- Absolute fixed upper limit time given for the transaction to complete before an exception is sent back.
- If reached, there would be no retries.
- This is available only on the Java ASYNC client.
- In situations where a transaction reaches totalTimeout, an error will be returned but the socket will not be closed until this delay is reached.
- This gives the advantage to potentially reuse the socket (by putting the connection back in the pool) if a response came back within this extra delay, even though a timeout was already sent to the client application.
- Note that the response from the server is discarded in any case, since the client application has already received the timeout.
- In situations where retries are configured, this is the absolute max number of retries to attempt.
- The initial attempt is not counted as a retry.
- For write transactions, default is 0. This is because we do not recommend writes to be applied twice.
- For read transactions, default is 2 (which means that 3 total attempts would be made at most).
- For read transactions, this also depends on the replica mode. With the default (sequence), the first attempt goes to the master copy and, after a timeout or network error, the next attempt goes to the next replica copy. Note that the two options available for the replica mode are ‘sequence’ or ‘master’.
- For C, even if replication factor is 3, reads will stop at first replica and come back to master copy.
- For Java / C#, reads will go to all the replica copies and then come back to the master copy.
- For write, in case of a connection error, it will have the same behavior as read. But in case of a socket timeout, it will stay on master as the default is not to retry at all.
- This is available only for the Sync Java client. The Async Java client never sleeps between retries.
- This configuration ensures that if a transaction is retried, there will be a sleep before it retries.
- If configured to 0, there will not be any sleep. The default is 0.
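The interaction between maxRetries, totalTimeout, sleepBetweenRetries and the sequence replica mode described above can be sketched as a small standalone model. This is not the client's implementation: the class RetryPolicySketch, its methods and the attemptSucceeds callback are hypothetical names used only for illustration, and the replica order is simplified to an index where 0 stands for the master copy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Illustrative model only; none of these names come from the Aerospike client.
final class RetryPolicySketch {

    // Java / C# style sequence: master (0), then every replica, then master again.
    static int javaReplicaIndex(int attempt, int replicationFactor) {
        return attempt % replicationFactor;
    }

    // C style sequence: alternate between master (0) and the first replica (1).
    static int cReplicaIndex(int attempt, int replicationFactor) {
        return replicationFactor > 1 ? attempt % 2 : 0;
    }

    // Returns the replica index used by each attempt, in order.
    // attemptSucceeds is a stand-in for sending the command to that replica.
    static List<Integer> run(int maxRetries, long totalTimeoutMs, long sleepBetweenRetriesMs,
                             int replicationFactor, IntPredicate attemptSucceeds) {
        long start = System.nanoTime();
        List<Integer> tried = new ArrayList<>();
        // The initial attempt is not counted as a retry, hence "<=".
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (attempt > 0) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
                if (totalTimeoutMs > 0 && elapsedMs >= totalTimeoutMs) {
                    break; // totalTimeout reached: no further retries
                }
                if (sleepBetweenRetriesMs > 0) {
                    try {
                        Thread.sleep(sleepBetweenRetriesMs); // sync client only
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        break;
                    }
                }
            }
            int replica = javaReplicaIndex(attempt, replicationFactor);
            tried.add(replica);
            if (attemptSucceeds.test(replica)) {
                break;
            }
        }
        return tried;
    }

    public static void main(String[] args) {
        // Read default from the text: maxRetries = 2, so at most 3 attempts.
        System.out.println(run(2, 0, 0, 3, replica -> false)); // prints [0, 1, 2]
    }
}
```

With maxRetries set to 0, as is the default for writes, the same model makes a single attempt against the master copy and stops.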
For Java client versions prior to 4.0.0
To keep old timeout behavior, set socketTimeout equal to totalTimeout.
Description for the timeout error
For client versions 4.0.0 and above
The timeout exception also provides information whether it was a client or a server timeout (defaults to 1 sec configurable by
For Java client versions prior to 4.0.0
Exception in thread "main" com.aerospike.client.AerospikeException$Timeout: Client timeout: timeout=0 iterations=M failedNodes=N failedConns=X
    at com.aerospike.client.command.SyncCommand.execute(SyncCommand.java:131)
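When many such exceptions end up in application logs, it can help to extract the numeric fields from the message. The following is a minimal sketch that assumes the message keeps the key=value layout shown above. The class name TimeoutMessageFields is hypothetical, and the sample values 3, 0 and 2 are made up to replace the placeholders M, N and X.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; it only parses the text of the message shown above.
final class TimeoutMessageFields {

    private static final Pattern FIELD =
        Pattern.compile("(timeout|iterations|failedNodes|failedConns)=(\\d+)");

    // Returns the fields in the order they appear in the message.
    static Map<String, Integer> parse(String message) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(message);
        while (m.find()) {
            fields.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample values are illustrative, not taken from a real run.
        String msg = "Client timeout: timeout=0 iterations=3 failedNodes=0 failedConns=2";
        System.out.println(parse(msg));
        // prints {timeout=0, iterations=3, failedNodes=0, failedConns=2}
    }
}
```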
Timeouts can occur for the following reasons:
Client can’t connect by specified timeout (timeout=). Timeout of zero means that there is no timeout set.
Client does not receive response by specified timeout (timeout=).
Server times out the transaction during its own processing (default of 1 second if the client doesn’t specify a timeout). To investigate this, confirm that the server transaction latencies are not the bottleneck.
Client times out after M iterations of retries when there was no error due to a failed node or a failed connection.
Client can’t obtain a valid node after N retries (where retries are set from your client).
Client can’t obtain a valid connection after X retries. The retry count is usually the limiting factor, not the timeout value. The reasoning is that if you can’t get a connection after X retries, you never will, so the client times out early.
Example: In WritePolicy, if you have maxRetries(3), sleepBetweenRetries(500ms) and timeout(0), you will try the operation 3 times, with a 500ms wait between each try. If you have not been successful after 1.5 seconds you will get an exception.
Timeout trumps retries. If you set a timeout of 50ms (rather than zero) and the operation has not completed in that time, you will get an exception regardless of the number of retries.
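The arithmetic in this example can be written down as a rough upper bound. The sketch below follows the article's own calculation (3 tries x 500ms = 1.5 seconds) and ignores the time spent inside each attempt. The class and method names are hypothetical and are not part of the client.

```java
// Hypothetical helper reproducing the arithmetic of the example above.
final class LegacyTimeoutArithmetic {

    // Rough upper bound, in milliseconds, before the exception is raised.
    // A timeout of 0 means no timeout, so only the retry budget applies.
    static long upperBoundMs(int tries, long sleepBetweenRetriesMs, long timeoutMs) {
        long retryBudgetMs = tries * sleepBetweenRetriesMs;
        if (timeoutMs > 0) {
            return Math.min(timeoutMs, retryBudgetMs); // timeout trumps retries
        }
        return retryBudgetMs;
    }

    public static void main(String[] args) {
        System.out.println(upperBoundMs(3, 500, 0));  // prints 1500
        System.out.println(upperBoundMs(3, 500, 50)); // prints 50
    }
}
```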
Parameters that can be configured on Server side:
Definition: How long to wait for success, in milliseconds before timing out a transaction. This parameter comes into effect when the client has not specified transaction timeout or totalTimeout.
The transaction-max-ms (or, if specified, the client set timeout) gets checked in 3 different places:
- when a transaction is picked up from the transaction queue.
- every 130ms when waiting in the rw-hash (see
- every 75ms when waiting in the proxy-hash (see
Definition: How long to wait for success, in milliseconds, before retrying a fabric transaction (typically a write prole or a duplicate resolution).
If a client specifies a totalTimeout of 5 seconds, and network issues prevent a write from being processed on its prole side, the fabric transaction would be retried up to 2 times, with an interval starting at 1 second (default transaction-retry-ms) and doubled for every subsequent retry, that is, as long as totalTimeout is not reached.
If totalTimeout is set to 0 by the client, then transaction-max-ms will be honored in place of totalTimeout in the above example.
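The count of 2 retries follows from the doubling interval: the first retry fires after 1 second, the second 2 seconds later (3 seconds elapsed), and a third would be due 4 seconds after that (7 seconds elapsed), which is past the 5 second totalTimeout. The sketch below reproduces this arithmetic. It is a model of the description above and not server code. The class and method names are hypothetical, and the assumption that a retry only fires while the elapsed time is strictly below the deadline is mine.

```java
// Hypothetical model of the fabric retry schedule described above.
final class FabricRetrySketch {

    // totalTimeout of 0 means the client set none, so transaction-max-ms applies.
    static long effectiveDeadlineMs(long clientTotalTimeoutMs, long transactionMaxMs) {
        return clientTotalTimeoutMs > 0 ? clientTotalTimeoutMs : transactionMaxMs;
    }

    // Counts retries whose scheduled time falls before the deadline.
    static int countRetries(long deadlineMs, long transactionRetryMs) {
        if (transactionRetryMs <= 0) {
            throw new IllegalArgumentException("transactionRetryMs must be positive");
        }
        int retries = 0;
        long interval = transactionRetryMs;
        long elapsed = interval; // first retry fires after one interval
        while (elapsed < deadlineMs) {
            retries++;
            interval *= 2;       // interval doubles for every subsequent retry
            elapsed += interval;
        }
        return retries;
    }

    public static void main(String[] args) {
        // 5 second totalTimeout, 1 second initial interval as quoted in the text.
        long deadline = effectiveDeadlineMs(5000, 1000);
        System.out.println(countRetries(deadline, 1000)); // prints 2
    }
}
```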
Errors for which client does retry (if maxRetries configured) and for which it doesn’t:
During the send command, the client will retry for any error it receives (if maxRetries configured). Once it sends the command to server and gets response from the server, it retries (if maxRetries configured) for errors like:
The client will strictly not retry for the following errors:
Client’s IN_DOUBT state (for writes):
If the client is in the in-doubt state, a flag is set to indicate that it is possible that the write transaction completed even though an exception was generated. This is specific to timeout errors (AEROSPIKE_ERR_TIMEOUT) and client-specific errors (and is not based on the server’s response).
- See here for more details on incompatible API changes: https://www.aerospike.com/docs/client/java/usage/incompatible.html
- Java API Reference: https://www.aerospike.com/apidocs/java/
- Java Client release notes: https://www.aerospike.com/download/client/java/notes.html