This article covers the timeout and retry APIs available in Aerospike clients.
It also describes the fields reported when the Java client sees a timeout on a transaction.
Configurations available from Client side
The client's policy configuration options related to timeouts and retries were updated in version 4.0.0 (Java), with some further iterations in subsequent versions. This document describes the state as of version 4.1.1 of the Java client, which should be consistent with the C (4.3.1) and C# (3.5.2) clients released in late 2017.
Note: For TLS-enabled clusters, Java sync client versions 4.1.7 and above honor the socket timeout. Older versions do not support specifying a socket timeout and will hang if they fail to establish a handshake with a cluster node.
For the latest Java client versions:
socketTimeout
- This is the socket idle timeout. It controls how long a gap may occur between activity on the socket.
- A transaction can therefore keep a socket active for a very long time, as long as each gap stays just below this value.
- This configuration applies to both the Sync and Async Java clients.
- When it is reached, retries are applied as long as neither totalTimeout nor maxRetries has been reached.
totalTimeout
- The absolute upper limit on the time given for the transaction to complete before an exception is returned to the application.
- Once it is reached, no further retries are attempted.
timeoutDelay
- This is available only on the Java Async client.
- When a transaction reaches totalTimeout, an error is returned but the socket is not closed until this delay has elapsed.
- This makes it possible to reuse the socket (by putting the connection back in the pool) if a response arrives within this extra delay, even though a timeout was already returned to the client application.
- Note that the response from the server is discarded in any case, since the client application has already been given the timeout.
maxRetries
- When retries are configured, this is the absolute maximum number of retries to attempt.
- The initial attempt is not counted as a retry.
- For write transactions, the default is 0, because we do not recommend applying a write twice.
- For read transactions, the default is 2 (which means that at most 3 total attempts are made).
- For read transactions, the behavior also depends on the replica mode. If set to the default (sequence), the first attempt goes to the master copy and, in case of a timeout or network error, the subsequent attempt goes to the next replica copy. Note that the two options available for the replica mode are ‘sequence’ and ‘master’.
- For C, even if the replication factor is 3, reads stop at the first replica and come back to the master copy.
- For Java / C#, reads go to all the replica copies and then come back to the master copy.
- For writes, a connection error results in the same behavior as for reads. In case of a socket timeout, however, the write stays on the master, as the default is not to retry at all.
sleepBetweenRetries
- This is available only for the Sync Java client. The Async Java client never sleeps between retries.
- This configuration ensures that a retried transaction sleeps before each retry.
- If configured to 0, there is no sleep. The default is 0.
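The attempt count and replica ordering described above can be modelled in a few lines. This is a minimal sketch in plain Java, not client code: the class `ReplicaSequence` and its methods are hypothetical names, it assumes the ‘sequence’ replica mode, and index 0 stands for the master copy.

```java
class ReplicaSequence {
    // The initial attempt is not counted as a retry, so
    // total attempts = 1 + maxRetries.
    static int totalAttempts(int maxRetries) {
        return maxRetries + 1;
    }

    // Returns the copy targeted by each attempt (0 = master, 1 = first replica, ...).
    // cStyle = true models the C client behaviour described above: it alternates
    // between master and the first replica only. cStyle = false models Java / C#:
    // every copy is visited before wrapping back to master.
    // Assumes replicationFactor >= 1.
    static int[] targets(int replicationFactor, int maxRetries, boolean cStyle) {
        int copiesVisited = cStyle ? Math.min(replicationFactor, 2) : replicationFactor;
        int[] order = new int[totalAttempts(maxRetries)];
        for (int attempt = 0; attempt < order.length; attempt++) {
            order[attempt] = attempt % copiesVisited;
        }
        return order;
    }

    public static void main(String[] args) {
        // Replication factor 3, maxRetries 3 (4 attempts in total).
        System.out.println(java.util.Arrays.toString(targets(3, 3, false))); // [0, 1, 2, 0]
        System.out.println(java.util.Arrays.toString(targets(3, 3, true)));  // [0, 1, 0, 1]
    }
}
```

The model only applies to failures that trigger a retry (timeouts or network errors); a write with the default maxRetries of 0 makes a single attempt against the master.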
Assuming socketTimeout = 50ms, totalTimeout = 1s, maxRetries = 3, sleepBetweenRetries = 20ms: once the transaction is initiated, assume that there is no activity on the socket for the next 50ms. The socketTimeout then triggers, but since the total time taken is still below the configured totalTimeout, the client waits for 20ms (sleepBetweenRetries) and then initiates its first retry. At the start of the first retry, the timeline has therefore progressed by 50ms + 20ms = 70ms since the start of the transaction. Similarly, if the socket timeout occurs again, the next retry starts 50ms + 20ms = 70ms after the previous one, i.e. at 140ms. If the situation continues, the last (3rd) retry is attempted at 210ms, since the total time taken is still below the configured totalTimeout.
Assuming socketTimeout = 100ms, totalTimeout = 300ms, maxRetries = 3, sleepBetweenRetries = 100ms: once the transaction is initiated, if there is no activity on the socket for 100ms, the first retry is attempted at 200ms (100ms socket timeout plus 100ms sleep). The second retry would be due at 400ms, but since totalTimeout is set to 300ms it is not attempted, and the transaction times out after 1 retry (2 total attempts).
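The arithmetic in the two examples above can be checked with a small simulation. This is a simplified model written for illustration, not the client's implementation: the class `RetryTimeline` and its method are hypothetical, and the model assumes that every attempt runs into the socket timeout and that a retry is skipped if it would start at or after totalTimeout.

```java
import java.util.ArrayList;
import java.util.List;

class RetryTimeline {
    // Returns the start time (ms since the transaction began) of each retry,
    // assuming every attempt hits socketTimeout.
    // A totalTimeoutMs of 0 is treated as "no total timeout" in this sketch.
    static List<Long> retryStartTimes(long socketTimeoutMs, long totalTimeoutMs,
                                      int maxRetries, long sleepBetweenRetriesMs) {
        List<Long> starts = new ArrayList<>();
        long attemptStart = 0; // the initial attempt starts at t = 0
        for (int retry = 1; retry <= maxRetries; retry++) {
            // The attempt times out, then the client sleeps before retrying.
            long nextStart = attemptStart + socketTimeoutMs + sleepBetweenRetriesMs;
            if (totalTimeoutMs > 0 && nextStart >= totalTimeoutMs) {
                break; // totalTimeout would be reached first: no further retries
            }
            starts.add(nextStart);
            attemptStart = nextStart;
        }
        return starts;
    }

    public static void main(String[] args) {
        // socketTimeout = 50ms, totalTimeout = 1s, maxRetries = 3, sleepBetweenRetries = 20ms
        System.out.println(retryStartTimes(50, 1000, 3, 20));   // [70, 140, 210]
        // socketTimeout = 100ms, totalTimeout = 300ms, maxRetries = 3, sleepBetweenRetries = 100ms
        System.out.println(retryStartTimes(100, 300, 3, 100));  // [200]
    }
}
```

In the first case maxRetries is the limiting factor; in the second, totalTimeout cuts the sequence short after a single retry.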
For Java client versions prior to 4.0.0
To keep old timeout behavior, set socketTimeout equal to totalTimeout.
Description for the timeout error
For client versions 4.0.0 and above
The timeout exception also provides information whether it was a client or a server timeout (defaults to 1 sec configurable by
For Java client versions prior to 4.0.0
Exception in thread "main" com.aerospike.client.AerospikeException$Timeout: Client timeout: timeout=0 iterations=M failedNodes=N failedConns=X
    at com.aerospike.client.command.SyncCommand.execute(SyncCommand.java:131)
Timeouts can occur for the following reasons:
Client can’t connect by specified timeout (timeout=). Timeout of zero means that there is no timeout set.
Client does not receive response by specified timeout (timeout=).
Server times out the transaction during its own processing (default of 1 second if the client doesn’t specify a timeout). To investigate this, confirm that the server transaction latencies are not the bottleneck.
Client times out after M iterations of retries when there was no error due to a failed node or a failed connection.
Client can’t obtain a valid node after N retries (where retries are set from your client).
Client can’t obtain a valid connection after X retries. The retry count is usually the limiting factor, not the timeout value. The reasoning is that if you can’t get a connection after X retries, you never will, so the client times out early.
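When triaging many of these exceptions from application logs, it can help to pull the counters out of the message text. The following is a minimal sketch that assumes the pre-4.0.0 message format shown above; the class `TimeoutMessageFields` is a hypothetical helper, and the numeric values in the sample string are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimeoutMessageFields {
    // Matches name=number pairs such as "iterations=3".
    private static final Pattern FIELD = Pattern.compile("(\\w+)=(\\d+)");

    // Extracts every numeric name=value pair from a timeout message,
    // preserving the order in which the fields appear.
    static Map<String, Integer> parse(String message) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(message);
        while (m.find()) {
            fields.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample values are illustrative only, not taken from a real log.
        String sample = "Client timeout: timeout=50 iterations=3 failedNodes=0 failedConns=1";
        System.out.println(parse(sample));
        // prints {timeout=50, iterations=3, failedNodes=0, failedConns=1}
    }
}
```

A non-zero failedNodes or failedConns points at node or connection availability rather than a slow response, which matches the list of reasons above.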
In WritePolicy, if you have maxRetries(3), sleepBetweenRetries(500ms) and timeout(0), you will try the operation 4 times, with a 500ms wait between each try. If you have not been successful after 2 seconds you will get a timeout error back.
Timeout trumps retries. If you set a timeout of 50ms (rather than zero) and the operation has not completed in that time, you will get an exception regardless of the number of retries, unless retryOnTimeout is set to true.
Consider timeout = 1s, sleepBetweenRetries = 300ms, retries = 3, retryOnTimeout = true. In this case, if the first transaction attempt times out, 3 more attempts are made with 300ms between each retry. If the transaction still fails after all the retries, a timeout is returned and has to be handled at the application level.
Parameters that can be configured on Server side:
transaction-max-ms
Definition: How long to wait for success, in milliseconds, before timing out a transaction. This parameter takes effect when the client has not specified a transaction timeout or totalTimeout.
The transaction-max-ms (or, if specified, the client set timeout) gets checked in 3 different places:
- when a transaction is picked up from the transaction queue.
- every 130ms when waiting in the rw-hash (see
- every 75ms when waiting in the proxy-hash (see
transaction-retry-ms
Definition: How long to wait for success, in milliseconds, before retrying a fabric transaction (typically a write prole or a duplicate resolution).
If a client specifies a totalTimeout of 5 seconds and network issues prevent a write from being processed on its prole side, the fabric transaction would be retried up to 2 times, with an interval starting at 1 second (default transaction-retry-ms) and doubled for every subsequent retry, i.e. as long as totalTimeout is not reached.
If totalTimeout is set to 0 by the client, then transaction-max-ms will be honored in place of totalTimeout in the above example.
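The doubling interval in the example above can be worked through in code. This sketch only reproduces the arithmetic described in the text and is not server code: the class `FabricRetrySchedule` and its methods are hypothetical, and it assumes that a retry is not attempted once the deadline has been reached.

```java
import java.util.ArrayList;
import java.util.List;

class FabricRetrySchedule {
    // The client's totalTimeout applies when it is set; a value of 0 means
    // the server's transaction-max-ms is used instead.
    static long effectiveDeadlineMs(long clientTotalTimeoutMs, long transactionMaxMs) {
        return clientTotalTimeoutMs > 0 ? clientTotalTimeoutMs : transactionMaxMs;
    }

    // Times (ms since the transaction started) at which the fabric transaction
    // is retried: the first retry comes after transactionRetryMs, and the
    // interval doubles for every subsequent retry until the deadline.
    static List<Long> retryTimesMs(long transactionRetryMs, long deadlineMs) {
        if (transactionRetryMs <= 0) {
            throw new IllegalArgumentException("transactionRetryMs must be positive");
        }
        List<Long> times = new ArrayList<>();
        long interval = transactionRetryMs;
        long next = interval;
        while (next < deadlineMs) {
            times.add(next);
            interval *= 2;
            next += interval;
        }
        return times;
    }

    public static void main(String[] args) {
        // totalTimeout = 5s, transaction-retry-ms = 1s: retries at 1s and 3s,
        // while the third would fall at 7s, past the deadline.
        long deadline = effectiveDeadlineMs(5000, 1000);
        System.out.println(retryTimesMs(1000, deadline)); // [1000, 3000]
    }
}
```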
Errors for which client does retry (if maxRetries configured) and for which it doesn’t:
During the send command, the client will retry for any error it receives (if maxRetries configured). Once it sends the command to server and gets response from the server, it retries (if maxRetries configured) for errors like:
The client will strictly not retry for the following errors:
Client’s IN_DOUBT state (for writes):
If the client is in the in-doubt state, a flag is set to indicate that it is possible that the write transaction completed even though an exception was generated. This is specific to timeout errors (AEROSPIKE_ERR_TIMEOUT) and client-specific errors (and is not based on the server’s response).
- See here for more details on incompatibility changes: https://www.aerospike.com/docs/client/java/usage/incompatible.html
- Java API Reference: https://www.aerospike.com/apidocs/java/
- Java Client release notes: https://www.aerospike.com/download/client/java/notes.html