Understanding Timeout and Retry policies

Aerospike_Knowledge · April 22, 2016, 9:05pm

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

Summary

This article covers the timeout and retry APIs available in Aerospike clients.

It also describes the fields reported when the Java client encounters a transaction timeout.

Configurations available on the client side

The client policy options for timeouts and retries were reworked in version 4.0.0 of the Java client, with further refinements in later versions. This document describes the state as of Java client 4.1.1, which should be consistent with the C (4.3.1) and C# (3.5.2) clients released in late 2017.

Note: For TLS-enabled clusters, Java sync client versions 4.1.7 and above honor the socket timeout. Older versions do not support specifying a socket timeout and will hang if they fail to establish a handshake with a cluster node.

For the latest Java client versions:

socketTimeout

This is the socket idle timeout. It controls how long a gap can occur between activity on the socket.
This allows a transaction to keep a socket busy for a very long time, as long as each gap in activity stays below this value.
This configuration applies to both the sync and async Java clients.
When the socket timeout fires, the transaction is retried as long as neither totalTimeout nor maxRetries has been reached.
The default is 30 seconds.
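To make the idle semantics concrete, here is a minimal, self-contained model, assuming the idle timer simply restarts on every bit of socket activity. `IdleTimeoutModel` and `firstIdleTimeout` are hypothetical names used only for illustration; this is not Aerospike client code.

```java
// Hypothetical model of a socket idle timeout; not Aerospike client code.
class IdleTimeoutModel {
    /**
     * Returns the time (ms since the transaction started) at which the idle timeout fires,
     * or -1 if it never fires. activityMs must be sorted ascending; its last element is
     * when the transaction completes.
     */
    static long firstIdleTimeout(long[] activityMs, long socketTimeoutMs) {
        long last = 0; // the transaction start counts as activity
        for (long t : activityMs) {
            if (t - last >= socketTimeoutMs) {
                return last + socketTimeoutMs; // this gap reached the idle limit
            }
            last = t; // activity restarts the idle timer
        }
        return -1; // every gap stayed below socketTimeout
    }
}
```

With socketTimeout = 50ms, activity every 40ms never triggers the timeout, however long the transaction runs, while a single 60ms gap starting at 40ms triggers it at 90ms.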

totalTimeout

Absolute upper limit on the time the transaction may take before an exception is returned.
Once it is reached, no further retries are attempted.
The default is 0, which means there is no total time limit.

timeoutDelay

This is available in the Java async client, and in the Java sync client as of version 4.2.2.
When a transaction reaches totalTimeout, an error is returned to the application, but the socket is not closed until this additional delay has elapsed.
If the response arrives within this extra delay, the socket can be reused (the connection is put back in the pool), even though a timeout was already returned to the application.
The response from the server is discarded in any case, since the application has already received the timeout.
timeoutDelay may be most beneficial for write transactions. Because a write transaction's timeout is checked at 130ms intervals while it waits in the rw-hash, a value of around 150ms would typically make sense, if possible, to avoid churning connections.
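A minimal sketch of the timeoutDelay rule, assuming totalTimeout is greater than zero and that a response arriving exactly at a boundary counts as on time. `TimeoutDelayModel` and its `Outcome` values are hypothetical names; the sketch only shows which connections can be returned to the pool.

```java
// Hypothetical model of timeoutDelay; names are illustrative, not client API.
class TimeoutDelayModel {
    enum Outcome { SUCCESS, TIMEOUT_CONNECTION_REUSED, TIMEOUT_CONNECTION_CLOSED }

    /**
     * responseMs: when the server response arrives (ms since the transaction started).
     * The application sees a timeout at totalTimeoutMs either way; timeoutDelayMs only
     * decides whether the socket is drained and pooled or closed.
     */
    static Outcome classify(long responseMs, long totalTimeoutMs, long timeoutDelayMs) {
        if (responseMs <= totalTimeoutMs) {
            return Outcome.SUCCESS;
        }
        if (responseMs <= totalTimeoutMs + timeoutDelayMs) {
            return Outcome.TIMEOUT_CONNECTION_REUSED; // late response discarded, socket kept
        }
        return Outcome.TIMEOUT_CONNECTION_CLOSED; // no response within the delay
    }
}
```

For example, with totalTimeout = 100ms and timeoutDelay = 150ms, a response at 230ms still produces a timeout for the application, but the connection can be reused; a response at 260ms means the socket is closed.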

maxRetries

When retries are configured, this is the maximum number of retries to attempt.
The initial attempt is not counted as a retry.
For write transactions, the default is 0, because we do not recommend applying a write twice.
For read transactions, the default is 2 (at most 3 attempts in total).
For reads, the target of each retry also depends on the replica mode. With the default (sequence), the first attempt goes to the master copy and, after a timeout or network error, the next attempt goes to the next replica copy. The two replica modes discussed here are ‘sequence’ and ‘master’.
For C, even with a replication factor of 3, reads stop at the first replica and then go back to the master copy.
For Java / C#, reads go through all the replica copies and then go back to the master copy.
For writes, a connection error leads to the same behavior as for reads. After a socket timeout, however, a write stays on the master, since by default writes are not retried at all.
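The read-retry targeting described above can be sketched as follows, assuming every attempt fails with a timeout or network error. Index 0 is the master copy and 1, 2, ... are replica copies; `ReplicaSequenceModel`, `Mode` and `ClientFamily` are hypothetical names, and the model ignores writes, which by default are not retried after a socket timeout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of which copy each read attempt targets; not Aerospike client code.
class ReplicaSequenceModel {
    enum Mode { SEQUENCE, MASTER }
    enum ClientFamily { JAVA_OR_CSHARP, C }

    /** Copy targeted by each attempt: 0 = master, 1 = first replica, 2 = second replica, ... */
    static List<Integer> targets(int replicationFactor, int maxRetries, Mode mode, ClientFamily client) {
        // Java / C# cycle through every copy; C only alternates master and first replica.
        int copies = client == ClientFamily.JAVA_OR_CSHARP
                ? replicationFactor
                : Math.min(2, replicationFactor);
        List<Integer> out = new ArrayList<>();
        for (int attempt = 0; attempt <= maxRetries; attempt++) { // initial attempt + retries
            out.add(mode == Mode.SEQUENCE ? attempt % copies : 0); // MASTER mode never leaves the master
        }
        return out;
    }
}
```

With a replication factor of 3 and maxRetries = 3, the Java / C# sequence is 0, 1, 2, 0, while the C sequence is 0, 1, 0, 1.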

sleepBetweenRetries

This is available only for the Sync Java client. The Async Java client will never sleep between retries.
If a transaction is retried, the client sleeps for this duration before the retry.
If configured to 0, there will not be any sleep.
The default is 0.

connectTimeout

This is available for the Java client since version 5.0.3.
If connectTimeout is greater than zero, it will be applied to creating a connection plus optional user authentication and TLS handshake. When the connect completes, socketTimeout/totalTimeout is then applied. In this case, totalTimeout starts after the connection completes. This is done to be consistent with transactions that pull connections from the pool instead of creating new connections.
connectTimeout is useful when creating a new connection is expensive (for example, for TLS connections) and it is acceptable to allow extra time to create a new connection compared to using an existing connection from the pool.
If configured to 0, the socketTimeout will be used and there will not be any additional time reserved for potentially establishing a new connection.
The default is 0.
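A small sketch of where the totalTimeout clock starts when a new connection has to be opened. It assumes totalTimeout is greater than zero and treats a zero limit on the connect step as "no limit"; `ConnectTimeoutModel` and its methods are hypothetical.

```java
// Hypothetical model of connectTimeout vs. totalTimeout; not Aerospike client code.
class ConnectTimeoutModel {
    /**
     * Absolute deadline (ms since the call started) for a transaction that must first
     * open a new connection taking connectMs. With connectTimeout > 0 the totalTimeout
     * clock starts after the connection completes; otherwise connect time counts against it.
     */
    static long totalTimeoutDeadline(long connectMs, long connectTimeoutMs, long totalTimeoutMs) {
        return connectTimeoutMs > 0 ? connectMs + totalTimeoutMs : totalTimeoutMs;
    }

    /** Limit applied to the connect step itself: connectTimeout if set, else socketTimeout. */
    static boolean connectTimedOut(long connectMs, long connectTimeoutMs, long socketTimeoutMs) {
        long limit = connectTimeoutMs > 0 ? connectTimeoutMs : socketTimeoutMs;
        return limit > 0 && connectMs > limit; // 0 is treated here as "no limit"
    }
}
```

For a connection that takes 80ms with totalTimeout = 100ms, connectTimeout = 200ms moves the deadline to 180ms, leaving the full 100ms for the transaction itself; with connectTimeout = 0 only 20ms remain.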

Examples

Assuming socketTimeout = 50ms, totalTimeout = 1s, maxRetries = 3, sleepBetweenRetries = 20ms: once the transaction is initiated, assume there is no activity on the socket for the next 50ms. The socket timeout then triggers, but since the total time taken is still below totalTimeout, the client waits 20ms (sleepBetweenRetries) and then initiates its first retry. At the start of the first retry, the timeline has therefore progressed by 50ms + 20ms = 70ms since the start of the transaction. If the socket timeout occurs again, the next retry starts 70ms after the previous one, i.e. at 140ms. If the situation continues, the last (3rd) retry is attempted at 210ms, since the total time taken is still below totalTimeout.
Assuming socketTimeout = 100ms, totalTimeout = 300ms, maxRetries = 3, sleepBetweenRetries = 100ms: once the transaction is initiated, if there is no activity on the socket for 100ms, the first retry is attempted at 200ms. The second retry would happen at 400ms, but since totalTimeout is 300ms it is not attempted, and the transaction times out after 1 retry (2 attempts in total).
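The two timelines above can be reproduced with a short simulation, assuming every attempt sees no socket activity and therefore ends exactly at socketTimeout, and that a retry is only started if it begins before totalTimeout. `RetryTimeline` is a hypothetical helper; it mirrors the rules stated in this article rather than the Java client's actual retry loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the worked examples; not the Java client's retry loop.
class RetryTimeline {
    /**
     * Start times (ms) of every attempt when each attempt ends with a socket timeout.
     * A retry starts only if maxRetries is not exhausted and it would begin before
     * totalTimeout (0 = no total limit).
     */
    static List<Long> attemptStarts(long socketTimeoutMs, long totalTimeoutMs, int maxRetries, long sleepMs) {
        List<Long> starts = new ArrayList<>();
        long t = 0;
        starts.add(t); // initial attempt, not counted as a retry
        for (int retry = 1; retry <= maxRetries; retry++) {
            long next = t + socketTimeoutMs + sleepMs; // idle timeout, then sleepBetweenRetries
            if (totalTimeoutMs > 0 && next >= totalTimeoutMs) {
                break; // totalTimeout reached: no more retries
            }
            t = next;
            starts.add(t);
        }
        return starts;
    }
}
```

For the first example, `attemptStarts(50, 1000, 3, 20)` returns attempt start times 0, 70, 140 and 210ms; for the second, `attemptStarts(100, 300, 3, 100)` returns 0 and 200ms, i.e. one retry.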

For Java client versions prior to 4.0.0

To keep old timeout behavior, set socketTimeout equal to totalTimeout.

Description of the timeout error

For client versions 4.0.0 and above

The timeout exception also indicates whether the timeout occurred on the client or on the server (the server-side timeout defaults to 1 second and is configurable through transaction-max-ms).

For Java client versions prior to 4.0.0

Exception in thread "main" com.aerospike.client.AerospikeException$Timeout: Client timeout: timeout=0 iterations=M failedNodes=N failedConns=X at com.aerospike.client.command.SyncCommand.execute(SyncCommand.java:131)

Timeouts can occur for the following reasons:

The client cannot connect within the specified timeout (timeout=). A timeout of zero means that no timeout is set.
The client does not receive a response within the specified timeout (timeout=).
The server times out the transaction during its own processing (after 1 second by default if the client does not specify a timeout). To investigate this, confirm that the server transaction latencies are not the bottleneck.
The client times out after M iterations of retries without any failed node or failed connection.
The client cannot obtain a valid node after N retries (where the retry limit is set by your client).
The client cannot obtain a valid connection after X retries. The retry count, not the timeout value, is usually the limiting factor: if a connection cannot be obtained after X retries, it is unlikely to be obtained at all, so the client times out early.
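When triaging logs from pre-4.0 clients, the counters in the message can be extracted and mapped roughly onto the reasons above. This is a sketch: `LegacyTimeoutMessage` is a hypothetical helper, the message format is taken from the example exception shown earlier, and the precedence in `likelyCause` (connections before nodes) is an interpretation rather than client logic.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical triage helper for pre-4.0 Java client timeout messages.
class LegacyTimeoutMessage {
    static final class Fields {
        final int timeout, iterations, failedNodes, failedConns;

        Fields(int timeout, int iterations, int failedNodes, int failedConns) {
            this.timeout = timeout;
            this.iterations = iterations;
            this.failedNodes = failedNodes;
            this.failedConns = failedConns;
        }
    }

    private static final Pattern COUNTERS = Pattern.compile(
            "timeout=(\\d+) iterations=(\\d+) failedNodes=(\\d+) failedConns=(\\d+)");

    /** Extracts the four counters from a message such as "Client timeout: timeout=0 iterations=3 ...". */
    static Fields parse(String message) {
        Matcher m = COUNTERS.matcher(message);
        if (!m.find()) {
            throw new IllegalArgumentException("not a legacy client timeout message: " + message);
        }
        return new Fields(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)));
    }

    /** Rough mapping onto the reasons listed above; an interpretation, not client logic. */
    static String likelyCause(Fields f) {
        if (f.failedConns > 0) return "could not obtain a valid connection";
        if (f.failedNodes > 0) return "could not obtain a valid node";
        return "no failed node or connection; timed out after " + f.iterations + " iterations";
    }
}
```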

Examples

In WritePolicy, with maxRetries(3), sleepBetweenRetries(500ms) and timeout(0), the operation is tried 4 times, with a 500ms wait between tries. If it has not succeeded after 2 seconds, a timeout error is returned.
Timeout trumps retries. If you set a timeout of 50ms (rather than zero) and the operation has not completed within that time, you get an exception regardless of the number of retries, unless retryOnTimeout is set to true.
Consider timeout = 1s, sleepBetweenRetries = 300ms, maxRetries = 3, retryOnTimeout = true. If the first attempt times out, 3 more attempts are made with 300ms between each retry. If the transaction still fails after all retries, a timeout is returned and must be handled by the application.
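The arithmetic behind these pre-4.0 examples can be sketched as follows, assuming that each retried attempt gets the same timeout again (as the retryOnTimeout example describes) and that every attempt times out. `LegacyRetryModel` is a hypothetical helper.

```java
// Hypothetical arithmetic for the pre-4.0 retry examples; not client code.
class LegacyRetryModel {
    /** Number of attempts made when every attempt times out. */
    static int attempts(int maxRetries, boolean retryOnTimeout) {
        return retryOnTimeout ? maxRetries + 1 : 1; // without retryOnTimeout, the first timeout is final
    }

    /** Worst-case elapsed time, assuming each attempt runs for its full timeout. */
    static long worstCaseMs(long timeoutMs, long sleepMs, int maxRetries, boolean retryOnTimeout) {
        int n = attempts(maxRetries, retryOnTimeout);
        return n * timeoutMs + (n - 1) * sleepMs; // n attempts, n - 1 sleeps between them
    }
}
```

For the last example this gives 4 attempts and a worst case of 4 × 1000ms + 3 × 300ms = 4900ms before the application sees the timeout; for the 50ms example without retryOnTimeout, a single attempt and 50ms.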

Parameters that can be configured on the server side:

transaction-max-ms

Definition: How long to wait for success, in milliseconds, before timing out a transaction. This parameter applies when the client has not specified a transaction timeout or totalTimeout.

The transaction-max-ms value (or, if specified, the client-set timeout) is checked in 3 different places:

when a transaction is picked up from the transaction queue.
every 130ms while waiting in the rw-hash (see rw_in_progress).
every 75ms while waiting in the proxy-hash (see proxy_in_progress).
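Because these checks run periodically, a transaction can be detected as timed out somewhat after its deadline. The sketch below computes the first check at or after the deadline, assuming (hypothetically) that checks fire at fixed multiples of the interval from a known starting point; `ServerTimeoutCheck` is an illustrative name, not server code.

```java
// Hypothetical model of periodic server-side timeout checks; not Aerospike server code.
class ServerTimeoutCheck {
    static final long RW_HASH_CHECK_MS = 130;   // interval quoted above for the rw-hash
    static final long PROXY_HASH_CHECK_MS = 75; // interval quoted above for the proxy-hash

    /**
     * First check time at or after deadlineMs, assuming checks fire at
     * firstCheckMs, firstCheckMs + intervalMs, firstCheckMs + 2 * intervalMs, ...
     */
    static long detectionTime(long deadlineMs, long intervalMs, long firstCheckMs) {
        if (deadlineMs <= firstCheckMs) {
            return firstCheckMs;
        }
        long ticks = (deadlineMs - firstCheckMs + intervalMs - 1) / intervalMs; // ceiling division
        return firstCheckMs + ticks * intervalMs;
    }
}
```

With a 1000ms deadline, a check every 130ms first sees the expired transaction at 1040ms and a check every 75ms at 1050ms; in general detection can lag by up to one interval, which is consistent with the suggestion of a timeoutDelay slightly above 130ms for writes.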

transaction-retry-ms

Definition: How long to wait for success, in milliseconds, before retrying a fabric transaction (typically a write prole or a duplicate resolution).

Examples

If a client specifies a totalTimeout of 5 seconds and network issues prevent a write from being processed on its prole, the fabric transaction is retried up to 2 times: the interval starts at 1 second (the default transaction-retry-ms) and doubles for every subsequent retry, as long as totalTimeout has not been reached. If the client sets totalTimeout to 0, transaction-max-ms is honored in place of totalTimeout in the example above.
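The retry schedule in this example can be sketched as follows, assuming a retry is only scheduled if it falls strictly before the time budget. `FabricRetrySchedule` is a hypothetical helper that models the description above, not the server's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the fabric retry schedule; not Aerospike server code.
class FabricRetrySchedule {
    /**
     * Times (ms after the write starts) at which the fabric transaction is retried.
     * The first retry happens after transactionRetryMs; each later interval doubles.
     * The budget is the client totalTimeout, or transaction-max-ms when the client sends 0.
     */
    static List<Long> retryTimes(long clientTotalTimeoutMs, long transactionMaxMs, long transactionRetryMs) {
        long budget = clientTotalTimeoutMs > 0 ? clientTotalTimeoutMs : transactionMaxMs;
        List<Long> times = new ArrayList<>();
        long t = 0;
        long interval = transactionRetryMs;
        while (t + interval < budget) {
            t += interval;
            times.add(t);
            interval *= 2; // interval doubles for every subsequent retry
        }
        return times;
    }
}
```

With a 5-second budget and the default 1-second transaction-retry-ms, retries fall at 1s and 3s; the next one would be at 7s, past the budget, so there are 2 retries.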

Errors for which Java clients (all versions) will retry (if maxRetries is configured) and errors for which they will not:

While sending the command, the client retries (if maxRetries is configured) on any network-related error it receives, such as timeouts and connection errors. Clients do not retry on server-side errors (error codes), except for timeout errors.

Once it has sent the command to the server and received a response, the client retries (if maxRetries is configured) for errors such as:

socket_timeout
AEROSPIKE_ERR_CONNECTION
AEROSPIKE_ERR_TIMEOUT

The client will strictly not retry for the following errors:

AEROSPIKE_ERR_RECORD_BUSY
AEROSPIKE_ERR_FAIL_FORBIDDEN
AEROSPIKE_NOT_AUTHENTICATED
AEROSPIKE_ERR_TLS_ERROR
AEROSPIKE_ERR_QUERY_ABORTED
AEROSPIKE_ERR_SCAN_ABORTED
AEROSPIKE_ERR_CLIENT_ABORT
AEROSPIKE_ERR_CLIENT
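The decision rule implied by these lists can be sketched as a small predicate. It is an illustration only: `RetryDecision` is hypothetical, the real clients work with numeric result codes and exception types rather than these strings, and the totalTimeout check is omitted. Any error not in the retryable set, including those listed above, is not retried.

```java
import java.util.Set;

// Hypothetical retry decision based on the lists above; not client code.
class RetryDecision {
    static final Set<String> RETRYABLE = Set.of(
            "socket_timeout", "AEROSPIKE_ERR_CONNECTION", "AEROSPIKE_ERR_TIMEOUT");

    /** retriesSoFar excludes the initial attempt, matching the maxRetries definition. */
    static boolean shouldRetry(String error, int retriesSoFar, int maxRetries) {
        return RETRYABLE.contains(error) && retriesSoFar < maxRetries;
    }
}
```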

Client’s IN_DOUBT state (for writes):

When a write ends in an in-doubt state, the client sets a flag indicating that the write may have completed even though an exception was generated. This applies to timeout errors (AEROSPIKE_ERR_TIMEOUT) and client-side errors, and is not based on the server’s response.
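In application code, this gives three possible outcomes for a write rather than two. A minimal sketch, where `inDoubt` stands for the flag described above and `WriteOutcomeModel` is a hypothetical name: an in-doubt failure is treated as unknown and has to be resolved separately, for example by reading the record back.

```java
// Hypothetical outcome classification for a write; not client API.
class WriteOutcomeModel {
    enum Outcome { COMMITTED, FAILED, UNKNOWN }

    /**
     * succeeded: the write call returned normally.
     * inDoubt: the flag set when the write may have been applied despite the exception.
     */
    static Outcome classify(boolean succeeded, boolean inDoubt) {
        if (succeeded) {
            return Outcome.COMMITTED;
        }
        return inDoubt ? Outcome.UNKNOWN : Outcome.FAILED; // UNKNOWN: verify before retrying
    }
}
```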

Notes

See here for more details on incompatible changes: https://www.aerospike.com/docs/client/java/usage/incompatible.html
Java API Reference: https://www.aerospike.com/apidocs/java/
Java Client release notes: Java Client Library Release Note | Download | Aerospike

Keywords

TIMEOUT RETRY RETRIES SOCKET

Timestamp

August 2021

thanhnd44 · June 11, 2018, 3:41pm

Hi,

I have one question.

Of all the timeout scenarios you mentioned above, in which circumstances can I be absolutely certain that the final result of the transaction is FAILED?

In the worst case, if I can’t be certain about the final result, how can I query the final state of the transaction?

I tried searching many places but haven’t found an appropriate answer.

Thanks in advance.

thanhnd44 · June 11, 2018, 4:04pm

Edit: We came up with a temporary solution:

Keep a map of [generation → value read] for that record (maybe a background thread constantly reading the record, etc.), and then on timeouts periodically check the map (key = the expected generation) to see whether the value stored there is the one we actually wrote. If they are the same, the write succeeded; otherwise the write failed.

Do you guys think it’s necessary to do this? Or are there other ways?

Thanks.

meher · June 11, 2018, 4:38pm

Posted on Stack Overflow as well. If you are an Enterprise Licensee, do not hesitate to reach out (or have someone reach out) through support so we can help you with the details for your specific use case.

thanhnd44 · June 11, 2018, 4:57pm

Thanks a lot. We are currently experimenting with the database and will consider switching to the Enterprise version once we’ve finished our experimentation.

kporter · June 11, 2018, 5:52pm

I previously described a solution here:

Note that this solution is specific to a counter, if you need read/modify/write then change step 2 (getHeader) to retrieve the full record.
