Java client returns timeouts once one of the nodes is down

Hey,

We are using Aerospike as a DB. On the client side we are using the Java client.

    <dependency>
        <groupId>com.aerospike</groupId>
        <artifactId>aerospike-client</artifactId>
        <version>4.4.14</version>
    </dependency>

We are facing a situation where, when one of the nodes is not available, the Java client gets timeouts on Aerospike requests and cannot handle them properly.

How can I solve this behaviour? If one of the nodes is not available, I would expect the client to know about it and to read from or write to a different node instead.

Any suggestions?

Thanks

You are absolutely right. That is the expected behavior. What do you mean by ‘a node is not available’? If it is still part of the cluster but unresponsive, the client may still try to tend against it. Depending on your policy, reads would be retried against another replica. However, if the node has not left the cluster and is still responding to heartbeat messages from the other nodes, write transactions will not be able to proceed.
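To illustrate what "retry reads against another replica" means, here is a minimal, self-contained sketch. It does not use the Aerospike client library: `NodeRead` and `readWithFailover` are hypothetical names made up for this example. In the Java client itself, this behavior is, as far as I know, governed by read policy fields such as `replica`, `maxRetries`, `socketTimeout` and `totalTimeout`, so please check them against the documentation for your client version.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeoutException;

public class ReplicaRetrySketch {

    /** Hypothetical stand-in for a read sent to one specific node. */
    interface NodeRead<T> {
        T read() throws TimeoutException;
    }

    /**
     * Tries the replicas in order and returns the first successful result.
     * A timeout on one node moves the next attempt to the next node in the list.
     */
    static <T> T readWithFailover(List<NodeRead<T>> replicas, int maxRetries)
            throws TimeoutException {
        if (replicas.isEmpty() || maxRetries < 0) {
            throw new IllegalArgumentException("need at least one replica and maxRetries >= 0");
        }
        TimeoutException last = null;
        // One initial attempt plus up to maxRetries retries.
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            NodeRead<T> target = replicas.get(attempt % replicas.size());
            try {
                return target.read();
            } catch (TimeoutException e) {
                last = e; // remember the failure and try the next replica
            }
        }
        throw last; // every attempt timed out
    }

    public static void main(String[] args) throws Exception {
        // Simulated nodes: the master times out, the replica answers.
        NodeRead<String> unreachableMaster = () -> {
            throw new TimeoutException("master timed out");
        };
        NodeRead<String> healthyReplica = () -> "value-from-replica";

        String result = readWithFailover(Arrays.asList(unreachableMaster, healthyReplica), 2);
        System.out.println(result);
    }
}
```

Note that this only models reads. As described above, a write cannot simply be redirected to another node while the unresponsive node is still considered part of the cluster.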

If the node did leave the cluster, then within a couple of seconds the cluster should reform without that node, and client-side tending (every 1 second by default) would pick up the updated partition map and route requests to the correct node.

If your application is not recovering, it means either that the node is still partially up and part of the cluster (but the client cannot reach it), or that it has left the cluster properly but the client application is somehow unable to update its partition map. Further log analysis would be required to dig deeper.

Are there any tend errors on the client side? Can you confirm that the cluster size has updated?
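One way to check the cluster size is to look at the `cluster_size` value in each remaining node's `statistics` info response (for example from `asinfo -v statistics`), which comes back as semicolon-separated `key=value` pairs. The sketch below only parses such a string. The class and method names are hypothetical, and the sample input is invented for illustration rather than taken from a real node.

```java
import java.util.OptionalInt;

public class ClusterSizeCheck {

    /**
     * Extracts cluster_size from a "key=value;key=value" statistics string.
     * Returns an empty OptionalInt if the key is missing or not a number.
     */
    static OptionalInt parseClusterSize(String statistics) {
        if (statistics == null) {
            return OptionalInt.empty();
        }
        for (String pair : statistics.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).trim().equals("cluster_size")) {
                try {
                    return OptionalInt.of(Integer.parseInt(pair.substring(eq + 1).trim()));
                } catch (NumberFormatException e) {
                    return OptionalInt.empty();
                }
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Invented sample response; a real one contains many more fields.
        String sample = "cluster_size=2;uptime=1000";
        OptionalInt size = parseClusterSize(sample);
        if (size.isPresent()) {
            System.out.println("cluster_size reported by node: " + size.getAsInt());
        } else {
            System.out.println("cluster_size not found in response");
        }
    }
}
```

If every remaining node reports the reduced size, the cluster has reformed and the problem is more likely on the client side. If the old size is still reported, the unavailable node has not actually left the cluster.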
