Does the Java client support automatically switching to other nodes when one node goes offline?
We are testing Aerospike on an 8-node K8s cluster (replication factor = 1), and when one node goes down, our client operations start to fail.
For example, an Aerospike key put fails with “Client timeout”:
reason: Client timeout: iteration=1 socket=30000 total=0 maxRetries=0 node=BB9844C8706B3C2 xx.xxx.xx.xxx 3000 inDoubt=true
If a node goes down, timeouts and connection errors are inevitable, because in-flight transactions lose their connections during socket send/receive. Also, the client's partition map continues to point to the downed node until the cluster tend retrieves the new partition maps from the servers. This process should take a few seconds.
You can adjust your retry policy (depending on whether the write operation is idempotent, for example) or have the application handle the timeout or error. The failed operation is certainly not silently lost, since the client application receives an error.
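As an illustration of the application-side option, here is a minimal sketch of a retry wrapper that retries a timed-out write only when it is safe to do so. Everything in it (`RetryingWriter`, its nested `TimeoutException`, the `write` method and its parameters) is hypothetical and uses only the JDK, so it is not the Aerospike client API. In the real Java client, the comparable knobs are fields on the policy object (`maxRetries`, `sleepBetweenRetries`, `totalTimeout`, `socketTimeout`), which is why the error above reports `maxRetries=0`: the client made a single attempt and gave up. The key design choice is that an error flagged in doubt (like `inDoubt=true` above) may mean the write was already applied, so it is only retried when the write is idempotent.

```java
import java.util.function.Supplier;

/**
 * Hypothetical application-level retry helper (NOT part of the Aerospike client).
 * Retries an operation that fails with a timeout, pausing between attempts so the
 * client has time to pick up the new partition map after a node goes down.
 */
public class RetryingWriter {

    /** Stand-in for a client timeout; inDoubt mirrors the "inDoubt=true" flag in the error. */
    public static class TimeoutException extends RuntimeException {
        public final boolean inDoubt;

        public TimeoutException(boolean inDoubt) {
            super("Client timeout, inDoubt=" + inDoubt);
            this.inDoubt = inDoubt;
        }
    }

    /**
     * Runs op, retrying up to maxRetries times on TimeoutException.
     * An in-doubt failure of a non-idempotent write is rethrown immediately,
     * because retrying it could apply the write twice.
     */
    public static <T> T write(Supplier<T> op, boolean idempotent, int maxRetries, long sleepMillis) {
        int attempt = 0;
        while (true) {
            try {
                return op.get();
            } catch (TimeoutException e) {
                boolean safeToRetry = idempotent || !e.inDoubt;
                if (!safeToRetry || attempt >= maxRetries) {
                    throw e; // let the application decide what to do
                }
                attempt++;
                try {
                    // Back off so the cluster tend can refresh the partition map.
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated put: times out twice (as if routed to the downed node), then succeeds.
        String result = write(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new TimeoutException(true);
            }
            return "ok";
        }, true, 5, 100);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With the client's own retry policy, the same idea would be expressed by raising `maxRetries` and `sleepBetweenRetries` on the write policy for idempotent writes, so that the retries span the few seconds it takes to refresh the partition map.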