We’ve been facing this issue for a long time now.
Whenever a node goes away (mostly when we remove an older node), we see lots of client errors until we restart all the client servers.
Error occurred while performing batch get in aerospike (9, 'Timeout: timeout=10000 iterations=1 failedNodes=0 failedConns=0', 'src/main/aerospike/as_command.c', 566)
I expect a few errors initially until clients refresh the server node IPs, but this keeps going on even after a couple of hours.
There should probably be a way to take a node out of the list of server IPs that is sent to clients, so we can at least handle situations where we’re removing nodes manually.
Though a node going down due to hardware failure should also be handled. Otherwise what’s the point of having an HA cluster with multiple nodes?
Please suggest how we can deal with such situations for now.
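One possible stopgap, short of restarting all the client servers, would be to recreate the client connection in-process when a call fails. Below is a minimal sketch of that idea. The names `ReconnectingClient`, `make_client` and `operation` are hypothetical and not part of any Aerospike API; `make_client` is assumed to return a freshly connected client object. Whether a fresh connection actually clears the timeouts in this situation is untested.

```python
import time


class ReconnectingClient:
    """Wraps a client and rebuilds it after a failed call (hypothetical helper)."""

    def __init__(self, make_client, max_attempts=3, backoff_seconds=0.0):
        # make_client: caller-supplied factory returning a connected client (assumption)
        self._make_client = make_client
        self._max_attempts = max_attempts
        self._backoff_seconds = backoff_seconds
        self._client = make_client()

    def run(self, operation):
        """Call operation(client); on error, replace the client and try again."""
        last_error = None
        for attempt in range(1, self._max_attempts + 1):
            try:
                return operation(self._client)
            except Exception as exc:  # in real code, narrow this to the timeout error
                last_error = exc
                self._reset()
                # Linear backoff before the next attempt
                time.sleep(self._backoff_seconds * attempt)
        raise last_error

    def _reset(self):
        # Close the old client if it exposes close(), then build a new one
        close = getattr(self._client, "close", None)
        if callable(close):
            try:
                close()
            except Exception:
                pass  # the old connection is being discarded anyway
        self._client = self._make_client()
```

With the Python client, `make_client` could presumably be something like `lambda: aerospike.client(config).connect()`, and `operation` a function that performs the batch get. This only replaces the manual restart; it does not explain why the existing client keeps timing out after the node is gone.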
Aerospike version: 126.96.36.199 CE