I have a 4-node cluster with replication factor 2. The client is a multiprocess application that connects to the cluster in the parent process and reads random data. When I shut down one node, the application gets a few errors (about 1 per worker) like: -1 - Bad file descriptor, -1 - Socket read error: 104
These are followed by many “timeouts” like: 9 - Client timeout: timeout=3000 iterations=4 failedNodes=0 failedConns=0, even though each aerospike_key_get call takes only a few ms. This bad state lasts 2-3 s (which still affects a lot of requests), and then everything goes back to normal.
Can I force the library to simply ask the replica for the data instead of the dead node? It has more than 2990 ms of spare time to do so.
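To make the question concrete, here is a minimal, self-contained sketch of the behavior I would expect. It does not use the Aerospike client API at all: the `node` type, the status codes, and the functions `try_node` and `read_with_failover` are hypothetical names that only model the idea of moving on to the replica while the timeout budget is not yet spent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch; these are NOT Aerospike codes. */
typedef enum { READ_OK = 0, READ_NODE_DOWN = 1, READ_TIMEOUT = 2 } read_status;

/* Hypothetical node model: liveness plus the simulated cost of one attempt. */
typedef struct {
    bool alive;
    int  attempt_cost_ms;
} node;

/* Simulate one attempt; a dead node fails fast, like a socket read error. */
static read_status try_node(const node *n, int *elapsed_ms)
{
    *elapsed_ms += n->attempt_cost_ms;
    return n->alive ? READ_OK : READ_NODE_DOWN;
}

/*
 * Walk the replica list (master first). When a node fails, move on to the
 * next one as long as the timeout budget has not been used up.
 * On success, *served_by holds the index of the node that answered.
 */
static read_status read_with_failover(const node *replicas, size_t count,
                                      int timeout_ms, size_t *served_by,
                                      int *elapsed_ms)
{
    assert(replicas != NULL && count > 0);
    *elapsed_ms = 0;

    for (size_t i = 0; i < count; i++) {
        if (*elapsed_ms >= timeout_ms) {
            return READ_TIMEOUT;          /* no budget left for another try */
        }
        if (try_node(&replicas[i], elapsed_ms) == READ_OK) {
            if (*elapsed_ms > timeout_ms) {
                return READ_TIMEOUT;      /* answer arrived too late */
            }
            *served_by = i;
            return READ_OK;
        }
    }
    return READ_NODE_DOWN;                /* every replica failed */
}
```

With a dead master and a live replica that each cost a few ms per attempt, this model returns the record from the replica long before a 3000 ms timeout, which is the outcome I am hoping the real client can be configured to produce.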
Server: CE 3.5.15, client: 3.1.20. I’m trying to configure the client to be very defensive:
cfg.policies.timeout = 3000;
cfg.policies.retry = 3;
cfg.use_shm = true;
cfg.shm_takeover_threshold_sec = 1;
Full example client with log: https://gist.github.com/jarda-manana/237e57bf84111fc1a8c1