Automatic failover to other nodes?


#1

Hi,

Does the C client support automatically switching to other nodes when one node goes offline? We are testing Aerospike with a 3-node Docker cluster (replication factor 2), and when we kill one node our client operations start to fail. For example, an aerospike_key_put fails with AEROSPIKE_ERR_CLIENT (func: as_socket_read_limit, file: "src/main/aerospike/as_socket.c", line: 442).


#2

The C client does support automatic node switching on node failures, but there is a lag (usually 1 to 2 seconds) between the node failing and the client dropping that node from its cluster map. During this lag, transactions will continue to be sent to the downed node.

Each client instance periodically polls all nodes for cluster status, by default once per second. When a node goes down, the next cluster status request should result in the node being dropped from the map. The client strictly follows this map when deciding where to send each transaction.

Immediately switching nodes on a transaction timeout is a bad idea for a number of reasons.

  1. The client wouldn't know which node to send the transaction to, because the new node for that transaction hasn't been decided yet. Guessing would result in lots of proxies in an already stressed system.

  2. Timeouts can be relatively frequent for applications that must respond by a fixed time, so a timeout on its own does not mean the node is down.

  3. The client's view of the cluster map would diverge from the server's cluster map.