How are missing keys load balanced across the aerospike cluster nodes?

Reference: https://docs.aerospike.com/docs/architecture/

Division is purely algorithmic. The system scales without a master and eliminates the need for additional configuration as required in a sharded environment.

Aerospike expert team,

I would like to understand, for a healthy Aerospike cluster:

(A) how the Aerospike C client library would divide (load balance) the ‘missing’ keys across the cluster nodes, given an 8-node cluster with replication factor = 2. In the case of a missing key, would the client attempt querying all the cluster nodes one by one?

(B) are there chances of a miss for an existing key; and in that case, would the query from the client be internally transferred to the next cluster node?

This is a generic question.

thanks all

-Vikrant

If you haven’t yet, I’d recommend reading about data-distribution.
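To make the “purely algorithmic” division concrete, here is a rough sketch of how a key maps to a partition. It assumes the Aerospike C client headers and the standard 4096-partition scheme; the 12-bit extraction shown is an illustration of the documented scheme rather than the exact client internals, and the namespace/set/key values are just example placeholders.

```c
#include <aerospike/as_key.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    as_key key;
    as_key_init_str(&key, "test", "demo", "user-1234");  /* namespace, set, user key (example values) */

    /* The client hashes the set name + user key into a 20-byte RIPEMD-160 digest. */
    as_digest* digest = as_key_digest(&key);

    /* Illustration of the documented scheme: 12 bits of the digest select one of
       4096 partitions; the client's partition map then gives the master (and
       replica) node that owns that partition. */
    uint32_t partition_id =
        ((uint32_t)digest->value[0] | ((uint32_t)digest->value[1] << 8)) & 0x0FFF;

    printf("partition id: %u\n", partition_id);

    as_key_destroy(&key);
    return 0;
}
```

With 4096 partitions spread over 8 nodes, each node masters roughly 512 partitions, so single-key reads (hits or misses) spread evenly across the nodes and the client never probes nodes one by one.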

If the cluster isn’t configured with strong-consistency (AP mode):

  • The client maintains a map of partition to nodes owning that partition and sends requests accordingly.
    • If the request arrives at a node with a sync’ed partition, then the node will find that the key is missing on that node and return not-found to the client.
    • If the request arrives at a node that no longer owns the partition, then the node will proxy the request to the new master owner.
    • If the request arrives at the node holding the master copy and finds that there are other versions of the partition where a different copy of the record could exist, and either read-consistency-level-override is set to all or the client has specified read-consistency-level all on the transaction, then the node will read from the nodes holding those copies. It returns not-found only if all nodes potentially holding the record return not-found; otherwise it returns the “best” copy based on the configured conflict-resolution-policy. (A minimal C read sketch follows this list.)
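Here is a minimal single-record read sketch in C showing what this looks like from the application side: the client picks the node from its partition map, so a not-found comes back from the single owning node rather than from probing all 8 nodes. The policy field for “read from all copies” depends on the client version (older clients use consistency_level, newer ones read_mode_ap), so treat that line as an assumption about your client version; the seed host is hypothetical.

```c
#include <aerospike/aerospike.h>
#include <aerospike/aerospike_key.h>
#include <aerospike/as_policy.h>
#include <aerospike/as_record.h>
#include <stdio.h>

int main(void)
{
    as_config config;
    as_config_init(&config);
    as_config_add_host(&config, "127.0.0.1", 3000);  /* hypothetical seed node */

    aerospike as;
    aerospike_init(&as, &config);

    as_error err;
    if (aerospike_connect(&as, &err) != AEROSPIKE_OK) {
        fprintf(stderr, "connect failed: %s\n", err.message);
        return 1;
    }

    as_policy_read policy;
    as_policy_read_init(&policy);
    /* Assumption: a recent client; older clients use
       policy.consistency_level = AS_POLICY_CONSISTENCY_LEVEL_ALL instead. */
    policy.read_mode_ap = AS_POLICY_READ_MODE_AP_ALL;

    as_key key;
    as_key_init_str(&key, "test", "demo", "maybe-missing-key");

    as_record* rec = NULL;
    as_status status = aerospike_key_get(&as, &err, &policy, &key, &rec);

    if (status == AEROSPIKE_OK) {
        printf("record found, generation %u\n", (unsigned)rec->gen);
    } else if (status == AEROSPIKE_ERR_RECORD_NOT_FOUND) {
        /* The node owning the partition answered; no scatter-gather across
           all 8 nodes happens for a single-key read. */
        printf("record not found\n");
    } else {
        fprintf(stderr, "read error: %s\n", err.message);
    }

    if (rec) {
        as_record_destroy(rec);
    }
    as_key_destroy(&key);
    aerospike_close(&as, &err);
    aerospike_destroy(&as);
    return 0;
}
```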

In AP mode, it is possible that not all online nodes can communicate with each other (a partitioned network) and therefore do not know that a better version of the record exists on another node. In that case, the client can receive not-found even though the record exists in a part of the cluster that is currently inaccessible.

If the cluster is configured with strong-consistency enabled:

  • If the network is healthy enough for the encompassing partition to go live then a not-found read means that there cannot be a committed copy of the record in the cluster.
    • This could mean that another node that is currently unreachable may have a record that hasn’t been committed. This uncommitted copy could become committed when the network recovers, but it is guaranteed not to commit if a new copy of the record is created while the node is inaccessible.
    • If there are other possible committed or uncommitted copies of the record in the cluster, then reads and writes will always resolve to and return the latest available version - this version is guaranteed to be in the record’s causal history. This is a bit of a simplification: writes always use linearizable consistency, while reads have additional consistency modes that can be configured on the transaction (see the SC read sketch after this list).
  • If the network is not healthy enough for the encompassing partition to go live then you will receive an unavailable error instead of not-found.
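And a corresponding sketch for a strong-consistency namespace, reusing a connected aerospike handle like the one opened in the earlier sketch. The read_mode_sc field and its enum values assume a reasonably recent C client, and the exact status code returned for an unavailable partition varies by client/server version, so the error branch simply reports whatever comes back.

```c
#include <aerospike/aerospike.h>
#include <aerospike/aerospike_key.h>
#include <aerospike/as_policy.h>
#include <aerospike/as_record.h>
#include <stdio.h>

/* 'as' is an already-connected client handle (see the previous sketch). */
static void read_with_sc(aerospike* as)
{
    as_policy_read policy;
    as_policy_read_init(&policy);
    /* Assumption: request strictly linearizable reads for this transaction. */
    policy.read_mode_sc = AS_POLICY_READ_MODE_SC_LINEARIZE;

    as_key key;
    as_key_init_str(&key, "test", "demo", "some-key");

    as_error err;
    as_record* rec = NULL;
    as_status status = aerospike_key_get(as, &err, &policy, &key, &rec);

    if (status == AEROSPIKE_OK) {
        printf("committed record found\n");
    } else if (status == AEROSPIKE_ERR_RECORD_NOT_FOUND) {
        /* With the partition live, not-found means no committed copy exists. */
        printf("no committed copy in the cluster\n");
    } else {
        /* If the partition cannot go live, the read fails with an
           "unavailable"-type error rather than not-found. */
        fprintf(stderr, "read failed (code %d): %s\n", err.code, err.message);
    }

    if (rec) {
        as_record_destroy(rec);
    }
    as_key_destroy(&key);
}
```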

Thanks kporter,

I will deep dive into this information and the link you provided, and get back.

Regards

-Vikrant