How are missing keys load balanced across the aerospike cluster nodes?

Reference: https://docs.aerospike.com/docs/architecture/

Division is purely algorithmic. The system scales without a master and eliminates the need for additional configuration as required in a sharded environment.

Aerospike expert team,

I would like to understand, for a healthy Aerospike cluster:

(A) how the Aerospike C client library would divide (load balance) the ‘missing’ keys across the cluster nodes, given an 8-node cluster with replication factor = 2. In the case of a missing key, would the client attempt querying all the cluster nodes one by one?

(B) are there chances of a miss for an existing key; and in that case, would the query from the client be internally transferred to the next cluster node?

This is a generic question.

thanks all

-Vikrant

If you haven’t yet, I’d recommend reading about data-distribution.
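To make the “purely algorithmic” division concrete, here is a rough sketch of how a key maps to a partition. It assumes the Aerospike C client headers and the standard 4096-partition scheme; the 12-bit extraction shown is an illustration of the documented scheme rather than the exact client internals, and the namespace/set/key values are just example placeholders.

```c
#include <aerospike/as_key.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    as_key key;
    as_key_init_str(&key, "test", "demo", "user-1234");  /* namespace, set, user key (example values) */

    /* The client hashes the set name + user key into a 20-byte RIPEMD-160 digest. */
    as_digest* digest = as_key_digest(&key);

    /* Illustration of the documented scheme: 12 bits of the digest select one of
       4096 partitions; the client's partition map then gives the master (and
       replica) node that owns that partition. */
    uint32_t partition_id =
        ((uint32_t)digest->value[0] | ((uint32_t)digest->value[1] << 8)) & 0x0FFF;

    printf("partition id: %u\n", partition_id);

    as_key_destroy(&key);
    return 0;
}
```

With 4096 partitions spread over 8 nodes, each node masters roughly 512 partitions, so single-key reads (hits or misses) spread evenly across the nodes and the client never probes nodes one by one.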

If the cluster isn’t configured with strong-consistency (AP mode):

  • The client maintains a map of partition to nodes owning that partition and sends requests accordingly.
    • If the request arrives at a node with a sync’ed partition, then the node will find that the key is missing on that node and return not-found to the client.
    • If the request arrives at a node that no longer owns the partition, then the node will proxy the request to the new master owner.
    • If the request arrives at the node holding the master copy and finds that there are other versions of the partition where a different copy of the record could exist, and either read-consistency-level-override is set to all or the client has specified read-consistency-level all on the transaction, then the node will read from the nodes holding those copies. It returns not-found only if all nodes potentially holding the record return not-found; otherwise it returns the “best” copy based on the configured conflict-resolution-policy. (A minimal C read sketch follows this list.)
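Here is a minimal single-record read sketch in C showing what this looks like from the application side: the client picks the node from its partition map, so a not-found comes back from the single owning node rather than from probing all 8 nodes. The policy field for “read from all copies” depends on the client version (older clients use consistency_level, newer ones read_mode_ap), so treat that line as an assumption about your client version; the seed host is hypothetical.

```c
#include <aerospike/aerospike.h>
#include <aerospike/aerospike_key.h>
#include <aerospike/as_policy.h>
#include <aerospike/as_record.h>
#include <stdio.h>

int main(void)
{
    as_config config;
    as_config_init(&config);
    as_config_add_host(&config, "127.0.0.1", 3000);  /* hypothetical seed node */

    aerospike as;
    aerospike_init(&as, &config);

    as_error err;
    if (aerospike_connect(&as, &err) != AEROSPIKE_OK) {
        fprintf(stderr, "connect failed: %s\n", err.message);
        return 1;
    }

    as_policy_read policy;
    as_policy_read_init(&policy);
    /* Assumption: a recent client; older clients use
       policy.consistency_level = AS_POLICY_CONSISTENCY_LEVEL_ALL instead. */
    policy.read_mode_ap = AS_POLICY_READ_MODE_AP_ALL;

    as_key key;
    as_key_init_str(&key, "test", "demo", "maybe-missing-key");

    as_record* rec = NULL;
    as_status status = aerospike_key_get(&as, &err, &policy, &key, &rec);

    if (status == AEROSPIKE_OK) {
        printf("record found, generation %u\n", (unsigned)rec->gen);
    } else if (status == AEROSPIKE_ERR_RECORD_NOT_FOUND) {
        /* The node owning the partition answered; no scatter-gather across
           all 8 nodes happens for a single-key read. */
        printf("record not found\n");
    } else {
        fprintf(stderr, "read error: %s\n", err.message);
    }

    if (rec) {
        as_record_destroy(rec);
    }
    as_key_destroy(&key);
    aerospike_close(&as, &err);
    aerospike_destroy(&as);
    return 0;
}
```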

In AP mode, it is possible that not all online nodes can communicate with each other (a partitioned network) and therefore do not know that a better version of the record exists on another node. In that case, the client can receive not-found even though the record exists in a part of the cluster that is currently inaccessible.

If the cluster is configured with strong-consistency enabled:

  • If the network is healthy enough for the encompassing partition to go live then a not-found read means that there cannot be a committed copy of the record in the cluster.
    • This could mean that another node that is currently unreachable may have a record that hasn’t been committed. This uncommitted copy could become committed when the network recovers, but it is guaranteed not to commit if a new copy of the record is created while the node is inaccessible.
    • If there are other possible committed or uncommitted copies of the record in the cluster, then reads and writes will always resolve to and return the latest available version - this version is guaranteed to be in the record’s causal history. This is a bit of a simplification: writes always use linearizable consistency, while reads have additional consistency modes that can be configured on the transaction (see the SC read sketch after this list).
  • If the network is not healthy enough for the encompassing partition to go live then you will receive an unavailable error instead of not-found.
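And a corresponding sketch for a strong-consistency namespace, reusing a connected aerospike handle like the one opened in the earlier sketch. The read_mode_sc field and its enum values assume a reasonably recent C client, and the exact status code returned for an unavailable partition varies by client/server version, so the error branch simply reports whatever comes back.

```c
#include <aerospike/aerospike.h>
#include <aerospike/aerospike_key.h>
#include <aerospike/as_policy.h>
#include <aerospike/as_record.h>
#include <stdio.h>

/* 'as' is an already-connected client handle (see the previous sketch). */
static void read_with_sc(aerospike* as)
{
    as_policy_read policy;
    as_policy_read_init(&policy);
    /* Assumption: request strictly linearizable reads for this transaction. */
    policy.read_mode_sc = AS_POLICY_READ_MODE_SC_LINEARIZE;

    as_key key;
    as_key_init_str(&key, "test", "demo", "some-key");

    as_error err;
    as_record* rec = NULL;
    as_status status = aerospike_key_get(as, &err, &policy, &key, &rec);

    if (status == AEROSPIKE_OK) {
        printf("committed record found\n");
    } else if (status == AEROSPIKE_ERR_RECORD_NOT_FOUND) {
        /* With the partition live, not-found means no committed copy exists. */
        printf("no committed copy in the cluster\n");
    } else {
        /* If the partition cannot go live, the read fails with an
           "unavailable"-type error rather than not-found. */
        fprintf(stderr, "read failed (code %d): %s\n", err.code, err.message);
    }

    if (rec) {
        as_record_destroy(rec);
    }
    as_key_destroy(&key);
}
```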

Thanks kporter,

I will deep dive into this information and the link you provided, and get back.

Regards

-Vikrant