Reference:
https://docs.aerospike.com/docs/architecture/
Division is purely algorithmic. The system scales without a master and eliminates the need for additional configuration as required in a sharded environment.
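To make "purely algorithmic" concrete, here is a minimal sketch of how a key maps to one of Aerospike's 4096 partitions, using the C client's own helpers (names as they appear in the client headers `as_key.h` and `as_partition.h`; the namespace/set/key are made-up examples):

```c
#include <stdio.h>
#include <aerospike/as_key.h>
#include <aerospike/as_partition.h>

int main(void)
{
    as_key key;
    as_key_init_str(&key, "test", "demo", "user42"); /* example ns/set/key */

    /* The client RIPEMD-160 hashes (set, key); 12 bits of the digest
     * pick one of 4096 partitions. Purely algorithmic: no lookup
     * service and no master node involved. */
    as_digest* digest = as_key_digest(&key);
    uint32_t pid = as_partition_getid(digest->value, 4096);

    printf("partition id: %u\n", pid);
    as_key_destroy(&key);
    return 0;
}
```

Since every client computes the same digest, every client independently agrees on which node owns a given key, which is why no coordinator is needed.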
Aerospike expert team,
I would like to understand, for a healthy Aerospike cluster:
(A) How would the Aerospike C client library divide (load balance) requests for 'missing' keys across the cluster nodes, given an 8-node cluster with replication-factor 2? In the case of a missing key, would the client attempt to query all the cluster nodes one by one?
(B) Would there be a chance of a miss for an existing key, and in that case, would the query from the client be internally forwarded to the next cluster node?
This is a generic question.
thanks all
-Vikrant
If you haven’t yet, I’d recommend reading about data-distribution.
If the cluster isn’t configured with strong-consistency
(AP mode):
- The client maintains a map of partition to nodes owning that partition and sends requests accordingly.
- If the request arrives at a node with a sync’ed partition, then the node will find that the key is missing on that node and return not-found the the client.
- If the request arrives at a node that no longer owns the partition, then the node will proxy the request to the new master owner.
- If the request arrives at the master copy and finds that there are versions of the partition where it could find a different copy of the record and either
read-consistency-level-override
is set to all or the client has specified the read-consistency-level
all
in the transaction, then the node will read from the nodes holding those copies and if return not-found if all nodes potentially holding the record return not-found, otherwise the node will return the “best” copy based on the configured conflict-resolution-policy
.
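For a concrete feel of that last bullet, here is a hedged sketch of requesting the "all" behavior from the C client on a per-read basis. The `read_mode_ap` field follows recent aerospike-client-c headers; older releases exposed the same idea as `policy.consistency_level = AS_POLICY_CONSISTENCY_LEVEL_ALL`:

```c
#include <aerospike/aerospike.h>
#include <aerospike/aerospike_key.h>
#include <aerospike/as_policy.h>
#include <aerospike/as_record.h>

static as_status get_from_all_copies(aerospike* as, as_error* err,
                                     const as_key* key, as_record** rec)
{
    as_policy_read policy;
    as_policy_read_init(&policy);

    /* Consult every version of the partition, not just the master,
     * so a record that only survives on a duplicate is still found. */
    policy.read_mode_ap = AS_POLICY_READ_MODE_AP_ALL;

    return aerospike_key_get(as, err, &policy, key, rec);
}
```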
In AP mode, it is possible that not all online nodes are able to communicate with each other (partitioned network) and do not know that a better version of the record exists on another node, in which case the client can receive not-found even though the record exists in a part of the cluster that is currently inaccessible.
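To make those outcomes concrete, here is roughly what a single AP-mode read looks like from the C client, distinguishing a genuine not-found from other failures; assume `as` is an already-connected handle and the names are examples:

```c
#include <stdio.h>
#include <aerospike/aerospike.h>
#include <aerospike/aerospike_key.h>
#include <aerospike/as_key.h>
#include <aerospike/as_record.h>

static void read_one(aerospike* as)
{
    as_error err;
    as_key key;
    as_key_init_str(&key, "test", "demo", "user42"); /* example ns/set/key */

    as_record* rec = NULL;
    switch (aerospike_key_get(as, &err, NULL, &key, &rec)) {
    case AEROSPIKE_OK:
        printf("found, generation %u\n", rec->gen);
        as_record_destroy(rec);
        break;
    case AEROSPIKE_ERR_RECORD_NOT_FOUND:
        /* In AP mode this can also mean the only surviving copy
         * is on a node the cluster cannot currently reach. */
        printf("not found\n");
        break;
    default:
        /* Timeouts, proxy failures, etc. */
        printf("error %d: %s\n", err.code, err.message);
        break;
    }
    as_key_destroy(&key);
}
```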
If the cluster is configured with strong-consistency enabled:
- If the network is healthy enough for the encompassing partition to go live, then a not-found read means that there cannot be a committed copy of the record in the cluster.
- This could mean that another node that is currently unreachable holds a copy of the record that hasn't been committed. This uncommitted copy could become committed when the network recovers, but it is guaranteed not to commit if a new copy of the record is created while the node is inaccessible.
- If there are other possible committed or uncommitted copies of the record in the cluster, then reads and writes will always resolve and return the latest available version of the record - this version is guaranteed to be in the record's causal history. (This is a bit too simple: writes always use linearizable consistency, while reads have additional consistency modes that can be configured on the transaction; see the sketch after this list.)
- If the network is not healthy enough for the encompassing partition to go live, then you will receive an unavailable error instead of not-found.
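And a hedged sketch of the SC-mode read policies mentioned above. The `read_mode_sc` field and its enum values follow recent aerospike-client-c headers (`as_policy.h`); verify the exact status code for the unavailable case against your client's `as_status.h`:

```c
#include <stdio.h>
#include <aerospike/aerospike.h>
#include <aerospike/aerospike_key.h>
#include <aerospike/as_policy.h>
#include <aerospike/as_record.h>

static void sc_read(aerospike* as, const as_key* key)
{
    as_error err;
    as_record* rec = NULL;

    as_policy_read policy;
    as_policy_read_init(&policy);
    /* Linearize: the read reflects every previously committed write.
     * The default (AS_POLICY_READ_MODE_SC_SESSION) is cheaper and
     * guarantees ordering only within the client session. */
    policy.read_mode_sc = AS_POLICY_READ_MODE_SC_LINEARIZE;

    switch (aerospike_key_get(as, &err, &policy, key, &rec)) {
    case AEROSPIKE_OK:
        as_record_destroy(rec);
        break;
    case AEROSPIKE_ERR_RECORD_NOT_FOUND:
        /* Partition is live: provably no committed copy exists. */
        break;
    default:
        /* A partition that cannot go live surfaces as an
         * "unavailable"-style error here, not as not-found. */
        printf("error %d: %s\n", err.code, err.message);
        break;
    }
}
```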
Thanks kporter,
I will deep-dive into this information and the link you provided, and get back.
Regards
-Vikrant