If a client wants to query a record in an Aerospike cluster with multiple nodes, does the client need to send the query to all nodes (basically N queries, with N equal to the number of nodes), or at most two queries (one query to one node and, if the record is not found there, that node redirects the client to the correct node with another query)?
For a straight read transaction (single or batch), the client goes directly to the node(s) owning the partition(s) that the record(s) belong to. In the specific situations where the cluster is changing and partitions are moving around (migrating), a transaction that lands on the wrong node is ‘proxied’ to the right node by the server, without going back to the client, so this is not a ‘redirect’.
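To make the difference between a proxy and a redirect concrete, here is a toy simulation. It is only a sketch: the `Node` and `Cluster` classes, the partition number, and the record are all made up for illustration and are not Aerospike server code. The client has a stale map pointing at the old owner, yet it still issues a single request.

```python
class Node:
    """Toy server node; not the Aerospike server implementation."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # partition id -> {key: value}

    def read(self, pid, key, cluster):
        owner = cluster.owners[pid]
        if owner == self.name:
            # This node owns the partition: answer directly.
            return self.data[pid][key], [self.name]
        # Partition has moved: forward to the owner on the server side
        # and relay the answer, instead of telling the client to retry.
        value, path = cluster.nodes[owner].read(pid, key, cluster)
        return value, [self.name] + path


class Cluster:
    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}
        self.owners = {}  # partition id -> name of current owner


cluster = Cluster(["node-a", "node-b"])
cluster.owners[7] = "node-b"  # partition 7 has migrated to node-b
cluster.nodes["node-b"].data[7] = {"user-42": {"name": "Ann"}}

# Hypothetical stale client view: it still believes node-a owns partition 7.
stale_client_map = {7: "node-a"}
first_hop = cluster.nodes[stale_client_map[7]]
value, path = first_hop.read(7, "user-42", cluster)  # one client request
print(value, path)
```

The returned `path` lists the nodes the request passed through, which shows the extra hop happening between servers rather than between client and server.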
For secondary index queries, the client hits all the nodes, since multiple results may be returned. A policy setting dictates how many nodes the client sends the query to at a time (0 means all nodes simultaneously).
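The fan-out described here could look roughly like the following sketch. It does not use the Aerospike client library: `fan_out_query`, `query_node`, the `max_concurrent_nodes` parameter, and the sample records are hypothetical names chosen for illustration, with the only behavior taken from the post being that 0 means "all nodes at once".

```python
from concurrent.futures import ThreadPoolExecutor


def query_node(node, records_by_node, predicate):
    """Stand-in for one node answering from its local secondary index."""
    return [r for r in records_by_node[node] if predicate(r)]


def fan_out_query(records_by_node, predicate, max_concurrent_nodes=0):
    """Send the query to every node, at most max_concurrent_nodes at a time."""
    nodes = list(records_by_node)
    if not nodes:
        return []
    # 0 means "query all nodes simultaneously".
    if max_concurrent_nodes == 0:
        workers = len(nodes)
    else:
        workers = min(max_concurrent_nodes, len(nodes))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda n: query_node(n, records_by_node, predicate), nodes
        )
        # Every node may contribute matches, so merge all partial results.
        return [record for partial in partials for record in partial]


# Made-up data: each node holds a different slice of the records.
records_by_node = {
    "node-a": [{"name": "ann", "age": 31}, {"name": "bob", "age": 25}],
    "node-b": [{"name": "cy", "age": 40}],
    "node-c": [{"name": "dee", "age": 19}, {"name": "eve", "age": 35}],
}
matches = fan_out_query(records_by_node, lambda r: r["age"] >= 30)
print([r["name"] for r in matches])
```

Lowering `max_concurrent_nodes` trades latency for less simultaneous load; the merged result is the same either way.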
Thanks for the insights! So the client knows which node to access from the primary key itself (some sort of consistent hashing algorithm, I guess)? What happens with a batch get operation with multiple primary keys: does it access only the necessary nodes, or does it send the query to all nodes?
Yes, the client knows which node to go to from the partition map it builds by periodically (every 1 second by default) asking each node in the cluster which partitions it owns. You can read some details on this page: Data Distribution | Aerospike Documentation.
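As a rough illustration of that lookup, the sketch below hashes a key to a partition ID and reads the owning node out of a map. Several things here are assumptions: SHA-1 stands in for the digest function the real client uses, the way the partition ID is cut out of the digest is simplified, and the round-robin map is invented (a real client fills the map from what the nodes report). The 4096-partition count matches what Aerospike documents per namespace.

```python
import hashlib

N_PARTITIONS = 4096  # partitions per namespace


def partition_id(set_name, user_key):
    """Map a key to a partition ID (illustrative hashing only)."""
    # Assumption: SHA-1 is a stand-in for the client's real digest function.
    digest = hashlib.sha1(f"{set_name}:{user_key}".encode()).digest()
    return int.from_bytes(digest[:4], "little") % N_PARTITIONS


def build_partition_map(nodes):
    """Hypothetical map; a real client learns ownership from each node."""
    return [nodes[p % len(nodes)] for p in range(N_PARTITIONS)]


def node_for_key(partition_map, set_name, user_key):
    """One local lookup tells the client where to send the read."""
    return partition_map[partition_id(set_name, user_key)]


nodes = ["node-a", "node-b", "node-c"]
partition_map = build_partition_map(nodes)
target = node_for_key(partition_map, "users", "user-42")
print(target)
```

Because the lookup is purely local, a single-record read costs one network request to one node, and the periodic refresh keeps the map close to the cluster's actual state.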
For a batch get operation, the client determines the partition each record in the batch belongs to and sends separate sub-batch requests only to the relevant nodes in the cluster. So, let’s say a batch of 10 records that happen to fall in 7 separate partitions, in a 20-node cluster, would be sent to at most 7 nodes (exactly 7 if each of those partitions is owned by a different node). In practice, 10 records would most likely be in 10 different partitions; the numbers are just for the sake of the example.
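The grouping step can be sketched as follows. Again this is a simplified model rather than the client's actual code: the SHA-1 hashing, the round-robin partition map, and the `split_batch` helper are assumptions made for the example.

```python
import hashlib
from collections import defaultdict

N_PARTITIONS = 4096


def partition_id(user_key):
    # Assumption: SHA-1 stands in for the client's real digest function.
    digest = hashlib.sha1(user_key.encode()).digest()
    return int.from_bytes(digest[:4], "little") % N_PARTITIONS


def split_batch(keys, partition_map):
    """Group the keys of one batch into one sub-batch per owning node."""
    per_node = defaultdict(list)
    for key in keys:
        per_node[partition_map[partition_id(key)]].append(key)
    return dict(per_node)


# Hypothetical 20-node cluster with partitions dealt out round-robin.
nodes = [f"node-{i}" for i in range(20)]
partition_map = [nodes[p % len(nodes)] for p in range(N_PARTITIONS)]

keys = [f"user-{i}" for i in range(10)]
sub_batches = split_batch(keys, partition_map)
for node, node_keys in sorted(sub_batches.items()):
    print(node, node_keys)
```

Only the nodes that appear in `sub_batches` receive a request, so a 10-key batch never contacts more than 10 of the 20 nodes.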
Cool, thanks for the answer