Data consistency and reliability in Aerospike

Hi All,

I have a few questions about Aerospike policies. Suppose I have a cluster of 3 nodes with a namespace configured with a replication factor of 2. Then:

  1. If I am using WritePolicy.COMMIT_ALL, the client will be returned success only if both the master and the replica partition copies write the data successfully.
  2. Similarly, using ReadPolicy.CONSISTENCY_ALL will ensure I get the latest copy of the data.
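To make the first point concrete, here is a toy model of the two commit levels. This is NOT the Aerospike client API, just a sketch of when the client sees "success" relative to the replica's state:

```python
# Toy model of Aerospike commit levels -- illustration only, not the real API.

def write(key, value, master, replica, commit_all=True):
    """Write to the master partition copy; with COMMIT_ALL, the replica is
    also written before the client receives its acknowledgement."""
    master[key] = value                  # the master always applies the write
    if commit_all:
        replica[key] = value             # COMMIT_ALL: replicated before the ack
    # COMMIT_MASTER would ack here and replicate in the background,
    # leaving a short window where the replica lags the master.
    return "ok"

master, replica = {}, {}
write("user1", {"bal": 100}, master, replica, commit_all=True)
assert replica["user1"] == {"bal": 100}  # replica has the data before the ack
```

With `commit_all=False` the same write would return before the replica dictionary is touched, which is the window the COMMIT_ALL policy closes.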

Also, the Aerospike client ensures that if migrations are going on and I write something to the cluster, it checks its partition map and is directed to the correct node, or the request is internally proxied to that node. So when can data become inconsistent, and how does the client update its partition map during migrations?

@kporter @Albot

I don’t think it’s necessary to use CONSISTENCY_ALL to ensure you get the latest copy, as long as you are reading the master record. If the client has a stale partition map and hits the wrong node, it will either time out (hit the dead node, retry) or be proxied to the master node. There should be very little chance of getting any inconsistent/stale reads. If your use-case is very sensitive you may want to explore running in CP mode. Strong Consistency mode | Aerospike Documentation.
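The proxy behavior above can be sketched with a toy model (this is not the real Aerospike client or server, and the tiny partition count is an assumption for readability). The point is that even a client holding a stale partition map ends up at the right data, because the wrongly contacted node forwards the request to the current master:

```python
import zlib

PARTITIONS = 8  # toy value; the real server uses 4096 partitions per namespace

def partition_id(key):
    """Deterministically map a key to a partition (toy stand-in for the
    RIPEMD-160-based digest the real server uses)."""
    return zlib.crc32(key.encode()) % PARTITIONS

class Node:
    def __init__(self, name):
        self.name, self.data, self.cluster = name, {}, None

    def read(self, key):
        owner = self.cluster.master_for(partition_id(key))
        if owner is self:
            return self.data.get(key)
        return owner.read(key)  # stale-map case: proxy to the actual master

class Cluster:
    """Holds the authoritative partition map (round-robin for the toy)."""
    def __init__(self, nodes):
        self.pmap = {p: nodes[p % len(nodes)] for p in range(PARTITIONS)}
        for n in nodes:
            n.cluster = self

    def master_for(self, pid):
        return self.pmap[pid]

a, b = Node("A"), Node("B")
cluster = Cluster([a, b])
pid = partition_id("user1")
cluster.master_for(pid).data["user1"] = "v1"   # record lives on its master
stale_choice = b if cluster.master_for(pid) is a else a
assert stale_choice.read("user1") == "v1"      # proxied, still correct
```

In the real client, the partition map is refreshed by a tend thread that polls the cluster, so the stale-map window is short-lived.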

This is a good question. As @Albot mentioned, if you are using strong-consistency mode, you cannot get any inconsistency. You may get unavailability under some circumstances, though (specific network-partition situations). Obviously, in strong-consistency mode, COMMIT_ALL / CONSISTENCY_ALL are NOT configurable, and what Aerospike calls ‘duplicate resolution’ is always enforced (it only happens during migrations, when there are cluster changes).

If you are not using strong-consistency mode, a network partition (split brain) could cause a sub-cluster to be missing specific partitions while still allowing the client to read/write against them (all partitions are available in any sub-cluster, other than some extreme edge cases like single-node splits).

The ReadPolicy.CONSISTENCY_ALL name itself is therefore misleading. It was implemented before the strong consistency mode was developed. I therefore understand your confusion. The way to interpret it would be:

  • ReadPolicy.CONSISTENCY_ALL forces duplicate resolution and will make sure the latest version (according to the conflict-resolution-policy) is always read. This is helpful when nodes are restarted one at a time without waiting for migrations to complete in between nodes. If multiple nodes are shut down or if the cluster splits, then this policy on its own will not guarantee consistency (again, this only applies in non-strong-consistency mode).
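Duplicate resolution can be pictured with a small sketch (this is an illustration, not Aerospike internals): all partition copies are consulted and a winner is picked according to the configured conflict-resolution-policy, which in Aerospike can be based on generation or last-update-time:

```python
# Toy illustration of duplicate resolution during migrations.

def resolve(copies, policy="last-update-time"):
    """Return the winning record version among duplicate partition copies."""
    if policy == "generation":
        return max(copies, key=lambda r: r["generation"])
    return max(copies, key=lambda r: r["lut"])  # lut = last-update-time

copies = [
    {"value": "old", "generation": 3, "lut": 100},
    {"value": "new", "generation": 4, "lut": 250},  # most recent write
]
assert resolve(copies)["value"] == "new"
assert resolve(copies, policy="generation")["value"] == "new"
```

Note that the two policies can disagree in edge cases (e.g., clock skew making an older generation carry a newer timestamp), which is why it is worth reviewing which one matches your requirements.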

Hope this helps a bit…

Thanks @meher @Albot for your answers. So even if I am not using strong-consistency mode, there will be no inconsistent reads, because during changes or upgrades we always wait for migrations to complete before starting on the next node. Right? One more question: if I add a new namespace to the config of one node with replication factor 2, but do not update the config of the other nodes in the cluster, will it behave correctly? My assumption is that it should work normally despite the replication factor; the only exception would be that if that node goes down, the namespace data will be lost. Right?
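For reference, the namespace stanza that would exist on only one node’s aerospike.conf in this scenario looks roughly like the following (the namespace name and storage settings here are placeholders, and the exact directives vary by server version):

```
namespace test {
    replication-factor 2
    memory-size 2G
    storage-engine memory
}
```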

Yes, if you do not have any network partition (or lose multiple nodes simultaneously), you would indeed not have any inconsistency issues. If you are waiting for migrations to complete between each node when doing a rolling restart, you shouldn’t even need to have the CONSISTENCY_ALL flag, but having it will actually make sure you will be reading the latest record even if you don’t wait for migrations to complete (make sure to review the conflict-resolution-policy to make sure it matches your requirements). Needless to say, without the strong-consistency mode, you of course expose yourself to consistency issues whenever a network partition occurs or more than replication-factor number of nodes unexpectedly shut down.

Regarding a different namespace on one node in the cluster, yes, you got it right… if only one node has that namespace, that node will own all 4096 partitions for it and will be running with an effective replication factor of 1. It is not recommended to run as such as it can create imbalances across the cluster, but other than that, it will technically work as you described.

Thanks @meher. Will check on that by replicating the same. Thanks again for the great insights.