We have a 12-node setup with 12 replicas, and we are using ReadPolicy.replica = Random:
/**
 * Distribute reads across all nodes in cluster in round-robin fashion.
 * This option is useful when the replication factor equals the number
 * of nodes in the cluster and the overhead of requesting proles is not desired.
 */
RANDOM;
This mode works fine when there are no migrations in the cluster, but while migrations are in progress we get inconsistent results: sometimes we get keyNotFound and sometimes we get the record.
There are no issues with the MASTER and MASTER_PROLES modes during migrations.
The steps to reproduce are:
1. Create a cluster of n nodes with a replication factor of n (in my case it's 10) and make sure no migrations are happening. Request data for any valid key using the Aerospike Java client with the Random replica mode.
2. Bring down one server (A) and verify that the data for the same key is still returned.
3. Bring back the server (A) you stopped in the previous step and notice that migrations are happening.
4. Request data for the same key; you will see inconsistent responses.
5. Stop the newly added node (A) again; you will immediately see consistent data.
I went deeper into the code and verified that, while migrations are in progress, the partition map containing the data for each replica (index 0 being the master) has a lot of null entries for index > 1:
AtomicReferenceArray replicaArray = map.get(partition.namespace);
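To make the observation about null replica slots concrete, here is a minimal, self-contained sketch of a lookup that starts at a chosen replica index and skips null slots until it reaches a populated one (index 0, the master, is always among the candidates). This is not the Aerospike client's actual code: the class ReplicaSketch, the method pickNode, and the use of plain node names instead of client Node objects are all hypothetical and only illustrate the data structure described above.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical model of a per-namespace replica table, not the real client class.
final class ReplicaSketch {

    /**
     * replicas.get(r) maps partition id -> node name for replica index r,
     * with r == 0 being the master. A null slot means no node is currently
     * known for that replica of that partition.
     */
    static String pickNode(List<AtomicReferenceArray<String>> replicas,
                           int partitionId, int startIndex) {
        int n = replicas.size();
        for (int i = 0; i < n; i++) {
            // Walk the replica indexes starting at startIndex, wrapping around.
            int idx = Math.floorMod(startIndex + i, n);
            String node = replicas.get(idx).get(partitionId);
            if (node != null) {
                return node; // first populated slot wins
            }
        }
        return null; // no replica, not even the master, is known
    }

    public static void main(String[] args) {
        AtomicReferenceArray<String> master =
            new AtomicReferenceArray<>(new String[] {"A", "B"});
        // Replica 1 is missing for partition 0, as seen during migrations.
        AtomicReferenceArray<String> prole =
            new AtomicReferenceArray<>(new String[] {null, "C"});
        List<AtomicReferenceArray<String>> replicas = List.of(master, prole);

        System.out.println(pickNode(replicas, 0, 1));
        System.out.println(pickNode(replicas, 1, 1));
    }
}
```

Note that a populated slot only tells the client which node to ask. It does not guarantee that the node has finished receiving the partition's data, which may be why a read can still come back as keyNotFound during migrations.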
Let me know if any other details are required.
My basic question is: while migrations are going on, does Aerospike switch to master-only reads for a good amount of time? If that's the case, Random mode is not reliable to use.