Aerospike Replica.RANDOM giving inconsistent results while migrations are in progress


#1

We have a 12-node setup with a replication factor of 12, and we are using ReadPolicy.replica = Random:

	/**
	 * Distribute reads across all nodes in cluster in round-robin fashion.
	 * This option is useful when the replication factor equals the number
	 * of nodes in the cluster and the overhead of requesting proles is not desired.
	 */
	RANDOM;

This mode works fine when there are no migrations in the cluster, but while migrations are in progress we get inconsistent results: sometimes we get keyNotFound and sometimes we get the record.

There are no issues with MASTER and MASTER_PROLES mode during migrations.

The steps to reproduce are:

  1. Create a cluster of n nodes with a replication factor of n (10 in my case) and make sure no migrations are in progress. Request data for any valid key using the Aerospike Java client with the RANDOM replica mode.

  2. Now bring down one server (A) and verify that the data for the same key is still returned.

  3. Bring back the server (A) you stopped in step 2 and notice that migrations are happening.

  4. Request data for the same key; you will see inconsistent responses.

  5. Stop the newly added node (A) again; you will immediately see consistent data.

I went deeper into the code and verified that while migrations are in progress, the partition map containing data for each replica (index 0 being the master) has a lot of null entries for index > 1:

AtomicReferenceArray[] replicaArray = map.get(partition.namespace);
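
To make the observation concrete, here is a minimal, self-contained sketch of how a per-replica array with null entries could be scanned. It is only a model of what I saw, not the client's actual code: the class ReplicaPickSketch, the method pickNode, and the use of plain node names instead of Node objects are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical model of a partition map; NOT the Aerospike client's implementation.
final class ReplicaPickSketch {
    /**
     * replicas.get(r).get(partitionId) is the name of the node holding replica r
     * of the partition, or null when the map has no entry. Index 0 is the master.
     * Starting at startIndex, return the first non-null entry, or null if none exists.
     */
    static String pickNode(List<AtomicReferenceArray<String>> replicas, int partitionId, int startIndex) {
        int n = replicas.size();
        for (int i = 0; i < n; i++) {
            int r = Math.floorMod(startIndex + i, n);
            String node = replicas.get(r).get(partitionId);
            if (node != null) {
                return node;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Two partitions; the higher replica indexes are mostly null, as seen during migrations.
        List<AtomicReferenceArray<String>> replicas = new ArrayList<>();
        replicas.add(new AtomicReferenceArray<>(new String[] {"A", "B"}));   // master
        replicas.add(new AtomicReferenceArray<>(new String[] {"B", null}));  // replica 1
        replicas.add(new AtomicReferenceArray<>(new String[] {null, null})); // replica 2

        System.out.println(pickNode(replicas, 0, 2)); // replica 2 is null, wraps to the master: A
        System.out.println(pickNode(replicas, 1, 1)); // replicas 1 and 2 are null, wraps to the master: B
    }
}
```

With most higher-index entries null, a scan like this ends up on the master for most partitions, which is what prompted the question below.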

Let me know if any other details are required.

My basic question is: while migrations are in progress, does Aerospike switch to master-only reads for a significant amount of time? If that's the case, RANDOM mode is not reliable to use.

Anshul


#2

This is expected behavior during migrations. By default, our reads do not resolve duplicates within the cluster, but our writes do. So if you were to take the generation from your inconsistent read and perform a write with it, that write would fail because the generation would not match.
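
The generation check can be pictured with a small in-memory model. This is only a sketch of the idea and does not call the Aerospike client; the class GenerationCheckSketch and its methods are hypothetical names. In the Java client the equivalent behavior is requested through the write policy's generation settings (GenerationPolicy.EXPECT_GEN_EQUAL).

```java
// Hypothetical in-memory model of a generation-checked write; not the server's implementation.
final class GenerationCheckSketch {
    private String value;
    private int generation;

    GenerationCheckSketch(String value, int generation) {
        this.value = value;
        this.generation = generation;
    }

    /** Apply the write only if the caller's generation matches the current one. */
    synchronized boolean writeIfGenerationMatches(String newValue, int expectedGeneration) {
        if (expectedGeneration != generation) {
            return false; // the caller read a stale copy, so the write is rejected
        }
        value = newValue;
        generation++;
        return true;
    }

    synchronized int generation() {
        return generation;
    }

    synchronized String value() {
        return value;
    }

    public static void main(String[] args) {
        GenerationCheckSketch record = new GenerationCheckSketch("v3", 3);
        int staleGeneration = 2; // as if read from a replica that is behind
        System.out.println(record.writeIfGenerationMatches("from-stale-read", staleGeneration));     // false
        System.out.println(record.writeIfGenerationMatches("from-fresh-read", record.generation())); // true
    }
}
```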

If you want to ensure that a read returns the latest version the cluster has to offer, you will need to use a read policy with consistencyLevel set to CONSISTENCY_ALL. More details on this can be found in this KB article:


#3

Hello @kporter, let me reiterate the problem. When I said inconsistent reads, I meant that although a given key had data earlier, once migrations kicked in (as per step 4) we observe null data for the key (i.e. record not found).

This is a major issue. I was okay with it returning an old version of the record, since consistencyLevel defaults to CONSISTENCY_ONE. I have gone through the documentation you pointed to and even tried tweaking the consistency level on the server side with read-consistency-level-override=one (default off).

Let me know if this explains the gravity of situation.

Anshul


#4

Sounds like you are using batch read requests. Prior to 3.6.0 (see AER-2903), batch requests were not designed to proxy to the correct node when a record is not found, so during migrations batch requests would return null values, as you described here.

We have uncovered a race condition with the new batch in 3.6.0 and 3.6.1 that will cause a crash, and we are currently tracking it down, so I wouldn’t suggest upgrading for the new batch feature at this time. In the meantime, you could either issue single gets for all the “Not found” records or, if your batch size is small, issue single gets for the entire batch.
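
As a rough illustration of the first workaround, the sketch below retries only the missing entries with single gets. It is deliberately client-agnostic: BatchFallbackSketch and getWithFallback are hypothetical names, and batchGet and singleGet stand in for whatever wrappers you have around the client's batch and single-record reads. It assumes the batch call returns one result per key, in key order, with null for "not found".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper; the two functions are placeholders for your own client wrappers.
final class BatchFallbackSketch {
    /**
     * Run the batch read, then re-read every key whose batch result was null
     * with a single get. Keys that are truly absent remain null in the result.
     */
    static <K, V> List<V> getWithFallback(List<K> keys,
                                          Function<List<K>, List<V>> batchGet,
                                          Function<K, V> singleGet) {
        List<V> results = new ArrayList<>(batchGet.apply(keys));
        for (int i = 0; i < keys.size(); i++) {
            if (results.get(i) == null) {
                results.set(i, singleGet.apply(keys.get(i)));
            }
        }
        return results;
    }
}
```

The extra round trips are paid only for the records the batch reported as not found, so the cost depends on how many partitions are migrating at the time.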


#5

Yes, we are using the batch read request feature.

Our latency requirement is very strict, so issuing single calls would take a major toll on downstream system throughput. We are going with an alternate approach for now and will watch future release notes for a fix.

Thanks, Anshul