FAQ - What is the cause of AEROSPIKE_ERR_RECORD_NOT_FOUND when migrating?
When a cluster is migrating, simple get commands, whether issued via AQL or any other client, fail with AEROSPIKE_ERR_RECORD_NOT_FOUND. What is the reason for this?
By default, in AP mode (a namespace without strong consistency enabled), reads do not perform duplicate resolution during migration. Unless the client or server is configured to resolve duplicates on reads, the get command might return AEROSPIKE_ERR_RECORD_NOT_FOUND for records that are actually in the cluster.
To explain this issue further, consider a cluster where a partition exists on node A as master and node B as replica. Node A is shut down; node B becomes master and has a full copy of the partition, and another node, node C, becomes the replica. A record, X, is written to the partition, so the record exists on node B and node C. Before migrations complete, node A returns to the cluster and node B is shut down. This is a typical scenario during a rolling restart. At that point neither node A nor node C has a full copy of the partition, so both hold subset partitions, but only node C has a copy of record X. When both copies of the partition are subsets, the node that is first in the succession list (the leftmost node) becomes the master, in this case node A. In the default configuration the get transaction goes to the master only. In this instance the master does not have the record and returns AEROSPIKE_ERR_RECORD_NOT_FOUND. This is therefore not a general issue during migration, but one that can occur when a rolling restart is conducted without waiting for migrations to complete between node restarts.
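The scenario above can be sketched as a small, self-contained Python model. It is not Aerospike code: the Node class, the two read functions and the record W are hypothetical names introduced only for illustration, and the generation-based comparison is a simplification of how the server picks the winning copy.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster node holding a (possibly partial) copy of one partition."""
    name: str
    records: dict = field(default_factory=dict)  # key -> (generation, value)


def read_master_only(succession, key):
    """Default AP read: ask only the first node in the succession list."""
    master = succession[0]
    # None stands in for AEROSPIKE_ERR_RECORD_NOT_FOUND in this model.
    return master.records.get(key)


def read_with_duplicate_resolution(succession, key):
    """Ask every node holding a copy and keep the highest generation found."""
    copies = [node.records[key] for node in succession if key in node.records]
    return max(copies, key=lambda copy: copy[0], default=None)


# Node A was down while X was written, so only node C has record X.
node_a = Node("A", {"W": (1, "written before A went down")})
node_c = Node("C", {"X": (1, "written while A was down")})

# Both copies are subsets; A is first in the succession list and acts as master.
succession = [node_a, node_c]

print(read_master_only(succession, "X"))                # None
print(read_with_duplicate_resolution(succession, "X"))  # (1, 'written while A was down')
```

The master-only read misses X because node A never received it, while the read that consults both subset copies finds it on node C.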
Note that waiting for migrations to complete before shutting down node B would have given node C a full copy of the partition. Node C would then have become the master when node B was shut down. This would avoid the need for duplicate resolution and prevent AEROSPIKE_ERR_RECORD_NOT_FOUND for records that are in the cluster.
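One way to enforce this wait in a rolling-restart script is to check each node's statistics before restarting the next node. The sketch below only parses statistics text that has already been collected. It assumes the semicolon-separated name=value format returned by the Aerospike info protocol and the migrate_partitions_remaining statistic, neither of which is stated in this article; the function names and sample strings are hypothetical.

```python
def parse_info_stats(raw):
    """Turn 'name=value;name=value' info output into a dict of strings."""
    stats = {}
    for pair in raw.strip().split(";"):
        if "=" in pair:
            name, value = pair.split("=", 1)
            stats[name] = value
    return stats


def migrations_complete(raw_stats_per_node):
    """Return True only if every node reports zero partitions left to migrate."""
    for raw in raw_stats_per_node:
        stats = parse_info_stats(raw)
        # A missing counter is treated as unknown, so the restart should wait.
        if stats.get("migrate_partitions_remaining") != "0":
            return False
    return True


# Hypothetical statistics strings, one per node.
still_migrating = [
    "cluster_size=3;migrate_partitions_remaining=0",
    "cluster_size=3;migrate_partitions_remaining=412",
]
settled = [
    "cluster_size=3;migrate_partitions_remaining=0",
    "cluster_size=3;migrate_partitions_remaining=0",
]

print(migrations_complete(still_migrating))  # False
print(migrations_complete(settled))          # True
```

A restart script would collect fresh statistics from every node and restart the next node only once this check passes.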
- Duplicate resolution can be configured at either a server or client policy level.
- The server level configuration parameter is read-consistency-level-override. It overrides the client policy configuration.
- The client policy control is called
- In both client and server side controls, the setting to resolve duplicates would be
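The precedence described in the notes above (the server parameter overrides the client policy) can be modelled in a few lines. This is a sketch under assumptions: the values off, one and all are taken from the Aerospike configuration reference for read-consistency-level-override rather than from this article, and effective_read_level is a hypothetical helper, not part of any Aerospike API.

```python
SERVER_OVERRIDES = ("off", "one", "all")  # assumed server-side values
CLIENT_LEVELS = ("one", "all")            # assumed client-side values


def effective_read_level(server_override, client_level):
    """Return the read consistency level that would apply to a read."""
    if server_override not in SERVER_OVERRIDES:
        raise ValueError(f"unknown server override: {server_override!r}")
    if client_level not in CLIENT_LEVELS:
        raise ValueError(f"unknown client level: {client_level!r}")
    # "off" means the server does not override, so the client policy is used.
    if server_override == "off":
        return client_level
    return server_override


print(effective_read_level("off", "one"))  # one
print(effective_read_level("all", "one"))  # all
```

In this model, reads resolve duplicates whenever the effective level is all, whichever side requested it.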
Keywords: AEROSPIKE_ERR_RECORD_NOT_FOUND READ DUPLICATE RESOLUTION MIGRATE