FAQ - What is the cause of AEROSPIKE_ERR_RECORD_NOT_FOUND when migrating?

Detail

While a cluster is migrating, simple get commands, whether issued through AQL or any other client, fail with AEROSPIKE_ERR_RECORD_NOT_FOUND. What is the reason for this?

Answer

By default, in AP mode (a namespace without strong consistency enabled), reads do not perform duplicate resolution during migration. Unless the client or the server is configured to resolve duplicates on reads, a get command might therefore return AEROSPIKE_ERR_RECORD_NOT_FOUND for records that are actually in the cluster.

To explain this issue further, consider a cluster where a partition exists on node A as master and on node B as replica. Node A is shut down. Node B becomes master and has a full copy of the partition, and another node, node C, becomes the replica. A record, X, is then written to the partition, so it exists on node B and node C. Before migrations have completed, node A returns to the cluster and node B is shut down. This is a typical sequence during a rolling restart.

At that point neither node A nor node C has a full copy of the partition, so both hold subset partitions. Node A has a partial copy and node C has a partial copy, but only node C has a copy of record X. When both copies of the partition are subsets, the node that is first in the succession list (the left-most node) becomes the master, in this case node A. In the default configuration the get transaction goes to the master only. Here the master does not have the record and returns AEROSPIKE_ERR_RECORD_NOT_FOUND.

This is therefore not a general issue during migration. It can occur when a rolling restart is conducted without waiting for migrations to complete between node restarts.
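The scenario above can be illustrated with a small toy model. This is only a sketch: it does not use any Aerospike API, the partition copies are plain Python sets, and the record names other than X as well as both function names are hypothetical.

```python
# Toy model of the rolling-restart scenario: not Aerospike code.
# Each partition copy is modelled as a set of record keys.
NOT_FOUND = "AEROSPIKE_ERR_RECORD_NOT_FOUND"


def read_master_only(copies, succession, key):
    """Default AP read: ask only the acting master (first node in the succession list)."""
    master = succession[0]
    return "OK" if key in copies[master] else NOT_FOUND


def read_duplicate_resolve(copies, succession, key):
    """Read with duplicate resolution: consult every node that holds a copy."""
    found = any(key in copies[node] for node in succession)
    return "OK" if found else NOT_FOUND


# State after node A has returned and node B has been shut down.
# Both remaining copies are subsets; only node C holds record X.
copies = {
    "A": {"old1", "old2"},  # was down when X was written
    "C": {"old1", "X"},     # received X as replica, migration from B unfinished
}
succession = ["A", "C"]     # A is left-most, so A acts as master

master_only_result = read_master_only(copies, succession, "X")
resolved_result = read_duplicate_resolve(copies, succession, "X")
print(master_only_result)
print(resolved_result)
```

In this model the master-only read misses record X because node A never received it, while the read that consults both subset copies finds it on node C.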

Note that waiting for migrations to complete before shutting down node B would have given node C a full copy of the partition. Node C would then have become the master when node B was shut down. This would remove the need for duplicate resolution and prevent AEROSPIKE_ERR_RECORD_NOT_FOUND for records that are in the cluster.
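One way to enforce that wait in a restart script is to check each node's namespace statistics before taking the next node down. The following is a minimal sketch under stated assumptions: the statistic name migrate_partitions_remaining and the semicolon-separated key=value response format are assumed and should be verified against the server version in use, the sample responses are invented for illustration, and both function names are hypothetical. How the responses are fetched (for example with an info request to each node) is left out.

```python
def parse_info(response):
    """Parse a 'key=value;key=value' info response into a dict."""
    pairs = (item.split("=", 1) for item in response.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def migrations_complete(namespace_stats_by_node):
    """Return True only if every node reports zero remaining partition migrations."""
    for response in namespace_stats_by_node.values():
        stats = parse_info(response)
        # Assumed statistic name; a missing value is treated as "not safe".
        remaining = stats.get("migrate_partitions_remaining")
        if remaining is None or int(remaining) != 0:
            return False
    return True


# Invented sample responses, one per node, for illustration only.
still_migrating = {
    "A": "objects=1200;migrate_partitions_remaining=37",
    "C": "objects=1180;migrate_partitions_remaining=0",
}
settled = {
    "A": "objects=1200;migrate_partitions_remaining=0",
    "C": "objects=1200;migrate_partitions_remaining=0",
}

print(migrations_complete(still_migrating))
print(migrations_complete(settled))
```

A rolling-restart script would poll such a check and proceed to the next node only once it returns True for every remaining node.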

Notes

  • Duplicate resolution can be configured at either a server or client policy level.
  • The server level configuration parameter is read-consistency-level-override. It overrides the client policy configuration.
  • The client policy control is called data-consistency-level.
  • For both the client-side and the server-side controls, the setting that resolves duplicates is ALL.
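The precedence described in these notes can be sketched as a small function. This is an illustrative model only, not an Aerospike API: the function names are hypothetical, and the value set off, one and all for the server override (with off meaning that the client policy is honoured) is an assumption to check against the server configuration reference.

```python
CLIENT_LEVELS = ("one", "all")
SERVER_OVERRIDES = ("off", "one", "all")  # assumed value set


def effective_read_level(client_level, server_override="off"):
    """Return the read consistency level that actually applies to a read."""
    client_level = client_level.lower()
    server_override = server_override.lower()
    if client_level not in CLIENT_LEVELS:
        raise ValueError(f"unknown client level: {client_level}")
    if server_override not in SERVER_OVERRIDES:
        raise ValueError(f"unknown server override: {server_override}")
    # The server-side setting, when set, takes precedence over the client policy.
    return client_level if server_override == "off" else server_override


def resolves_duplicates(client_level, server_override="off"):
    """Duplicate resolution on reads happens only when the effective level is ALL."""
    return effective_read_level(client_level, server_override) == "all"


print(resolves_duplicates("ONE"))          # client default, no override
print(resolves_duplicates("ONE", "all"))   # server forces ALL
print(resolves_duplicates("ALL", "one"))   # server forces ONE
```

Under these assumptions, a server-side override of ALL enables duplicate resolution for every reader of the namespace, whereas relying on the client policy requires each application to set it.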

Keywords

AEROSPIKE_ERR_RECORD_NOT_FOUND READ DUPLICATE RESOLUTION MIGRATE

Timestamp

March 2020
