When an XDR cluster ships records in normal state, it uses in-memory queues to track which records should be shipped. Records are added to those queues as they are updated by client application and are therefore naturally ordered by Last Update Time (LUT). In certain circumstances, these queues are dropped and all records after a given LUT are shipped by iterating through a partition. This is called recovery. Can migration within an Aerospike cluster trigger XDR recovery?
For a partition to get into recovery as a result of migrations depends on whether the in-memory queue can be “trusted” (to make sure no updates are missed). For a given partition, the node acting as master during the migration process would be shipping records for that partition until the migration completes (again, for that specific partition). Upon completion of the migration for that partition, the node taking master ownership would look at the Last Ship Time (LST) for the partition. If that LST is newer than the LUT of the first element in the queue, then the queue is trusted. At this point, normal shipping continues as the new master node can be sure that nothing is missing from the queue it has received.
If the queue is not trusted, meaning that the LST is not newer than the LUT of the partition, then a recovery is scheduled for that partition. The recovery is not scheduled until the migrations have completed.
If there is no write load and the XDR transaction queue is empty, a pessimistic approach is taken and a recovery is scheduled for when migrations are complete.
As recoveries will not be scheduled until migrations have completed, the
recoveries_pending will not show these recoveries until migrations complete (for the concerned partitions).
Therefore, it is completely normal to see a number of recoveries start as soon as migrations complete.
If a cluster has a strongly consistent namespace and a cluster event leads to partitions being unavailable, nodes holding those unavailable partitions will immediately drop their XDR transaction queues. When those unavailable partitions become available again they will schedule recoveries.
While partitions are unavailable they will not count against any calculated lag within the cluster. Lag is only calculated based on the last ship time of available partitions. Once the recoveries for a newly available partition start, the lag will include this partition’s state (in terms of last ship time).
When a recovery for a partition starts, the last ship time used is dependent on what has happened within the cluster. If sufficient nodes have left so that there is no node holding any data for the partition, then the global last ship time from the SMD is used as a basis for recovery. If there was at least one node with some data for the partition remaining in the cluster, even if that was not sufficient for the partition to be available, it will retain the partition level last ship time and once recovery starts it is that last ship time that will be used.
- The LST for a given partition is shared among nodes using the fabric connections on which nodes exchange data. LST is also stored on a per-namespace per-DC basis via fabric and persisted in the SMD files.
- When a node re-joins a cluster, it typically would not immediately take master ownership of any partitions (this article does not cover other situations where a node would immediately take master ownership of a partition upon joining back a cluster).
- If nodes are quiesced during a maintenance activity, they immediately give up ownership of partitions. They also drop any XDR transaction queues.
- The system differentiates between a queue that empty because it was dropped and a queue that has been newly initialised or emptied as soon as it was initialised. The latter is trusted, the former is not.
- Dynamic XDR information can be displayed using the following command:
asinfo -v "get-stats:context=xdr;dc=DC1;namespace=NS1.
Server 5.0 and later
XDR TRANSACTION MIGRATION RECOVERY