FAQ - Why does a cold start cause a surge in re-replication?


Detail

In a strong-consistency enabled namespace, a cold start of multiple nodes seems to prompt a surge in re-replication behaviour. Why does this happen?

Answer

Re-replication happens when a write transaction fails during replication. In such a case, an immediate attempt to re-replicate is made (asynchronously from the client transaction, which would receive a timeout error). If this initial re-replication also fails (for example, due to network disruption or replica nodes being too slow to respond), a subsequent access to the record will force a re-replication prior to processing the transaction if the record is not marked as ‘replicated’ in the primary index. The ‘replicated’ status indicates that the latest version of the record has been written to all the replicas and that they have acked back to the master to indicate success. Once this happens, the primary index entry for the master record is updated to show that the replication has been successful.
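
From the client's perspective, such a replication failure surfaces as a timeout with the transaction left in doubt. The sketch below, using the Aerospike Python client (the host address, namespace, set and bin names are hypothetical, and the very short timeout is purely to illustrate the failure mode), shows how this might be observed:

    import aerospike
    from aerospike import exception as ex

    client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()

    key = ("test", "demo", "user-1")
    try:
        # a deliberately short total_timeout (milliseconds) to make a
        # replication timeout more likely, purely for illustration
        client.put(key, {"balance": 100}, policy={"total_timeout": 50})
    except ex.TimeoutError as e:
        # in_doubt indicates the master may have applied the write without
        # confirming replication; the server re-replicates asynchronously
        print("write timed out, in doubt:", e.in_doubt)
    finally:
        client.close()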

It is also possible for this status to show ‘replicating’ or ‘unreplicated’, if a replication is in progress or has failed, respectively. The client will not get a success back from the server until the replication has completed properly.

The replication status exists only in the shared memory primary index entry for the record. By the time the replicas have acknowledged success back to the master, the write on the master side will already have been written to the streaming write buffer (swb) and potentially even sent to the disk via the write queue. This means that if there is a cold start and the primary index is not available, the replication status is lost as the records are read back from disk. Normally, if only one node is cold-started, the replication status will be synced across from the other nodes; this process is referred to as ‘appeals’. The appeals_tx_remaining statistic, along with a few others, tracks its progress.
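
The progress of appeals can be observed through the namespace statistics. A minimal sketch, assuming the Aerospike Python client and a namespace named test (both the host address and namespace name are hypothetical), polling each node via the info protocol:

    import aerospike

    client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()

    # 'namespace/<ns>' returns semicolon-delimited name=value statistics
    for node, (err, resp) in client.info_all("namespace/test").items():
        if err is not None:
            continue
        stats = dict(p.split("=", 1) for p in resp.strip().split(";") if "=" in p)
        print(node, "appeals_tx_remaining =", stats.get("appeals_tx_remaining"))

    client.close()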

Should more than one node be cold started in quick succession, the replication status of partitions whose master and replica copies existed only on those nodes will be lost. As expected, a cautious approach is taken here, and the records are marked as unreplicated. This means that they will be re-replicated when they are next read.

As a cold start of multiple nodes means that all records for which those nodes held both the master and replica copies would become unreplicated, a surge in re-replication activity is to be expected. This may happen gradually, depending on how soon the records are read.

Notes

  • Activities which perform large-scale reads, such as XDR rewinds, will cause mass re-replication in this situation.
  • Re-replication can be tracked by graphing the re_repl_success metric (see the sketch after the log line below).
  • The following log line reports re-replication at 10-second intervals.

{ns_name} re-repl: all-triggers (525,0,32)
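
As a sketch of what graphing the metric amounts to (host address and namespace name again hypothetical), re_repl_success can be sampled at the same 10-second cadence and the per-interval delta reported:

    import time
    import aerospike

    client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()

    def parse_stats(resp):
        # info responses are semicolon-delimited name=value pairs
        return dict(p.split("=", 1) for p in resp.strip().split(";") if "=" in p)

    prev = {}
    for _ in range(6):  # six 10-second samples
        for node, (err, resp) in client.info_all("namespace/test").items():
            if err is not None:
                continue
            cur = int(parse_stats(resp).get("re_repl_success", 0))
            if node in prev:
                print(node, cur - prev[node], "re-replications in the last 10s")
            prev[node] = cur
        time.sleep(10)

    client.close()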

Keywords

COLD START RE-REPLICATION

Timestamp

June 2020