XDR 5.0 ships entire namespace when starting up after DC addition

Problem Description

When XDR 5.0 is configured to ship one or more namespaces, and an Aerospike node is stopped so that another XDR destination (DC) can be added and associated with namespaces, the node ships all records in those namespaces to the new DC when it restarts. The recoveries statistic also increases.

Explanation

This is the expected behaviour for version 5.0. XDR maintains a last ship time for each DC within the system metadata directory (SMD) of each node. This is necessary because, from Aerospike 5.0 onwards, there is no file-based logging and incoming digests are held in in-memory queues prior to shipment. On starting, the node checks the last ship time for each configured DC and ships any record updated after that time.

When a new destination is added while the node is down, the node has no record of that DC in the SMD. This means there is no last ship time for that DC, and XDR ships everything in the namespace.

The recoveries statistic increases because the shipping described here is done in recovery mode, whereby XDR reduces a partition rather than shipping from entries in an in-memory queue. This partition reduction may occur several times within a given recovery cycle, so it is not unexpected to see the recoveries statistic increase by more than 4096.
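To observe the increase described above, the recoveries statistic can be read from the per-DC XDR statistics. The following is a minimal sketch rather than a definitive procedure: `get_stat` is a hypothetical helper written for this example, the sample statistics string is illustrative and not real output, and the `get-stats` call shown in the comment assumes the DC is named DC1.

```shell
# Hypothetical helper: pull one statistic out of a semicolon-separated
# "name=value" string, which is the general shape of asinfo output.
get_stat() {
    # $1 = statistic name, $2 = statistics string
    printf '%s\n' "$2" | tr ';' '\n' | awk -F= -v key="$1" '$1 == key { print $2 }'
}

# Illustrative sample only. On a live node the string would come from
# something like:
#   stats=$(asinfo -v "get-stats:context=xdr;dc=DC1")
stats="lag=0;in_queue=0;success=1200;recoveries=8192;recoveries_pending=0"

# Print the current value of the recoveries statistic.
get_stat recoveries "$stats"
```

Running the helper before and after the restart, and comparing the two values, shows how many recoveries the restart triggered.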

Solution

The crux of this behaviour is XDR starting and having a destination for which there is no last ship time held within the SMD.

The simple answer here is to add the DC dynamically, which is a feature first implemented in Aerospike 5.0. The command to add a new destination dynamically in XDR 5.0 is:

asinfo -v "set-config:context=xdr;dc=DC1;node-address-port=10.0.0.2:3000;action=add"

Namespaces that should be shipped to that DC are added as follows:

asinfo -v "set-config:context=xdr;dc=DC1;namespace=someNameSpaceName;action=add"
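The two commands above can be wrapped in a small script so that the DC name, seed node and namespace are supplied once and the commands are always issued in the same order. This is a sketch under stated assumptions: `xdr_add_cmds` is a hypothetical helper, not part of any Aerospike tool, and it only prints the info commands so that they can be reviewed before being sent.

```shell
# Hypothetical helper: print, in order, the two info commands used to add
# a DC and then attach a namespace to it.
xdr_add_cmds() {
    # $1 = DC name, $2 = seed node as host:port, $3 = namespace
    printf 'set-config:context=xdr;dc=%s;node-address-port=%s;action=add\n' "$1" "$2"
    printf 'set-config:context=xdr;dc=%s;namespace=%s;action=add\n' "$1" "$3"
}

# Dry run: show what would be sent.
xdr_add_cmds DC1 10.0.0.2:3000 someNameSpaceName

# To apply on a node (requires asinfo and a running Aerospike server):
# xdr_add_cmds DC1 10.0.0.2:3000 someNameSpaceName | while IFS= read -r cmd; do
#     asinfo -v "$cmd"
# done
```

Dynamic configuration applies to the node that receives the command, so the same commands would need to be sent to each node in the cluster.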

It is good practice to also add the configuration statically to the aerospike.conf file and conduct a rolling restart to verify that the configuration has been made correctly. Given the scenario described above, it is prudent to configure the DC dynamically, associate it with namespaces, and allow shipping to continue for 5 minutes or so before initiating the rolling restart. This is because the last ship time is updated in the SMD across the cluster every 30 seconds, so 5 minutes gives ample time for a last ship time to be associated with the newly configured DC.
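For reference, a static equivalent of the dynamic commands above might look like the following aerospike.conf fragment. It is a sketch that reuses the DC name, address and namespace name from the examples in this article; any other XDR settings in use would need to be carried over as well.

```
xdr {
    dc DC1 {
        node-address-port 10.0.0.2 3000
        namespace someNameSpaceName {
        }
    }
}
```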

Notes

  • When configuring a new XDR DC, one recurrent question is how to do the initial data load. On first reading, the behaviour above might seem ideal for this, but that is not the case. When the first node comes up, it starts shipping and starts sharing the last ship time via SMD (every 30 seconds) for all DCs it ships to. For this reason, when subsequent nodes start, they may find that a last ship time is already present for the newly added DC and will not ship everything they have. If a DC is to be populated from empty via XDR, the correct way to do this is to use the new rewind function.
  • This aspect of XDR behaviour will change in future releases such that shipping will start from the current time when there is no entry for a DC in the SMD.
  • Full documentation on how to configure XDR in both static and dynamic modes can be found in the Aerospike XDR documentation.
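As an illustration of the rewind function mentioned in the first note, the sketch below builds the info command that adds a namespace to a DC with a rewind value. It rests on assumptions that should be checked against the XDR documentation for the version in use: that rewind is passed as a `rewind` parameter when the namespace is added to the DC, and that `all` requests a full rewind. `xdr_rewind_cmd` is a hypothetical helper written for this example.

```shell
# Hypothetical helper: build the info command that adds a namespace to a DC
# with a rewind value (assumed parameter name: rewind).
# $1 = DC name, $2 = namespace, $3 = rewind ("all" or a number of seconds)
xdr_rewind_cmd() {
    printf 'set-config:context=xdr;dc=%s;namespace=%s;action=add;rewind=%s\n' "$1" "$2" "$3"
}

# Dry run: print the command for a full rewind.
xdr_rewind_cmd DC1 someNameSpaceName all

# To apply on a node (requires asinfo and a running Aerospike server):
# asinfo -v "$(xdr_rewind_cmd DC1 someNameSpaceName all)"
```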

Keywords

XDR 5.0 STATIC NAMESPACE SHIPPING EVERYTHING ADDITION

Timestamp

JUNE 2020
