How do I handle a planned network maintenance between XDR source and destination?


#1

Problem Description

In the event of XDR connectivity issues between a source cluster (S) and one of several destination clusters (D1), cross datacenter replication to the other destination clusters can also be impacted, causing the overall lag to increase.

Explanation

It’s important to understand and review the architecture of XDR before deploying in production.

XDR ships records in lockstep across all destination clusters. A failure to ship a record to a single destination forces a relog and a subsequent attempt to ship the record to all destination clusters. Not only do errors cause XDR to throttle, but relogged records are also unnecessarily re-shipped to destination clusters that may already have received them. Similarly, a slowdown on the link between the source cluster and one of its destinations slows down shipping to all destinations: XDR is only as fast as its slowest destination. Therefore, in the case of a temporary network slowdown (planned or not), or any other issue impacting the normal shipment of records to a particular destination cluster, it may be necessary to drop that destination cluster so that XDR can keep shipping to the other, healthy destinations until the issue is remediated.

Solution

IMPORTANT NOTE: forcing a cluster into CLUSTER_DOWN state should always be carefully considered, as the digestlog will continue to grow in order to retain the entries to be processed once the cluster can receive records again:

  • As the digest log grows, the reclaim needs to search through more pages to find the global last ship time and move the start pointer. This happens every minute and will increase disk IO and slow down the digestlog reclamation process.
  • If a node at the source gets restarted while there is a lag, XDR starts processing entries from the start pointer even if it does not ship all of them, causing extra strain.
  • Forcing a destination cluster into CLUSTER_DOWN could cause stop_writes if xdr-min-digestlog-free-pct has been configured.
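Because of the points above, it is worth keeping an eye on the digestlog usage and the XDR lag on each source node while a destination is down. The snippet below is a minimal sketch; the exact statistic names (dlog and timelag counters are assumed here) vary across server versions, so check the full output of the statistics info command on your version:

```shell
# Print the server statistics one per line and keep only the
# XDR digestlog usage and lag counters (names vary by version).
asinfo -v "statistics" -l | grep -Ei "dlog|timelag"
```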

In the following, we assume that the unhealthy link is between S and D1, or that D1 is encountering some sort of temporary outage; the other destinations are D2, D3, etc. The different options for handling this situation are as follows:

For versions 4.1 and above, the force-linkdown command can be used:

    asinfo -v "xdr-command:force-linkdown=true;dc=D1"

XDR will treat D1 as being in CLUSTER_DOWN state and will trigger window shipping[1] when force-linkdown is set back to false, ensuring that the records written at the source while D1 was in CLUSTER_DOWN state are picked up by the window shipper thread.
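As a sketch of the full maintenance sequence (assuming version 4.1 or above and a DC named D1; run the commands on every source node):

```shell
# Before the maintenance window: force the link to D1 down so that
# shipping to the healthy DCs is not held back.
asinfo -v "xdr-command:force-linkdown=true;dc=D1"

# Optionally verify the DC state (expect CLUSTER_DOWN).
asinfo -v "dc/D1"

# After the maintenance window: restore the link; window shipping
# will catch up on the records written in the meantime.
asinfo -v "xdr-command:force-linkdown=false;dc=D1"
```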

For versions prior to 4.1, you can use one of the following three methods:

1. Dynamically disassociate D1 as a remote destination for the relevant namespaces and un-seed all of its nodes from the source:

    asinfo -v "set-config:context=xdr;xdr-shipping-enabled=false"
    asinfo -v "set-config:context=namespace;id=<NAMESPACE>;xdr-remote-datacenter=D1;action=remove"
    asinfo -v "set-config:context=xdr;xdr-shipping-enabled=true"

Note: suspending shipping is a workaround for AER-5718. It is not necessary if you are running the latest 3.13, 3.14 or 3.15+ releases.

For newer releases, if no other namespace is associated with the DC, then the DC will be in INACTIVE state. For older releases, you will need to remove each node that was seeded:

    asinfo -v "set-config:context=xdr;dc=D1;dc-node-address-port=xx.xx.xx.xx:3000;action=remove"

This causes the source to treat the destination as INACTIVE and stop attempting to ship to it. Records written after D1 was removed will be missing from D1 even after it comes back. State can be restored by backing up S and restoring to D1 after the cluster comes back up.
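Once the maintenance is over, D1 can be re-associated by reversing the commands above. This is a sketch assuming the same namespace and node addresses, and that your server version supports dynamic action=add as it does action=remove; records written in the meantime still need to be restored from a backup of S:

```shell
# Re-seed the destination node(s) that were removed.
asinfo -v "set-config:context=xdr;dc=D1;dc-node-address-port=xx.xx.xx.xx:3000;action=add"

# Re-associate the namespace with D1.
asinfo -v "set-config:context=namespace;id=<NAMESPACE>;xdr-remote-datacenter=D1;action=add"
```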

2. Bring down D1 by shutting down Aerospike service on all the cluster nodes.

XDR will treat D1 as being in CLUSTER_DOWN state and will trigger window shipping[1] when it comes back up, ensuring that the records written at the source while D1 was down are picked up by the window shipper thread when the cluster is restarted. The obvious drawback is that the D1 cluster is also unavailable to other clients.

3. Use iptables to cut the connection between S and D1.

Impact: XDR will treat D1 as being in CLUSTER_DOWN state and will trigger window shipping when the iptables rules are removed (after the outage or maintenance window), ensuring that the records written while the rules were in place will eventually be shipped to D1.

Example rules for the destination:

    iptables -I INPUT -p tcp --dport 3000 -d 10.xxx.y.zz/32 -j REJECT
    iptables -I INPUT -p tcp --dport 3000 -s 10.yyy.z.xx/32 -j REJECT
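When the maintenance window is over, the same rules can be deleted by replacing -I (insert) with -D (delete), keeping the rule specification identical:

```shell
# Remove the blocking rules inserted above, restoring XDR traffic.
iptables -D INPUT -p tcp --dport 3000 -d 10.xxx.y.zz/32 -j REJECT
iptables -D INPUT -p tcp --dport 3000 -s 10.yyy.z.xx/32 -j REJECT
```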


Note: you always want to use REJECT for a quick result. iptables REJECT actively responds to the caller with icmp-port-unreachable by default, allowing the host to immediately act on the unreachable connection. DROP, on the other hand, simulates 100% packet loss: packets are silently discarded, and it can take a while for the caller to realise the host is down, as connections simply time out without a response.

Reference

[1] More details on handling local node loss and remote node loss:

Keywords

XDR destination force link down DC

Timestamp

06/28/2018

