FAQ - Some common questions on XDR



Summary

XDR (Cross-Datacenter Replication) is a feature of the Enterprise Edition of Aerospike which allows one cluster to synchronize with another cluster over a higher-latency (long-distance) link. This article covers some frequently asked questions on XDR management. For more information on XDR, please see XDR Architecture.

1) How can we detect XDR conflicts or failure in case of Active-Active setup where both source and destination are receiving writes?

Unfortunately, there are too many variables in this scenario and there is currently no way to detect with 100% reliability that a conflict has happened. For users concerned about this, we suggest a data-model level workaround: keep the records in two different sets/namespaces, with each DC writing to its own set/namespace. This avoids real conflicts. However, the application then needs to read both records and merge them using its business logic as and when needed.
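
As an illustration, here is a minimal sketch of that read-and-merge approach using the Aerospike Python client; the namespace, set names, key, and merge rule are hypothetical and would need to be adapted to your data model:

import aerospike
from aerospike import exception as ex

# Hypothetical setup: each DC writes to its own set ('profile_dc1' / 'profile_dc2')
# in the same namespace, so XDR shipping never overwrites the other DC's copy.
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

def read_merged(user_id):
    # Read the locally written record and the record shipped from the other DC,
    # then merge them using application-specific business logic.
    merged = {}
    for set_name in ('profile_dc1', 'profile_dc2'):
        try:
            _, _, bins = client.get(('test', set_name, user_id))
            merged.update(bins)  # placeholder merge rule: last set read wins per bin
        except ex.RecordNotFound:
            pass  # the record may exist in only one of the two sets
    return merged

print(read_merged('user-42'))
client.close()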

2) If the generation count is the same on both the source cluster and the destination cluster, who wins: XDR or the local cluster?

XDR does not check the generation of records when writing at a destination. Older versions of XDR (prior to 3.8) had a configuration option which would force XDR writes to use a generation check (xdr-forward-with-gencheck when set to true).

However, there were complications with using the generation check in some scenarios, including cases involving deletes and re-creation of records; therefore, this configuration option was deprecated as of 3.8.1.

So, whenever XDR writes to its destination, it will overwrite the destination copy (updating the metadata of the record such as TTL, generation number, etc.).

Also, XDR versions 3.8.3 and above support a configuration option to ship only the bins that have been updated. See xdr-ship-bins for more details.

3) Does XDR preserve the generation of records that are shipped?

The generation of records is independent on each cluster. Here are a few examples illustrating situations where the generation will not match between a source and the destination:

  1. XDR has a hotkey de-duplication mechanism (refer to xdr-hotkey-time-ms for details). Therefore, multiple updates at a source cluster (multiple generation increments) could result in much fewer updates at the destination (fewer generation increments).

  2. XDR may not have been set up initially on a source cluster. In such a situation, when a record is first processed and shipped from the source cluster (for example, with a generation greater than 1), it will be written with a generation of 1 on the destination cluster (assuming the record didn’t exist there previously).

  3. In some edge cases, if XDR times out while processing a record, the timeout could occur ‘on the way back’, i.e., after the transaction has been successfully applied on the destination side. In such cases, the transaction will be relogged and retried, resulting in an extra generation increment at the destination compared to the source.

4) What does XDR do at the point of failure of a node?

No impact is seen on the source or the destination cluster in the event of a failure of one of the nodes. In the event of a node failure on the destination, XDR automatically identifies the remaining nodes and continues to ship data. The data is rebalanced once the node comes back. See here for more details: How does XDR handle local node loss and remote destination node loss?

5) What does XDR do during migrations? Is it seamless?

There is no known impact on XDR when the source or the destination is undergoing migrations. If there is a lag and data has migrated to a different node by the time the digest log is processed, the digest log entry will be relogged to the node currently owning the record. For XDR versions prior to 3.8, in one-off situations where a large lag has built up and the cluster goes into a migration state, you have the option of switching to the single-read XDR mode so that reads can proxy and the data gets shipped as expected. This is because, by default, XDR uses batch reads, which currently do not proxy during migrations.

6) How to size the digest log?

A digest log entry is 80 bytes, and the digest log holds entries for both master and prole records (the prole records will be shipped in case of a node failure in the XDR source cluster). The digest log size should be determined based on the expected throughput across the cluster. When the digest log gets full, the oldest entries are overwritten and may therefore not be shipped if they do not appear again later in the digest log.
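
As a rough illustration, here is a back-of-the-envelope sizing sketch. The 80-byte entry size and the fact that both master and prole copies are logged come from the answer above; the throughput, replication factor, node count, and backlog window are assumed values:

# Back-of-the-envelope digest log sizing (assumed numbers).
ENTRY_BYTES = 80            # size of one digest log entry

write_tps = 10_000          # cluster-wide write throughput (assumed)
replication_factor = 2      # both master and prole entries are logged
num_nodes = 4               # entries spread roughly evenly across nodes (assumed)
backlog_hours = 12          # backlog the digest log should be able to absorb (assumed)

bytes_per_second_per_node = write_tps * replication_factor * ENTRY_BYTES / num_nodes
digestlog_bytes = bytes_per_second_per_node * backlog_hours * 3600
print(f"Digest log size per node for a {backlog_hours}h backlog: "
      f"{digestlog_bytes / (1024 ** 3):.1f} GiB")   # ~16.1 GiB with these numbers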

7) The digest log is sitting on the root fs on an EBS drive that is 80GB in size. Will there be any performance issues regarding IOPS or anything else that could harm the cluster/XDR? Would it be better to put the digest log on a separate drive?

Reads and writes to the digest log are done in batches of 100, so not every transaction results in an I/O access. At high throughput, however, there can still be significant I/O activity on the disk where the digest log is persisted, despite the benefits of the file system cache. For high-throughput workloads, it is good practice to monitor the key system metrics and, if digest log I/O is a potential concern, to specifically check iostat for that device.

8) How do I estimate my monthly XDR traffic?

You can use the xdr_ship_bytes metric and calculate the total shipped over 30 days by taking two snapshots of the metric from all nodes, at the beginning and end of the month (or on a daily basis and extrapolating). This can also be done on a per-DC basis using dc_ship_bytes.
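
For example, here is a minimal sketch of that calculation; the snapshot values below are made up, and it assumes xdr_ship_bytes has been collected from every source node at the start and end of the interval:

SECONDS_PER_DAY = 86_400

# xdr_ship_bytes collected from each source node, 24 hours apart (made-up values).
# The metric is cumulative, so the difference between snapshots is the number of
# bytes shipped during the interval.
snapshot_t0 = {'node-A': 1_200_000_000, 'node-B': 1_150_000_000, 'node-C': 1_310_000_000}
snapshot_t1 = {'node-A': 1_950_000_000, 'node-B': 1_890_000_000, 'node-C': 2_020_000_000}
interval_seconds = SECONDS_PER_DAY

shipped_bytes = sum(snapshot_t1[n] - snapshot_t0[n] for n in snapshot_t0)
bytes_per_day = shipped_bytes * SECONDS_PER_DAY / interval_seconds
monthly_estimate = bytes_per_day * 30
print(f"Estimated monthly XDR traffic: {monthly_estimate / 1e9:.1f} GB")   # ~66.0 GB here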

9) How do I get the number of writes on a cluster that are initiated by an XDR client, as opposed to the actual application writes?

The TPS you see on the cluster represents all client writes, which includes writes originating from an XDR client, if any. So if a cluster is an XDR destination and clients are also writing to it, the write TPS you see includes both XDR and client writes.

If you want to differentiate between the two, refer to the following 2 statistics on the destination namespace: xdr_write_success (successful writes initiated by an XDR client) and client_write_success (successful writes from regular application clients).
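
As an illustration, here is a minimal sketch of splitting those counters out of a namespace statistics response (the semicolon-delimited name=value format returned by an info request such as 'namespace/<ns>'); the sample string below is abbreviated and its values are made up:

# Abbreviated, made-up namespace statistics string from the destination cluster.
raw_stats = "client_write_success=482113;xdr_write_success=120554;objects=500000"

stats = dict(item.split('=', 1) for item in raw_stats.split(';') if '=' in item)
xdr_writes = int(stats.get('xdr_write_success', 0))
app_writes = int(stats.get('client_write_success', 0))
print(f"XDR-initiated writes: {xdr_writes}, application writes: {app_writes}")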

10) How do I specify which records to ship?

As of the writing of this article, it is not possible to specify which records to ship at the record level. Sets can, however, be used to ship only specific records. Refer to the following article for more details: XDR Set Replication Setup.

11) Do the source cluster nodes and destination cluster nodes connect to each other n-to-n? What happens if one destination node cannot be reached due to connectivity/firewall reasons?

XDR is essentially a normal client. If one of the nodes at the destination goes down, it will trigger a recluster and, as per the new partition map, the XDR client will ship the records to the resulting cluster. There is no one-to-one correspondence between nodes on the local cluster and nodes on the remote cluster. Hence, every master node of the local cluster may write to any remote cluster node. Just like any other client, XDR writes a record to the remote master node for that record.

The following errors/warnings can be seen on the source side while shipping when a destination node goes down:

Mar 15 2018 09:21:24 GMT: WARNING (xdr): (xdr_ship.c:825) Marking cluster DC2 as suspicious because of a connection error.
Mar 15 2018 09:21:26 GMT: WARNING (xdr): (as_cluster.c:560) Node FA1BCEAAC270008 peers refresh failed: AEROSPIKE_ERR_CONNECTION Bad file descriptor
Mar 15 2018 09:21:26 GMT: INFO (xdr): (as_cluster.c:354) Remove node FA1BCEAAC270008 10.0.0.1:3000

XDR will successfully ship to the newly re-formed cluster, just like a normal client. If a destination node is unreachable due to connectivity/firewall issues, the writes/ships for records owned by that node (as master) will time out and be relogged in the digest log. In this case, the following warnings/errors can be observed on the source cluster, and XDR will throttle for a 30-second period:

Mar 15 2018 10:19:48 GMT: WARNING (xdr): (xdr_ship.c:844) Marking cluster DC2 as suspicious because of a timeout.
Mar 15 2018 10:19:49 GMT: INFO (xdr): (as_cluster.c:536) Node 13898BE4CA005452 refresh failed: AEROSPIKE_ERR_TIMEOUT 

12) Which records will be shipped by XDR?

At XDR startup, the local cluster discovers the namespaces that must be replicated to the remote cluster. XDR ships every write that happens after it is set up and enabled. Any writes that happened prior to setting up XDR will not be shipped.

Once XDR has been enabled and configured, it will keep track of any writes while a destination cluster potentially goes offline and then comes back online (through the link down handler / window shipper thread), but it will not be able to replicate records that were not touched/written/updated after it was initially set up.

13) What is the recommended way to populate a new cluster with a different DC’s existing data (written prior to XDR setup)?

To populate a new cluster with records that were written prior to setting XDR up, refer to the following article:

14) Can you use XDR to ship from one namespace to another namespace of the remote DC?

No, this is not possible at this time. You can pre-populate the data using asrestore, which allows restoring into a different namespace.

15) Can you ship from one cluster to another cluster with different version or different hardware?

XDR is just a client running within the server code and writing to remote destination clusters. You can have different hardware (cloud or bare metal) or different versions. For AWS to non-AWS transfers, you may have to consider the data transfer cost.

16) Are there any metrics to show how many records were written by XDR on a specific destination DC for a specific namespace?

There are no such direct metrics on a source XDR node, but there is a statistic on a node receiving XDR traffic. The xdr_write_success statistic is the number of records written by XDR for a namespace on a node.

Keywords

XDR TTL GENERATION SOURCE DESTINATION

Timestamp

01/08/2018

