How does XDR handle local node loss and remote destination node loss?

Aerospike_Knowledge · July 29, 2016, 10:28pm

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

Title

This knowledgebase covers the situation handled by XDR in the following two situations:

A local node loss or an Aerospike service restart at the source cluster
A remote destination cluster is not reachable

Abstract

XDR (Cross-Datacenter Replication) aims to replicate data between the configured source and destination across geographical zones. It aims to keep the source and destination in sync to provide, for example, for disaster recovery, or for bringing data close to where it will be consumed.

There are 2 failure situations XDR needs to handle:

Failed Node Processing - For a local node failure at the source cluster, its neighbours take the reign of shipping the records on behalf of this failed node. Each failed node gets its own failed node shipper thread on the remaining nodes in the cluster.
Window Shipping Processing - For a remote destination that becomes unreachable, XDR keeps track of the window for which the destination cluster is not reachablebe. When it is reachable again, XDR picks up shipping for that window. Each failed datacenter spawns a window shipper thread.

Details

This covers the log trails that will be seen on the nodes during the above two situations.

Failed Node Processing

The flow of events are as follows:

1. Local peer node fails on the source cluster. The alive nodes report the last ship time from the dead node so that the alive node can resume shipping the (prole) records on its behalf:

Jun 14 2016 22:00:15 GMT: INFO (xdr): (xdr.c:4578) Node bb99358aa4f1006 failed at time 1465941915190 (2016-06-14 22:05:15.190 GMT). Its lastshiptime=1465941608716 (2016-06-14 22:00:08.716 GMT) for DC=REMOTE_DC_1
Jun 14 2016 22:00:15 GMT: WARNING (xdr): (xdr_serverside.c:285) XDR last ship time of this node for DC 0 went back to 1465941308716 from 1465941609863

2. Alive nodes begin the failed node processing:

Jun 14 2016 22:00:16 GMT: INFO (xdr): (xdr.c:3298) Performing: Failed node processing in window [1465941579646 (2016-06-14 21:59:39.646 GMT):1465941915190 (2016-06-14 22:05:15.190 GMT)] of node bb99358aa4f1006
Jun 14 2016 22:00:16 GMT: INFO (xdr): (xdr.c:3332) Failed node recovery for bb99358aa4f1006 pending, 5700 records left to check (at 0/s, ~0s left)
Jun 14 2016 22:00:17 GMT: INFO (xdr): (xdr.c:3427) process_failednode : Waiting to finish shipping of records already scheduled. Records queued=2797, finished=4
Jun 14 2016 22:00:18 GMT: INFO (xdr): (xdr.c:3427) process_failednode : Waiting to finish shipping of records already scheduled. Records queued=2797, finished=30
.
.
.
Jun 14 2016 22:00:45 GMT: INFO (xdr): (xdr.c:3427) process_failednode : Waiting to finish shipping of records already scheduled. Records queued=2797, finished=2784

3. Failed node processing is marked as completed once the records from the dead nodes last-shipped are all sent across to the destination cluster(s):

Jun 14 2016 22:00:47 GMT: INFO (xdr): (xdr_serverside.c:1000) XDR failed node processing completed for node bb99358aa4f1006
Jun 14 2016 22:00:47 GMT: INFO (xdr): (xdr.c:3456) Finished: Failed node processing in window [1465941308716 (2016-06-14 21:55:08.716 GMT):1465941915190 (2016-06-14 22:05:15.190 GMT)] of node bb99358aa4f1006. Shipped 2797 records

4. On bringing the stopped peer node back up, it will resume shipping from the digestlog it has and the last-ship time of the node is updated on its peers. It’s always recommended to do a clean restart if a node was down for an exceptionally long time to avoid stale data making its way into the destination:

Jun 14 2016 23:12:07 GMT: INFO (xdr): (xdr.c:4307) replication service ready: and now you have icing!
Jun 14 2016 23:12:08 GMT: WARNING (xdr): (xdr_serverside.c:285) XDR last ship time of this node for DC 0 went back to 1465945047644 from 1465945927448
Jun 14 2016 23:12:08 GMT: WARNING (xdr): (xdr_serverside.c:285) XDR last ship time of this node for DC 1 went back to 1465945047644 from 1465945927448

Window Shipping Processing

The flow of events are as follows:

1. Local cluster identifies a destination that is no longer reachable, marks the remote DC as CLUSTER_DOWN:

Jun 14 2016 23:00:40 GMT: INFO (xdr): (xdr.c:2844) Connection error when writing to cluster REMOTE_DC_2. Checking its health.
Jun 14 2016 23:00:40 GMT: INFO (xdr): (xdr.c:2873) Changing state of cluster REMOTE_DC_2 to CLUSTER_DOWN

2. Local cluster spawns a Windowshipper thread to save the last known timestamp where the remote destination received the records successfully:

Jun 14 2016 23:00:40 GMT: WARNING (xdr): (xdr.c:2903) Cluster REMOTE_DC_2 is down! Spawning a thread.
Jun 14 2016 23:00:41 GMT: INFO (xdr): (xdr.c:2294) Windowshipper: Added a new window from 1465944737502 (2016-06-14 22:52:17.502 GMT). Waiting for DC to be UP.
Jun 14 2016 23:00:41 GMT: INFO (xdr): (xdr.c:2296) Windowshipper: Cluster REMOTE_DC_2: Number of windows on list 1

3. Once the remote destination cluster is reachable again, XDR identifies the window shipping that is required:

Jun 14 2016 23:01:17 GMT: INFO (xdr): (xdr.c:2399) Windowshipper: Cluster REMOTE_DC_2 has come up : 0
Jun 14 2016 23:01:17 GMT: INFO (xdr): (xdr.c:2409) Windowshipper: Changing state of cluster REMOTE_DC_2 to CLUSTER_WINDOW_SHIP
Jun 14 2016 23:01:17 GMT: INFO (xdr): (xdr.c:2314) Setting end marker=1465945577009 (2016-06-14 23:06:17.009 GMT) for the window with start marker=1465944737502 (2016-06-14 22:52:17.502 GMT)
Jun 14 2016 23:01:17 GMT: INFO (xdr): (xdr.c:2431) Windowshipper: Cluster REMOTE_DC_2: For node id 0, start window shipping between 1465944737502 (2016-06-14 22:52:17.502 GMT) and 1465945577009 (2016-06-14 23:06:17.009 GMT)
Jun 14 2016 23:01:17 GMT: INFO (xdr): (xdr.c:2433) Windowshipper: Cluster REMOTE_DC_2: Number of windows on list 1

4. It begins shipping for that windows and identifies the time remaining as well:

Jun 14 2016 23:01:17 GMT: INFO (xdr): (xdr.c:2089) Start xdr_do_windowship
Jun 14 2016 23:01:17 GMT: INFO (xdr): (xdr.c:2103) Windowshipper: Start digest timestamp 1465945035212 (2016-06-14 22:57:15.212 GMT)
Jun 14 2016 23:01:17 GMT: INFO (xdr): (xdr.c:2148) Data center recovery for REMOTE_DC_2 pending, 18900 records left to check (at 0/s, ~0s left)
Jun 14 2016 23:01:17 GMT: INFO (xdr): (xdr.c:2844) Connection error when writing to cluster REMOTE_DC_2. Checking its health.

Jun 14 2016 23:01:27 GMT: INFO (xdr): (xdr.c:2148) Data center recovery for REMOTE_DC_2 pending, 17100 records left to check (at 176/s, ~1m37s left)

5. Destination cluster is set to CLUSTER_WINDOW_SHIP state during the process:

Jun 14 2016 23:01:28 GMT: INFO (xdr): (xdr.c:1995) [REMOTE_DC_2] CLUSTER_WINDOW_SHIP : timelag 533 secs : lst 1465944737502 (2016-06-14 22:52:17.502 GMT) : mlst 1465945085186 (2016-06-14 22:58:05.186 GMT) : fnlst 0 (-) : wslst 1465944737502 (2016-06-14 22:52:17.502 GMT) : shlat 0 ms

6. XDR continues shipping until it has processed all the records in the window:

Jun 14 2016 23:01:37 GMT: INFO (xdr): (xdr.c:2148) Data center recovery for REMOTE_DC_2 pending, 14200 records left to check (at 284/s, ~50s left)
.
.
Jun 14 2016 23:02:23 GMT: INFO (xdr): (xdr.c:2148) Data center recovery for REMOTE_DC_2 pending, 200 records left to check (at 235/s, ~0s left)
Jun 14 2016 23:02:24 GMT: INFO (xdr): (xdr.c:2240) Windowshipper : Waiting to finish shipping of records already scheduled. Records queued=10442, finished=6922
.
.
.
Jun 14 2016 23:02:41 GMT: INFO (xdr): (xdr.c:2240) Windowshipper : Waiting to finish shipping of records already scheduled. Records queued=10442, finished=10312
Jun 14 2016 23:02:42 GMT: INFO (xdr): (xdr.c:2240) Windowshipper : Waiting to finish shipping of records already scheduled. Records queued=10442, finished=10382

7. Once completed, local source cluster is notified with the completion of the window shipping and cluster state changes from CLUSTER_WINDOW_SHIP state to CLUSTER_UP:

Jun 14 2016 23:02:43 GMT: INFO (xdr): (xdr.c:2460) Windowshipper: Cluster REMOTE_DC_2: For node id 0, done window shipping between 1465944737502 (2016-06-14 22:52:17.502 GMT) and 1465945577009 (2016-06-14 23:06:17.009 GMT). Shipped 10442 records
Jun 14 2016 23:02:43 GMT: INFO (xdr): (xdr.c:2470) Windowshipper: Cluster REMOTE_DC_2: Number of windows on list 0
Jun 14 2016 23:02:43 GMT: INFO (xdr): (xdr.c:2474) Windowshipper: Cluster REMOTE_DC_2: Done with shipping of all windows. Shipped 10442 records
Jun 14 2016 23:02:43 GMT: INFO (xdr): (xdr.c:2512) Windowshipper: Changing state of cluster REMOTE_DC_2 to CLUSTER_UP

Applies To

Server prior to v. 5.0

Keywords

XDR FAILEDNODE WINDOWSHIP WINDOWSHIPPER LAG

Timestamp

September 2021

Topic		Replies	Views
About of aerospike XDR configurations XDR (Cross Data Center Replication) xdr	5	1370	July 20, 2021
XDR - Could not get host-information from host 127.0.0.1: port 3000 XDR (Cross Data Center Replication)	1	1624	November 20, 2015
XDR or Rack Awareness for geo cluster XDR (Cross Data Center Replication)	5	2105	February 14, 2019
About of aerospike XDR synchronize	2	1139	July 19, 2022
Monitoring XDR performance Monitoring	2	1562	July 17, 2017