DC states during boot time and run time of XDR

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

Synopsis:

Destination DC states during XDR boot up time and run time.

Abstract:

This article explains XDR’s behavior in different cases involving single or multiple destination DCs. It assumes that you have basic knowledge of XDR. The behavior differs between boot time and run time.

| Case | DC1 Reachability | DC2 Reachability | DC3 Reachability | DC4 Reachability | Config | Shipping | dc_state | Digest Logging |
|---|---|---|---|---|---|---|---|---|
| 1 | Partial/All | None | Partial/All | | Boot time | None of the DCs | DC1=CLUSTER_UP, DC2=CLUSTER_INACTIVE, DC3=CLUSTER_INACTIVE | Yes |
| 2 | Partial/All | Partial/All | Partial/All | | Boot time | Ship to all DCs | DC1,DC2,DC3=CLUSTER_UP | Yes |
| 3 | Partial/All | Partial/All | Partial/All | | Run time | Ship to all DCs | DC1,DC2,DC3=CLUSTER_UP | Yes |
| 4 Add new DC4 | Partial/All | Partial/All | Partial/All | Partial/All | Run time | Ship to all DCs | DC1,DC2,DC3,DC4=CLUSTER_UP | Yes |
| 5 Add new DC4 | Partial/All | Partial/All | Partial/All | None | Run time | Ship to available DCs | DC1,DC2,DC3=CLUSTER_UP, DC4=CLUSTER_INACTIVE | Yes |
| 6 Add DC4 while DC1 link down | None | Partial/All | Partial/All | Partial/All | Run time | Ship to available DCs | DC2,DC3,DC4=CLUSTER_UP, DC1=CLUSTER_DOWN | Yes |
| 7 Add DC4 while DC1 link down | None | Partial/All | Partial/All | None | Run time | Ship to available DCs | DC2,DC3=CLUSTER_UP, DC1=CLUSTER_DOWN, DC4=CLUSTER_INACTIVE | Yes |

Case 1: One entire destination DC (DC2) is not reachable during boot time:

During startup, if none of the nodes in a destination DC (here DC2) is discoverable, XDR does not start shipping data to any of the destination DCs, but it does keep logging digests into the digest log. Once all the destination DCs are discoverable, it starts shipping to all of them.

At boot time XDR processes the destination DCs in sequence. Hence, if DC1 is reachable and DC2 is not, DC1's dc_state will be CLUSTER_UP, but DC2 and the subsequent DCs will be CLUSTER_INACTIVE.
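The boot-time rule above can be modeled with a small sketch. This is only an illustration of the behavior described in this article, not XDR's actual implementation; the function names `boot_time_states` and `boot_time_ships` are hypothetical, and "reachable" is taken to mean that at least one node of the DC is discoverable (Partial/All in the table).

```python
from typing import Dict, List


def boot_time_states(dc_order: List[str], reachable: Dict[str, bool]) -> Dict[str, str]:
    """Model of the boot-time sequence: DCs are checked in configured order.

    Every DC before the first unreachable one is CLUSTER_UP; the unreachable
    DC and all DCs after it are CLUSTER_INACTIVE.
    """
    states: Dict[str, str] = {}
    blocked = False  # becomes True at the first unreachable DC
    for dc in dc_order:
        if not blocked and reachable.get(dc, False):
            states[dc] = "CLUSTER_UP"
        else:
            blocked = True
            states[dc] = "CLUSTER_INACTIVE"
    return states


def boot_time_ships(states: Dict[str, str]) -> bool:
    """At boot time, shipping starts only when every DC is CLUSTER_UP."""
    return all(state == "CLUSTER_UP" for state in states.values())


# Case 1 from the table: DC2 is completely unreachable at boot.
case1 = boot_time_states(
    ["DC1", "DC2", "DC3"], {"DC1": True, "DC2": False, "DC3": True}
)
```

In this model `case1` maps DC1 to CLUSTER_UP and both DC2 and DC3 to CLUSTER_INACTIVE, and `boot_time_ships(case1)` is false, which corresponds to "None of the DCs" in the Shipping column. Digest logging is unaffected and continues in every case.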

Case 2: All or some of the nodes in all destination DCs are reachable during boot time:

XDR is starting for the first time on the source and all 3 destination DCs have been added. All or some of the nodes in each of the 3 destination DCs are discoverable by the source. The dc_state will be CLUSTER_UP for all 3 destination DCs. Digest log entries are written into the digest log and records are shipped to the destination DCs.

Note: For destination nodes that are not reachable from the source, the XDR client writes are proxied through another node in the destination cluster.

Case 3: Only some of the nodes in any destination DC are reachable during run time:

Once XDR has started shipping to all the DCs, the dc_state of a DC remains CLUSTER_UP even if XDR loses connectivity to a few nodes in that DC.

Case 4: Adding a new DC4 dynamically during run time:

If a new DC4 is added dynamically and all or even some of its nodes are reachable, then the dc_state for DC4, like that of the other DCs, will be CLUSTER_UP.

Case 5: Adding a new DC4 dynamically during run time and none of the nodes in DC4 are reachable:

If a new DC4 is added dynamically and NONE of the nodes in DC4 are reachable, then the dc_state will be CLUSTER_INACTIVE for DC4.

DC1, DC2, and DC3 will be CLUSTER_UP.

Case 6: Add DC4 while DC1 link down:

If a new DC, DC4, is added dynamically while the link to DC1 is down, then the dc_state will be CLUSTER_UP for DC2, DC3 and DC4.

DC1’s dc_state = CLUSTER_DOWN.

Case 7: Add DC4 while DC1 link down and none of the nodes in DC4 are reachable:

If a new DC4 is added dynamically, NONE of the nodes in DC4 are reachable, and the link to DC1 is down, then the dc_states will be DC2,DC3=CLUSTER_UP, DC1=CLUSTER_DOWN, DC4=CLUSTER_INACTIVE.

XDR will successfully ship to the 2 active DCs and will keep track of the digests that have not been written to the down DC1.
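The run-time cases (3 through 7) follow a simpler pattern than the boot-time one, and can be summarized in a short sketch. This is a model of the table in this article, under the assumption that a previously established DC whose link is lost becomes CLUSTER_DOWN, while a newly added DC that has never been reachable becomes CLUSTER_INACTIVE. The function names are hypothetical and are not part of any Aerospike API.

```python
from typing import Dict, List


def run_time_states(established: Dict[str, bool], newly_added: Dict[str, bool]) -> Dict[str, str]:
    """Model of run-time dc_state values.

    `established` maps DCs that XDR was already shipping to onto whether at
    least one of their nodes is still reachable. `newly_added` does the same
    for DCs added dynamically.
    """
    states: Dict[str, str] = {}
    for dc, reachable in established.items():
        # A DC that was up and lost its link is DOWN, not INACTIVE.
        states[dc] = "CLUSTER_UP" if reachable else "CLUSTER_DOWN"
    for dc, reachable in newly_added.items():
        # A new DC that was never discovered stays INACTIVE.
        states[dc] = "CLUSTER_UP" if reachable else "CLUSTER_INACTIVE"
    return states


def shipping_targets(states: Dict[str, str]) -> List[str]:
    """At run time, XDR ships to every DC that is CLUSTER_UP."""
    return sorted(dc for dc, state in states.items() if state == "CLUSTER_UP")


# Case 7: DC1 link is down and the newly added DC4 is unreachable.
case7 = run_time_states(
    {"DC1": False, "DC2": True, "DC3": True}, {"DC4": False}
)
targets = shipping_targets(case7)
```

Unlike boot time, one unavailable DC does not block shipping to the others here: `targets` contains only DC2 and DC3, matching "Ship to available DCs" in the table.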

Once connectivity to the destination DC1 is restored, the dc_state on the source for DC1 will go from CLUSTER_DOWN to CLUSTER_WINDOWSHIP.

In the CLUSTER_WINDOWSHIP state, the records that were not shipped while the dc_state was CLUSTER_DOWN are shipped. This is handled by a separate thread, while new writes coming in through the client continue to be shipped in parallel. Once all the records that were not shipped during CLUSTER_DOWN have been shipped, the dc_state goes to CLUSTER_UP.

DC1 dc_state = CLUSTER_UP → CLUSTER_DOWN → CLUSTER_WINDOWSHIP → CLUSTER_UP
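The recovery sequence above can be illustrated as a small state machine. This is a simplified, single-threaded sketch of the described transitions, not XDR's implementation: the class `DcLink`, its methods, and the in-memory `backlog` list (standing in for the digests tracked while the DC was down) are all hypothetical.

```python
from typing import List


class DcLink:
    """Toy model of one destination DC's dc_state as seen from the source."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "CLUSTER_UP"
        self.backlog: List[str] = []  # digests not shipped while down
        self.history: List[str] = [self.state]

    def _set(self, state: str) -> None:
        if state != self.state:
            self.state = state
            self.history.append(state)

    def link_lost(self) -> None:
        self._set("CLUSTER_DOWN")

    def link_restored(self) -> None:
        if self.state == "CLUSTER_DOWN":
            self._set("CLUSTER_WINDOWSHIP")

    def write(self, digest: str) -> bool:
        """Return True if the write is shipped now, False if only tracked."""
        if self.state == "CLUSTER_DOWN":
            self.backlog.append(digest)
            return False
        # In CLUSTER_UP and CLUSTER_WINDOWSHIP, new writes keep shipping.
        return True

    def drain(self, batch: int) -> List[str]:
        """Ship up to `batch` backlog digests; stands in for the separate thread."""
        if self.state != "CLUSTER_WINDOWSHIP":
            return []
        shipped, self.backlog = self.backlog[:batch], self.backlog[batch:]
        if not self.backlog:
            self._set("CLUSTER_UP")
        return shipped


dc = DcLink("DC1")
dc.link_lost()
dc.write("digest-1")
dc.write("digest-2")
dc.link_restored()
```

After `link_restored()`, a new `dc.write(...)` is shipped immediately while the two tracked digests are drained by repeated `dc.drain(...)` calls. Once the backlog is empty, `dc.history` shows the same CLUSTER_UP, CLUSTER_DOWN, CLUSTER_WINDOWSHIP, CLUSTER_UP sequence described in the text.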

Keywords

XDR dc_state CLUSTER_UP CLUSTER_INACTIVE