Possibility of data loss with XDR when a cluster goes down

Hi, in a two-cluster XDR setup with an active cluster (C1) and a passive cluster (C2), if C1 goes down, the clients (writes) would be moved to C2, and C2 would become the active cluster. The data lost in this switch would be the writes that XDR had not yet shipped from C1 to C2 at the moment C1 went down. I think the same issue would cause data loss in active-active mode as well. Is there any workaround for this? Also, does the switchover have to be managed by the client app?

Under normal circumstances, C1 shouldn't have much data that has not yet been shipped to C2 at any point in time. However, given the (currently) asynchronous nature of XDR shipping, some data may not have been shipped yet when C1 goes down. There is no easy workaround other than giving clients the ability to replay recent writes directly to C2 when C1 goes down. Failover of clients from C1 to C2 will also have to be managed by the client app itself.
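For illustration, here is a minimal sketch of what "replay recent writes to C2" could look like on the client side. It does not use the Aerospike client API: `InMemoryCluster` and `FailoverWriter` are hypothetical stand-ins, and the sketch assumes that (a) a cluster being down surfaces as a `ConnectionError`, (b) the replay window is large enough to cover the XDR shipping lag, and (c) writes are full-record puts, so replaying a write that XDR already shipped is harmless.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple


class InMemoryCluster:
    """Hypothetical stand-in for a cluster; NOT the Aerospike client API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value


class FailoverWriter:
    """Writes to the active cluster, remembers a window of recent writes,
    and replays that window to the passive cluster on failover."""

    def __init__(self, active: InMemoryCluster, passive: InMemoryCluster,
                 replay_window: int = 1000) -> None:
        self.active = active
        self.passive = passive
        # Bounded buffer: the oldest writes fall off once the window is full,
        # so the window must be sized to cover the (assumed) XDR lag.
        self.recent: Deque[Tuple[str, Any]] = deque(maxlen=replay_window)

    def put(self, key: str, value: Any) -> None:
        try:
            self.active.put(key, value)
        except ConnectionError:
            self._fail_over()
            self.active.put(key, value)
        self.recent.append((key, value))

    def _fail_over(self) -> None:
        # Promote the passive cluster, then re-apply recent writes in their
        # original order so anything XDR had not shipped yet lands on it.
        self.active, self.passive = self.passive, self.active
        for key, value in self.recent:
            self.active.put(key, value)


if __name__ == "__main__":
    c1, c2 = InMemoryCluster("C1"), InMemoryCluster("C2")
    writer = FailoverWriter(c1, c2, replay_window=100)
    writer.put("a", 1)
    writer.put("b", 2)
    c2.data["a"] = 1  # pretend XDR had shipped only "a" before C1 failed
    c1.up = False
    writer.put("c", 3)
    print(c2.data)  # {'a': 1, 'b': 2, 'c': 3}
```

The design choice here is a bounded buffer rather than a full write log: it keeps client memory predictable, at the cost of losing writes older than the window if XDR was lagging further behind than expected.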

By the way, since you are running XDR, you must have an Enterprise license, so you may want to raise any issues or questions with support through the designated contacts on your side.

Thanks for the info @meher

Could one write a scan UDF that touches all records with a last-update time (LUT) greater than a specific timestamp, causing those records to be reshipped? Is that possible?
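To make the proposed selection rule concrete, the sketch below models it in plain Python rather than as an actual Aerospike (Lua) UDF. `Record`, `touch_since`, and `ship_queue` are hypothetical names, the millisecond LUT unit is an assumption, and the idea that touching a record (rewriting it unchanged with a new LUT) makes it eligible for shipping again is the premise of the question above, not something confirmed in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    value: object
    lut_ms: int  # last-update time in ms since epoch (assumed unit)


def touch_since(records: Dict[str, Record], cutoff_ms: int, now_ms: int,
                ship_queue: List[str]) -> int:
    """Touch every record whose LUT is strictly greater than cutoff_ms.

    A "touch" keeps the value but bumps the LUT to now_ms; in this toy
    model the key is also appended to ship_queue, a hypothetical stand-in
    for the record being picked up for shipping again.
    Returns the number of records touched.
    """
    touched = 0
    for key, rec in records.items():
        if rec.lut_ms > cutoff_ms:
            rec.lut_ms = now_ms       # same value, new LUT
            ship_queue.append(key)    # hypothetical reship hook
            touched += 1
    return touched
```

Since reshipping a record that C2 already has is harmless for full-record writes, it would be safer to pick a cutoff somewhat earlier than the last time shipping was known to be caught up than one that is too late.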

@meher we thought Aerospike guarantees that data will always be shipped, i.e., when C1 recovers, it will resume shipping records, so there won't be any data loss and all records will eventually be shipped. Is this not the case? Should we anticipate data loss even after C1 recovers?

@Atul_Dusane yes, thanks for stressing that point: when C1 comes back, it will resume shipping records and will make sure every record/update is shipped at least once. My point was only about C1 going down, not about what happens when C1 comes back.
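To illustrate what "shipped at least once" means in practice, here is a toy model. It is not how XDR is implemented internally; `Shipper`, `Destination`, and the persisted `checkpoint` are hypothetical. The point it shows is that if the shipping position is only advanced after the destination has applied an update, a crash in between causes that update to be re-sent after recovery (a duplicate) rather than lost, and an idempotent, last-write-wins apply on the destination makes the duplicate harmless.

```python
from typing import Dict, List, Tuple

Update = Tuple[str, object, int]  # (key, value, lut)


class Destination:
    def __init__(self) -> None:
        self.data: Dict[str, Tuple[object, int]] = {}
        self.applied = 0  # counts deliveries, including duplicates

    def apply(self, key: str, value: object, lut: int) -> None:
        self.applied += 1
        current = self.data.get(key)
        # Last-write-wins by LUT: re-applying the same update changes nothing.
        if current is None or lut >= current[1]:
            self.data[key] = (value, lut)


class Shipper:
    """Toy at-least-once shipper over an ordered log of updates."""

    def __init__(self, log: List[Update]) -> None:
        self.log = log
        self.checkpoint = 0  # assumed to survive restarts (e.g. persisted)

    def ship(self, dest: Destination, crash_after_apply_at: int = -1) -> None:
        while self.checkpoint < len(self.log):
            key, value, lut = self.log[self.checkpoint]
            dest.apply(key, value, lut)
            # Simulated crash: the update was applied but not checkpointed.
            if self.checkpoint == crash_after_apply_at:
                raise RuntimeError("source crashed before checkpointing")
            self.checkpoint += 1


if __name__ == "__main__":
    dest = Destination()
    shipper = Shipper([("a", 1, 10), ("b", 2, 20), ("a", 3, 30)])
    try:
        shipper.ship(dest, crash_after_apply_at=1)  # C1 "goes down"
    except RuntimeError:
        pass
    shipper.ship(dest)  # C1 "comes back" and resumes from its checkpoint
    print(dest.data)     # {'a': (3, 30), 'b': (2, 20)}
    print(dest.applied)  # 4: update "b" was delivered twice
```

Note that this only covers the recovery path; while the source is down, nothing shipped from it reaches the destination, which is the window the earlier reply was describing.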