Setting XDR shipping pointer for a particular datacenter in digestlog


#1

Hi

I am shipping my records to multiple datacenters. However, I have a situation where I want to set the shipping pointer in the digestlog to the current time for a particular datacenter. Is this possible, and if so, how?


#2

If you explain your situation, we may be able to suggest a solution. But to answer your specific question, we do not let users manipulate the shipping pointer.


#3

We are planning for disaster recovery scenarios. We have multiple datacenters, each shipping to multiple other datacenters. At any point in time, one of the datacenters may fail due to a disaster at that site. We do not want the digestlog to become overloaded for that datacenter. We will recover the disaster site using other backup/restore means. The source site then need not ship all the updates to that particular datacenter via the window-shipping mechanism; it should send only the data written after the point at which the backup of the source DB was taken. We want to handle such situations by setting a pointer in the digestlog for a particular datacenter so that XDR replication resumes from there.


#4

Ironically, that is one of the main purposes of XDR. If a link is down, XDR holds the log, waits for the link to come back up, and catches up. You need not do a backup/restore. But it seems you are worried about the digestlog filling up. The first thing you should do is size the digestlog. Our recommendation is 100G, which can hold up to 3 days of writes at 5000 write TPS. If you monitor your link status and address link issues quickly, this capacity is more than enough and you can let XDR do its job.
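As a sanity check on those figures, here is a back-of-envelope sketch. The per-entry size is an assumption reverse-engineered from the "100G holds about 3 days at 5000 write TPS" statement above, not a documented value; check your Aerospike version for the actual digestlog entry size.

```python
# Back-of-envelope digestlog sizing, derived from the figures in the post.
# ENTRY_BYTES is an ASSUMPTION (reverse-engineered, not from documentation).
ENTRY_BYTES = 80          # assumed size of one digestlog entry, in bytes
SECONDS_PER_DAY = 86_400

def digestlog_days(capacity_gb: float, write_tps: float = 5000) -> float:
    """Days of writes a digestlog of the given size can buffer."""
    entries = capacity_gb * 1e9 / ENTRY_BYTES
    return entries / (write_tps * SECONDS_PER_DAY)

print(f"{digestlog_days(100):.1f} days")  # roughly 2.9 days at 5000 TPS
```

This lines up with the "about 3 days" figure; the same function shows how the buffer window shrinks as the write rate grows (e.g., at 20,000 TPS the same 100G covers well under a day).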

Did you encounter a situation where the above behavior of XDR is not good enough for you, or is this more of a theoretical concern at this point?

However, if you really don't want XDR to handle the link-down case and you want to take care of it yourself (via backup/restore), there is a way to do it. It is a backdoor mechanism, and I do not want to discuss it in a public forum. Please contact us via our support channel and we can discuss it there.


#5

Hi Sunil,

We will take this up through the support channel anyway, but in the meantime I would like to understand a few things:

  1. You mentioned that your recommendation is 100G, which is sufficient to hold 3 days of data at 5000 TPS. If we reach a situation where the digestlog touches 100G, roughly how long will it take to ship the actual data to the other site via the XDR mechanism? The actual data will be much more than 100G, since the digestlog stores only the keys.

  2. Is 100G the maximum, or can we configure it even higher? Is there a recommended upper limit, and how long would it take to ship that much data?

  3. Regarding setting the XDR shipping pointer: on what basis would it be set (some timestamp value, or something else), and would this pointer live in the digestlog?

  4. If we land in a disaster-recovery scenario for a prolonged period, wouldn't it be a good idea to take a backup of the whole namespace from the good site, restore it on the bad site, and at the same time clear the digestlogs? The data would be restored from the backup anyway, so the data that still needs to be replicated via XDR would be very small (only writes from the point the backup was taken until it is restored), as most of it would be handled through backup/restore. If this looks like a viable option, then I believe there must be some mechanism to clear the digestlogs.

Regards, Pankaj Jain.


#6

1 - The time for XDR to ship depends on the underlying record size (i.e., the total data to ship), hot-key de-duplication driven by your record-update pattern (the same record updated in quick succession is de-duplicated), and the throughput of the XDR link.

2 - The maximum digestlog file size is determined by you, based on what your filesystem can accommodate with adequate free space; 100G to 300G is typical.

3 - You cannot set/manipulate the XDR shipping pointer.
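The hot-key de-duplication mentioned in point 1 can be illustrated with a toy model. This is only a sketch of the idea (ship the latest state once per record digest), not Aerospike's internal implementation:

```python
# Toy model of hot-key de-duplication: repeated updates to the same record
# collapse into a single shipment of its latest state.
from collections import OrderedDict

def deduplicate(update_stream):
    """Return digests in shipping order, keeping one entry per record."""
    seen = OrderedDict()
    for digest in update_stream:
        seen[digest] = True   # later updates collapse onto the same key
    return list(seen)

updates = ["k1", "k2", "k1", "k3", "k1", "k2"]
print(deduplicate(updates))   # ['k1', 'k2', 'k3'] -> 3 ships instead of 6
```

This is why a workload that hammers a small hot set of keys produces far less XDR traffic than one where every update touches a different record, even at the same write TPS.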


#7

Hi P Gupta,

I agree that it will depend on the size of the actual data, but do you have statistics that tell us roughly how much real data can be shipped in how much time? You mentioned that we cannot set/manipulate the XDR shipping pointer, but Sunil mentioned that it can probably be done and that we should go through the support channel. FYI, we contacted the support channel and were told it could probably be achieved by disassociating the namespace from a particular datacenter and then associating it again, but that is not working as expected. So we want to be very clear whether it is achievable by any means or not. Also, can you please let us know how we can contact local support in India? As of now, our team in Canada interacts with their local counterparts.

Regards, Pankaj Jain.


#8

@pajain14 - First, I am just sharing what I know, not advising on behalf of Aerospike Support. EE customers have access to Support per the terms of their contract. With that said … :slight_smile:

The throughput you achieve will depend on your link bandwidth. In other work I have done with TCP/IP client-server data transfer, I have found that you can typically achieve 60% to 80% bandwidth utilization. XDR is similar TCP/IP data transfer, so it should behave comparably. So based on the GB of data you have to transfer and the GB/sec of bandwidth, you can estimate the time.

Hot-key de-duplication is built into XDR. Whether you updated a few records repeatedly (smaller data transfer) or every update touched a different record (large data transfer) will have a significant impact on the total data size XDR has to transfer, so it depends entirely on your write/update pattern. You have to estimate that.
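Putting the bandwidth-utilization and de-duplication points together, a rough catch-up estimate might look like the sketch below. All the numbers are illustrative assumptions for the calculation, not Aerospike measurements:

```python
# Rough XDR catch-up time estimate:
#   effective throughput = link bandwidth * utilization (60-80% per the post)
#   data to ship         = unique records after de-duplication * avg record size
# The example inputs below are PLACEHOLDERS; substitute your own workload.

def catchup_hours(unique_records: int, avg_record_kb: float,
                  link_gbps: float, utilization: float = 0.7) -> float:
    """Estimated hours to ship the backlog over the XDR link."""
    data_bits = unique_records * avg_record_kb * 1024 * 8
    effective_bps = link_gbps * 1e9 * utilization
    return data_bits / effective_bps / 3600

# Example: 500M unique digests, 2 KB records, 1 Gbps link at 70% utilization
print(f"{catchup_hours(500_000_000, 2, 1):.1f} hours")  # about 3.3 hours
```

The estimate is deliberately crude: it ignores protocol overhead, hot-key churn during catch-up, and competing traffic on the link, so treat the result as a lower bound.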

Backdoor commands? None that I know of that are not documented on the public website. What you are probably looking for is a recipe to achieve what you want.

You are trying to disable XDR and pick it back up using asbackup/asrestore while (I assume) maintaining consistency. That is very hard to think through without understanding all the details of your XDR implementation: the overall topology and the combination scenarios, like another link going down later while one is already down.

IMHO, size your digestlog correctly and don't take this circuitous route.