LUT, LST, asrestore and largeish namespaces


I think this thread is related: How to retain LUT when using asbackup and asrestore?

I’m using Aerospike 4.9. In this scenario we have two datacenters, A and B.

A has all of the data; B has only some of it, because XDR was enabled bidirectionally at setup.

I ran asbackup on a namespace that came out to 14TB. It took over 24 hours to back up and over 24 hours to restore to datacenter B.

We have a bin that, by design, is only ever incremented. Folks on our team noticed that some values appeared to have decreased when compared against other metadata we track.

My theory is that during the restore to datacenter B, the LUT (last-update time) of each restored record was set to “now”, which was probably newer than the LST (last ship time) for that record’s partition. Those records then got shipped back to datacenter A, where the record had been updated at some point during those 48 hours, so the older backed-up value replaced the newer one.
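To make the theory concrete, here is a toy model in Python. It is only a sketch under stated assumptions: it assumes XDR ships any record whose LUT is newer than the LST, and it ignores partitions, conflict resolution and everything else the real server does. The `Record` class, `records_to_ship` function, and all timestamps are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: int  # the increment-only bin
    lut: int    # last-update time, in arbitrary time units


def records_to_ship(cluster, lst):
    """Return keys whose LUT is newer than the last ship time.

    Hypothetical stand-in for XDR's selection logic, not the real thing.
    """
    return [key for key, rec in cluster.items() if rec.lut > lst]


# t=0: backup taken from A while the counter is 10.
backup = {"k1": Record(value=10, lut=0)}

# t=5: A keeps incrementing while the backup/restore is still running.
dc_a = {"k1": Record(value=12, lut=5)}

# t=20: restore into B; assume the restored record's LUT is stamped "now".
restore_time = 20
dc_b = {key: Record(rec.value, restore_time) for key, rec in backup.items()}

# B's last ship time is older than the restore, so the record qualifies.
lst_b = 15
shipped = records_to_ship(dc_b, lst_b)

# Shipping B -> A overwrites A's newer value with the stale one.
for key in shipped:
    dc_a[key] = Record(dc_b[key].value, dc_b[key].lut)

print(shipped, dc_a["k1"].value)
```

In this model the counter in A drops from 12 back to 10, which matches the "decrement" we observed.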

Does this seem reasonable?

How does a person prevent or mitigate this?



I believe your theory is reasonable.

asrestore has an option to skip restoring a record if it already exists (see the -u option, the first one in the asrestore write-policy-options doc). That would make sure you only restore records that haven’t yet made it to B. You may want to disable XDR temporarily while doing this, to avoid race conditions with records that may be “in flight”. I think those options and details are discussed in this article: How to Migrate Aerospike from one Cluster to Another.
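As a rough illustration of the "skip if it already exists" behaviour, here is a small Python model. It is not asrestore's implementation; the `restore` function and its `unique` parameter are hypothetical names that only mimic the semantics described above, with plain dicts standing in for the cluster and the backup.

```python
def restore(cluster, backup, unique):
    """Write backup records into cluster; return the keys actually written.

    With unique=True, records already present in the cluster are skipped,
    so existing (possibly newer) values are left untouched.
    """
    written = []
    for key, value in backup.items():
        if unique and key in cluster:
            continue  # record already made it to this cluster, leave it alone
        cluster[key] = value
        written.append(key)
    return written


# B already holds k1 (received via XDR, value 12); the backup has a stale k1.
dc_b = {"k1": 12}
backup = {"k1": 10, "k2": 7}

written = restore(dc_b, backup, unique=True)
print(written, dc_b)
```

Only `k2` is written here; `k1` keeps the newer value it already had in B, so the restore does not touch it.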