Are XDR replicated records returned in queries?

I have a ‘Simple Active-Active’ setup of two Aerospike clusters, with each cluster set up to XDR-replicate its records to the other cluster.

1. I need only the counts of the master records (the non-XDR-replicated records) from each cluster. Can I do that using the Java client API or UDF aggregation?

2. This is the reverse of Q1: can I fetch the counts of the master records as well as the XDR-replicated records from the cluster?

A2. Let me answer your Q2 first, as it's simple and straightforward. In the logs we continuously print the master object count for each namespace. The same info can be obtained via the namespace stats too (stat name: master_objects). You just need to sum up those values across all the nodes in the cluster.

A1. We do not maintain information on whether a record was written by XDR or by a client. But we have namespace stats that give the number of writes by clients plus XDR (client_write_success) and the number of writes by XDR alone (xdr_write_success). Note that these do not represent the object count: there can be 1000 writes but only 1 object. I am not sure if these stats will serve your purpose.

Thanks Sunil - that was a crisp answer. For A2, can I fetch those stats using the Java API?

Yes, you can use the Info API in Java. The command string is "namespace/<nsname>"; replace <nsname> with the actual namespace name. You can send the request to one, several, or all nodes of the cluster.

If you want this only for informational purposes, you can use our tools to get it. In general, be advised that it's not a good idea to integrate these stats into your core application logic. It is rare, but we may change stat names or their semantics over time. Moreover, your application logic should not depend on the stats.
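For anyone who wants to try the summing step described above, here is a minimal sketch in Java. It assumes the info response for "namespace/<nsname>" is a semicolon-separated list of key=value pairs, and it works on made-up sample strings rather than a live cluster. The class and method names (NamespaceStatSum, parseStat, sumStat) are hypothetical. With the Aerospike Java client, each response string would presumably come from a call such as Info.request(node, "namespace/test") for every node in client.getNodes(); that part is left as a comment so the sketch runs without the client library.

```java
import java.util.Arrays;
import java.util.List;

public class NamespaceStatSum {

    /**
     * Extracts one numeric stat from a raw info response of the form
     * "key1=value1;key2=value2;...". Returns 0 if the stat is absent.
     */
    public static long parseStat(String infoResponse, String statName) {
        for (String pair : infoResponse.split(";")) {
            int eq = pair.indexOf('=');
            // Compare the whole key so "objects" does not match "master_objects".
            if (eq > 0 && pair.substring(0, eq).equals(statName)) {
                return Long.parseLong(pair.substring(eq + 1).trim());
            }
        }
        return 0L;
    }

    /** Sums the named stat over the responses collected from every node. */
    public static long sumStat(List<String> perNodeResponses, String statName) {
        long total = 0L;
        for (String response : perNodeResponses) {
            total += parseStat(response, statName);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up sample responses standing in for two nodes. With the real
        // client, each string would come from something like
        //   Info.request(node, "namespace/test")
        // called once per node.
        List<String> responses = Arrays.asList(
            "objects=150;master_objects=100;client_write_success=900;xdr_write_success=400",
            "objects=130;master_objects=70;client_write_success=600;xdr_write_success=250");

        System.out.println("master_objects = " + sumStat(responses, "master_objects"));
    }
}
```

The same helper can total client_write_success and xdr_write_success, but as noted in A1 those are write counts, not object counts. Given the caveat about stat names changing, this is better suited to monitoring or one-off checks than to core application logic.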

Sorry, but I come back with another related problem. My problem is no longer limited to simply counting the number of records.

I have a common back-end application node that queries both clusters in my ‘Simple Active-Active’ setup, combines the results, and sends the combined results back to the API client. The problem, as you would know, is that I get two copies of the same record.

Q1) Is there any way in which I can have my queries on each cluster consider only the master records on that cluster? Does Aerospike give me any customization hook, just before the XDR copy at the source cluster or before the write at the destination cluster, so that I can set a bin in my replicated record(s) to distinguish them from the master records? I could then filter on this bin to return only the master records.

Secondly, I am curious - for the Simple Active-Active XDR replication setup:

Q2) If, on a cluster, Aerospike does not differentiate between a record written by XDR and a record written by a client, then is it possible to update the XDR copy of a record on the XDR destination cluster?

Q3) I think it is possible to have sets with the same names, within identically named namespaces, in both clusters. If so, is it possible that 2 records in the same set, one in each cluster, have the same PK? If that’s also true, what will happen when each of these 2 records is replicated to the other cluster… Is there a possibility of a clash of PK?

I am not sure why your application is reading replicated data from 2 clusters and merging it. In general, this idea does not look good to me. The application should know which data is replicated and which is not. I can understand if you are merging unreplicated data. You can configure XDR to replicate only some sets and not others, so your application can organise data in sets and deal with replicated and non-replicated sets differently. I am not fully aware of your requirements, but I feel that it’s worth taking another look at your design. You should look at XDR as a data sync mechanism rather than as master and slave/replica copies.

Giving answers specific to your questions.

Q1. No hooks are provided to manipulate data before replication.

Q2. Yes, it's like any normal record. Your application can read/update the record on the destination.

Q3. Yes. We call it a write-write conflict. In an active-active setup, this conflict may arise if the same record is updated in both clusters at the same time. We do not resolve the situation automatically. Depending on the timing, the last record version to be shipped will survive on both nodes, or two different copies may be shipped to each other. The common workaround is to avoid write-write conflicts at the application layer by giving each key affinity to one cluster.
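To illustrate the key-affinity workaround, here is a minimal sketch, assuming the application (not Aerospike) decides which cluster owns each key. The class name KeyAffinity, the method homeClusterFor, and the cluster labels are all hypothetical; a simple hash of the primary key is just one possible mapping, not something XDR provides.

```java
import java.util.Arrays;
import java.util.List;

public class KeyAffinity {
    private final List<String> clusters;

    public KeyAffinity(List<String> clusters) {
        if (clusters.isEmpty()) {
            throw new IllegalArgumentException("at least one cluster is required");
        }
        this.clusters = clusters;
    }

    /**
     * Deterministically maps a primary key to its "home" cluster.
     * All writes for that key go to the home cluster only, so the two
     * clusters never update the same record concurrently.
     */
    public String homeClusterFor(String primaryKey) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(primaryKey.hashCode(), clusters.size());
        return clusters.get(index);
    }

    public static void main(String[] args) {
        KeyAffinity affinity = new KeyAffinity(Arrays.asList("cluster-A", "cluster-B"));
        for (String key : Arrays.asList("user:1", "user:2", "user:3")) {
            System.out.println(key + " -> writes go to " + affinity.homeClusterFor(key));
        }
    }
}
```

Every application instance has to use the same mapping for this to help. Reads can still be served from either cluster, since XDR ships each record to the other side.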

It's a PoC kind of application right now. I will design the real application in a way that is consistent with this discussion. Thanks for your inputs.
