Replica and master object counts are inconsistent

RockyTu · June 7, 2017, 10:20am

The environment is Aerospike CE 3.9.1 with the replication factor set to 2. After the migration completes, the replica and master object counts are still not equal. During and after the migration, other clients were still writing data into Aerospike. Newly arrived data seems to be committed to the replica objects too, and the difference between the replica and master counts stays the same. I didn't find any relevant error message in the log file. Is this unexpected behavior? What should I do next?

Albot · June 7, 2017, 9:13pm

AMC can show inaccurate stats sometimes. Check through asadm. You can also try restarting AMC after migrations are done to see if it looks different. I have a cron job that restarts AMC every day.

pgupta · June 7, 2017, 9:25pm

What is your AMC version? It is shown in the footer at the bottom of the browser pane. Also, on your node, what does the command below show?

$ grep objects /var/log/aerospike/aerospike.log

You should see something like this:

INFO (info): (ticker.c:348) {ns1} objects: all 300000 master 300000 prole 0

(My output is from a one-node cluster, so prole is zero.) Do you see a difference from AMC, or is it the same?
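If there are many ticker lines to go through, the counts can be pulled out with a small script. This is only a sketch: the regular expression is modeled on the line format quoted above, and the function name `parse_ticker_line` is made up for illustration. Other server versions may log a different layout.

```python
import re

# Pattern based on the ticker line quoted in this thread. Treat it as an
# assumption: other Aerospike versions may format this line differently.
TICKER_RE = re.compile(
    r"\{(?P<ns>[^}]+)\} objects: "
    r"all (?P<all>\d+) master (?P<master>\d+) prole (?P<prole>\d+)"
)


def parse_ticker_line(line):
    """Return (namespace, all, master, prole), or None if the line does not match."""
    m = TICKER_RE.search(line)
    if m is None:
        return None
    return (
        m.group("ns"),
        int(m.group("all")),
        int(m.group("master")),
        int(m.group("prole")),
    )


sample = "INFO (info): (ticker.c:348) {ns1} objects: all 300000 master 300000 prole 0"
print(parse_ticker_line(sample))
```

Feeding it the output of the grep command, one line at a time, gives one tuple per ticker entry.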

RockyTu · June 8, 2017, 3:38am

It is the same in asadm.

RockyTu · June 8, 2017, 3:44am

I have a 6-node cluster, and the replication factor is 2.

one node: Jun 08 2017 03:24:06 GMT: INFO (info): (ticker.c:328) {production} objects: all 7693601 master 3919389 prole 3774212

another node: Jun 08 2017 03:41:55 GMT: INFO (info): (ticker.c:328) {production} objects: all 8406338 master 6005579 prole 2399283

In AMC, the numbers are the same. The difference is at least around 40M records.

pgupta · June 8, 2017, 4:09am

AMC version?

RockyTu · June 8, 2017, 4:13am

AMC CE 4.0.12, the latest one

pgupta · June 8, 2017, 4:18am

I would expect:
all = master + replica, on each node.
sum(all) = sum(master) + sum(replica), summed over all nodes.
sum(master) = sum(replica), summed over all nodes.
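These expectations can be checked mechanically once the per-node counts are collected. The sketch below assumes replication factor 2 (which is what makes the master and replica totals equal) and a quiet cluster with migrations finished. The function name `check_counts` and the sample numbers are illustrative only, not taken from any real cluster.

```python
def check_counts(nodes):
    """Check per-node and cluster-wide object count expectations.

    nodes: list of (all, master, prole) tuples, one per node.
    Returns a list of problem descriptions; an empty list means consistent.
    Assumes replication factor 2, so sum(master) should equal sum(prole).
    """
    problems = []

    # Per node: all = master + prole.
    for i, (total, master, prole) in enumerate(nodes):
        if total != master + prole:
            problems.append(
                f"node {i}: all={total} != master+prole={master + prole}"
            )

    # Across the cluster: sum(master) = sum(prole).
    sum_master = sum(n[1] for n in nodes)
    sum_prole = sum(n[2] for n in nodes)
    if sum_master != sum_prole:
        problems.append(
            f"cluster: sum(master)={sum_master} != sum(prole)={sum_prole}"
        )

    return problems


# Made-up counts for a two-node cluster.
consistent = [(300, 200, 100), (300, 100, 200)]
inconsistent = [(300, 200, 100), (250, 100, 150)]

print(check_counts(consistent))
print(check_counts(inconsistent))
```

Counts sampled at different times on a cluster that is still taking writes can differ slightly, so small mismatches are not necessarily a problem.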

pgupta · June 8, 2017, 4:18am

Since your numbers from the logs and AMC match, AMC is not the issue.

Albot · June 9, 2017, 10:48pm

This is after migrations are finished? Anything else showing in your log? How about asadm -e "show stats like err"?

RockyTu · June 10, 2017, 2:57am

Yes, the migration is finished. It is strange: there are no related errors. I then did a rolling restart of all nodes, and the issue disappeared. I suspect 3.9.1 is not a stable version, because it introduced the new heartbeat subsystem and some issues were fixed in the next version.

Albot · June 10, 2017, 8:24pm

I don’t see anything in the release notes that addresses anything like that. I’m not really sure what’s going on. At this point it may be worth opening a case with Aerospike if you have a support contract.

kporter · June 13, 2017, 1:47am

We did discover a way this can happen when recovering from certain split-brain conditions while working on the partition rebalance algorithm used with paxos-protocol v5. Basically it was possible for a partition recovering from a split-brain to determine there were no migrations needed. We were unable to address this issue in the old algorithm and it shouldn’t exist in the new rebalance algorithm. To switch to paxos-protocol v5, see http://www.aerospike.com/docs/operations/upgrade/cluster_to_3_13.
