All replicas gone on restart?

alicebob · February 2, 2015, 7:53pm

Hi all,

I’m playing around with Aerospike to see how it behaves. It works well, but I don’t understand some of the syncing behavior. I’m running 3 (virtual) machines with Debian and Aerospike 3.4.1. I load some data into the database with the benchmark utility, about a million keys, and keep an eye on things with the management console. All good so far.

Now I stop a machine. The other two take over the keys for the dead machine. They go from about 300K master and 300K replica objects each to about 500K master and 150K replica objects each. Still all good so far, since some replica keys will have become master keys, and they start syncing the missing keys.
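The numbers in this paragraph follow from simple arithmetic. Here is a minimal sketch, assuming keys are spread evenly over the nodes and replication-factor 2 as in the config below. The function names are made up for illustration and are not part of any Aerospike tool.

```python
def steady_state(total_keys, nodes, replication_factor=2):
    """Per-node (master, replica) counts, assuming an even key spread."""
    master = total_keys / nodes
    replica = master * (replication_factor - 1)
    return master, replica


def right_after_node_loss(total_keys, nodes):
    """Per-survivor counts just after one node stops, before any migration.

    Assumes replication factor 2: the stopped node's master keys each had
    one replica, spread evenly over the survivors, and those replicas are
    promoted to master.
    """
    master, replica = steady_state(total_keys, nodes, 2)
    promoted = master / (nodes - 1)
    return master + promoted, replica - promoted


before = steady_state(1_000_000, 3)
after = right_after_node_loss(1_000_000, 3)
print(round(before[0]), round(before[1]))  # 333333 333333
print(round(after[0]), round(after[1]))    # 500000 166667
```

This roughly matches the observed 300K/300K before and 500K/150K right after the stop. The missing replicas are then rebuilt on the two survivors.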

But now I restart the machine I removed earlier. It comes back nicely, and all three machines have 300K master objects right away, but, and now here’s my question, the replica object counts have dropped to 0 on all three machines. They do start syncing, and everything goes back to normal after a while, but I don’t get why they go to 0. Is that expected?

This is the config:

namespace bench {
        replication-factor 2
        memory-size 1G
        default-ttl 0

        storage-engine device {
                file /var/local/aerospike/bench.dat
                filesize 1G
                data-in-memory true
        }
}

raj · February 3, 2015, 8:06am

Alicebob,

I am assuming that your source of information is AMC. I would not expect the number to be zero. While you are experimenting, can you also grab the output of

asinfo -v 'namespace/bench' -l |grep objects

and

asinfo -v 'statistics' -l | grep partitions

Thanks

– Raj

alicebob · February 3, 2015, 7:17pm

Hello Raj,

Thanks for your quick reply. The grep for partitions returns nothing, but the objects one gives this after restarting one of the daemons:

root@debian-jessie:~/aerospike-server-community-3.4.1-debian7# asinfo -v namespace/bench -l|grep objects
objects=645332
sub-objects=0
master-objects=312162
master-sub-objects=0
prole-objects=6847
prole-sub-objects=0
expired-objects=0
evicted-objects=0
set-deleted-objects=0
set-evicted-objects=0
non-expirable-objects=0

prole-objects started at 0 after the restart and grows slowly to the expected, normal number (~300K). It’s the same on all three machines, and it matches what I see in AMC.

(asinfo on a remote host gives me trouble. If I do asinfo -h 192.168.2.77 -v namespace/bench -l I get request to 192.168.2.77 : 3000 returned error. Is the port number non-obvious?)

raj · February 4, 2015, 11:04am

alicebob,

3000 is the default port.

What does this show?

asinfo -v 'statistics' -l | grep partitions

– Raj

alicebob · February 4, 2015, 11:51am

Hello again Raj,

It turns out the stat names contain partition, not partitions.

On a non-restarted machine, a little after the restarted daemon is back in the cluster and they are re-syncing:

root@debian-jessie:~# asinfo -v 'statistics' -l | grep partition
stat_cluster_key_partition_transaction_queue_count=0
partition_actual=4221
partition_replica=4106
partition_desync=1
partition_absent=3960
partition_object_count=656436
partition_ref_count=19858

Once it’s done syncing:

root@debian-jessie:~# asinfo -v 'statistics' -l | grep partition
stat_cluster_key_partition_transaction_queue_count=0
partition_actual=4221
partition_replica=4107
partition_desync=0
partition_absent=3960
partition_object_count=656436
partition_ref_count=12462

with partition_ref_count decreasing slowly.

Thanks for the help!
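To see exactly what changed between the two snapshots above, the key=value output can be parsed and compared. This is a minimal sketch that works on the pasted text only. The helper name parse_stats is hypothetical, and treating the four partition_* state counters as one total is an assumption, not something stated in the thread.

```python
def parse_stats(text):
    """Turn 'key=value' lines with integer values into a dict."""
    return {k: int(v) for k, v in (line.split("=", 1) for line in text.split())}


during = parse_stats("""
stat_cluster_key_partition_transaction_queue_count=0
partition_actual=4221
partition_replica=4106
partition_desync=1
partition_absent=3960
partition_object_count=656436
partition_ref_count=19858
""")

done = parse_stats("""
stat_cluster_key_partition_transaction_queue_count=0
partition_actual=4221
partition_replica=4107
partition_desync=0
partition_absent=3960
partition_object_count=656436
partition_ref_count=12462
""")

# Keys whose value differs between the two snapshots.
changed = {k: (during[k], done[k]) for k in during if during[k] != done[k]}
for key, (old, new) in changed.items():
    print(key, old, "->", new)
# partition_replica 4106 -> 4107
# partition_desync 1 -> 0
# partition_ref_count 19858 -> 12462

states = ["partition_actual", "partition_replica",
          "partition_desync", "partition_absent"]
total_during = sum(during[k] for k in states)
total_done = sum(done[k] for k in states)
print(total_during, total_done)  # 12288 12288
```

The total stays at 12288 in both snapshots, which is 3 x 4096. Aerospike uses 4096 partitions per namespace, so this would be consistent with the node-wide statistics covering three namespaces, though the thread does not confirm that. The only state change is one partition moving from desync to replica.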

raj · February 6, 2015, 7:05am

Alicebob,

partition_actual counts the partitions for which this node is the master.

partition_replica counts the partitions for which it holds a copy.

Holding a copy does not necessarily mean the copy is on the designated replica node, which is why prole-objects is zero. That said, it is peculiar that this happened for all the partitions.

I will check this. What it means is that your copies are intact; they are just not on the designated replica. Thanks for pointing this out.

– Raj
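The object counts posted earlier in the thread fit this explanation. The sketch below parses that pasted asinfo output and computes how many objects the node holds that are counted as neither master nor prole. The helper name parse_info is hypothetical, and reading the remainder as "copies not on the designated replica" is an interpretation of Raj's reply, not a documented formula.

```python
def parse_info(text):
    """Parse 'key=value' lines; integer values become int, others stay str."""
    stats = {}
    for line in text.strip().splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            stats[key] = int(value) if value.isdigit() else value
    return stats


sample = """
objects=645332
sub-objects=0
master-objects=312162
master-sub-objects=0
prole-objects=6847
prole-sub-objects=0
"""

stats = parse_info(sample)
unaccounted = (stats["objects"]
               - stats["master-objects"]
               - stats["prole-objects"])
print(unaccounted)  # 326323
```

So shortly after the restart the node still stores about 326K objects beyond its master and prole counts, close to the ~300K replicas it held before. That is consistent with the copies being intact but not yet counted as proles.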
