All replicas gone on restart?


#1

Hi all,

I’m playing around with Aerospike to see how it behaves. It works well, but I don’t understand some of its syncing behavior. I’m running three (virtual) machines with Debian and Aerospike 3.4.1. I load about a million keys into the database with the benchmark tool and keep an eye on things with the management console (AMC). All good so far.

Now I stop one machine. The other two take over the keys of the dead machine: each goes from about 300K master and 300K replica objects to about 500K master and 150K replica objects. Still all good, since some replica objects will have been promoted to master, and the two nodes start syncing the missing replicas.
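These counts follow from replication-factor 2: each record has one master copy and one replica copy, so with N nodes each node should end up with roughly total/N master objects and total/N replica objects. A minimal sketch of that arithmetic, assuming records are spread perfectly evenly over the nodes (in practice the spread over partitions is only approximately even; `expected_counts` is a hypothetical helper):

```python
# Rough expected per-node object counts once migrations have finished.
# Assumption: records and partitions are spread evenly across nodes.

def expected_counts(total_records: int, nodes: int, replication_factor: int) -> tuple[int, int]:
    """Return (master, replica) object counts per node."""
    copies = min(replication_factor, nodes)  # cannot keep more copies than nodes
    master = total_records // nodes
    replica = total_records * (copies - 1) // nodes
    return master, replica

if __name__ == "__main__":
    total = 1_000_000  # "about a million keys" loaded with the benchmark tool
    for n in (3, 2):
        m, r = expected_counts(total, n, replication_factor=2)
        print(f"{n} nodes: ~{m} master, ~{r} replica objects per node")
```

By this estimate, the ~150K replica objects per node seen after stopping a machine is an intermediate value during syncing; once it completes, each of the two remaining nodes would hold roughly 500K master and 500K replica objects.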

Then I restart the machine I stopped earlier. It comes back nicely, and all three machines have 300K master objects right away. But, and here’s my question, the replica object counts drop to 0 on all three machines. They do start syncing, and everything goes back to normal after a while, but I don’t understand why they drop to 0. Is that expected?

This is the config:

namespace bench {
        replication-factor 2
        memory-size 1G
        default-ttl 0

        storage-engine device {
                file /var/local/aerospike/bench.dat
                filesize 1G
                data-in-memory true
        }
}

#2

Alicebob,

I am assuming your source of information is AMC. I would not expect the number to drop to zero. Can you grab the output of

asinfo -v 'namespace/bench' -l | grep objects

and

asinfo -v 'statistics' -l | grep partitions

as well while you experiment?

Thanks

– Raj


#3

Hello Raj,

Thanks for your quick reply. The partitions grep returns nothing, but the objects grep gives this after restarting one of the daemons:

root@debian-jessie:~/aerospike-server-community-3.4.1-debian7# asinfo -v namespace/bench -l|grep objects
objects=645332
sub-objects=0
master-objects=312162
master-sub-objects=0
prole-objects=6847
prole-sub-objects=0
expired-objects=0
evicted-objects=0
set-deleted-objects=0
set-evicted-objects=0
non-expirable-objects=0

prole-objects starts at 0 after the restart and slowly grows back to the expected, normal number (~300K). It’s the same on all three machines, and it matches what I see in AMC.
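The counters in that output can be cross-checked against each other: objects minus master-objects minus prole-objects leaves 326,323 objects on this node that are counted neither as master nor as prole. A small Python sketch that parses the key=value lines of `asinfo ... -l` output and does that subtraction (the sample is copied from the output above; `parse_info` and `unaccounted_objects` are hypothetical helpers):

```python
# Parse line-per-stat asinfo output (one key=value pair per line)
# and cross-check the object counters.

SAMPLE = """\
objects=645332
master-objects=312162
prole-objects=6847
expired-objects=0
"""

def parse_info(text: str) -> dict[str, str]:
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank lines and lines without '='
            stats[key.strip()] = value.strip()
    return stats

def unaccounted_objects(stats: dict[str, str]) -> int:
    """Objects on this node counted neither as master nor as prole."""
    return int(stats["objects"]) - int(stats["master-objects"]) - int(stats["prole-objects"])

if __name__ == "__main__":
    stats = parse_info(SAMPLE)
    print(unaccounted_objects(stats))  # 326323 for the sample above
```

Roughly 326K is close to a full replica share per node, which is consistent with the copies still being present on the node while prole-objects reads near zero.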

(Running asinfo against a remote host gives me trouble. If I run asinfo -h 192.168.2.77 -v namespace/bench -l, I get "request to 192.168.2.77 : 3000 returned error". Is the port number something non-obvious?)


#4

alicebob,

3000 is the default port.

What does this show?

asinfo -v 'statistics' -l | grep partitions

– Raj


#5

Hello again Raj,

It turns out the stats are named partition, not partitions.

On a machine that was not restarted, shortly after the restarted daemon is back in the cluster and the nodes are re-syncing:

root@debian-jessie:~# asinfo -v 'statistics' -l | grep partition
stat_cluster_key_partition_transaction_queue_count=0
partition_actual=4221
partition_replica=4106
partition_desync=1
partition_absent=3960
partition_object_count=656436
partition_ref_count=19858

Once it’s done syncing:

root@debian-jessie:~# asinfo -v 'statistics' -l | grep partition
stat_cluster_key_partition_transaction_queue_count=0
partition_actual=4221
partition_replica=4107
partition_desync=0
partition_absent=3960
partition_object_count=656436
partition_ref_count=12462

partition_ref_count keeps decreasing slowly.

Thanks for the help!
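The four partition-state counters add up to the same total in both snapshots: 4221 + 4106 + 1 + 3960 = 4221 + 4107 + 0 + 3960 = 12,288 = 3 × 4096. Aerospike divides each namespace into 4096 partitions, so this would fit the counters covering three namespaces on the node (an assumption, since the config excerpt only shows bench). The numbers are also consistent with the single desync partition becoming a replica partition once syncing finishes. Each state holds roughly a third of the partitions, which is what replication-factor 2 on three nodes would suggest if every namespace uses that factor (again an assumption). A minimal Python check of that arithmetic:

```python
# Cross-check the node-level partition counters from the two snapshots.
# Assumptions: 4096 partitions per namespace, each partition counted in
# exactly one state, and the counters spanning every namespace on the node.

PARTITIONS_PER_NAMESPACE = 4096

SNAPSHOTS = {
    "re-syncing": {"actual": 4221, "replica": 4106, "desync": 1, "absent": 3960},
    "synced": {"actual": 4221, "replica": 4107, "desync": 0, "absent": 3960},
}

def partition_total(states: dict[str, int]) -> int:
    return sum(states.values())

def state_shares(states: dict[str, int]) -> dict[str, float]:
    total = partition_total(states)
    return {name: count / total for name, count in states.items()}

if __name__ == "__main__":
    for label, states in SNAPSHOTS.items():
        total = partition_total(states)
        print(label, total, "partitions =", total / PARTITIONS_PER_NAMESPACE, "x 4096")
        print({name: round(share, 2) for name, share in state_shares(states).items()})
```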


#6

Alicebob,

partition_actual counts the partitions this node is master for.

partition_replica counts the partitions it holds a copy of.

Holding a copy does not necessarily mean the node is the designated replica for that partition, hence prole-objects is zero. That said, it is peculiar that this happened for all partitions.

I will look into this. What it means is that your copies are intact; they are just not on the designated replica. Thanks for pointing this out.

– Raj