Odd record count when adding new nodes to cluster


#1

I have an Aerospike cluster (v3.8.4) and I don’t understand how the master and replica object counts work.

My 7-node cluster had 18M master objects and 36M replica objects (replication factor of 3).

When I added 6 new nodes to the cluster, the number of master objects dropped to 16M and the number of replica objects dropped to 1M. From the application side, no objects appear to have been lost, and the reported counts keep increasing over time without any writes, so I suppose the cluster is re-creating the masters and replicas.

Why is this so confusing?


#2

Object counts during migrations are a common source of confusion. Basically, the counts are underestimates during migration: partitions that haven’t reached their final state are not counted.
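To make the size of the undercount concrete, here is a minimal sketch of the arithmetic, assuming that in a stable cluster every master object has (replication factor - 1) replica copies. The function names are made up for illustration; they are not part of any Aerospike tool.

```python
def expected_replica_objects(master_objects: int, replication_factor: int) -> int:
    """Replica copies expected in a stable cluster: (RF - 1) per master object."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return master_objects * (replication_factor - 1)


def replica_shortfall(master_objects: int, replica_objects: int,
                      replication_factor: int) -> int:
    """How far the reported replica count is below the expected one."""
    expected = expected_replica_objects(master_objects, replication_factor)
    return expected - replica_objects


# Figures from the question above (replication factor 3).
before = replica_shortfall(18_000_000, 36_000_000, 3)  # before adding nodes
after = replica_shortfall(16_000_000, 1_000_000, 3)    # right after adding nodes
print(before)  # 0
print(after)   # 31000000
```

A shortfall of zero matches the stable 7-node figures, while the large shortfall after the expansion is consistent with partitions still in flight rather than with lost data.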

The “object-refs” (iirc the name) metric counts the number of object references held: the index always holds one for each object, and various transactions will also take a ref. This can be used as an overestimate (especially during migration).


#3

What do you mean by “object-refs”? What is “iirc the name”? Is there any way to know exactly how many objects I have in the database? Is there any chance of losing data when I add new nodes? I noticed rare NullPointerException events in my application, which suggests some old existing objects may have disappeared.


#4

Looks like I didn’t recall correctly; the correct metric name is record-refs.

Is there any way to know exactly how many objects I have in the database?

If there were an efficient way to compute the exact number during rebalance, we would simply provide that instead of an approximate value.
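One practical workaround, if nothing is writing to the cluster, is to sample the reported count periodically and only trust it once it stops changing. This is a rough sketch under that assumption; the helper name and the sample values are hypothetical, and how you collect the samples is up to your monitoring setup.

```python
from typing import Sequence


def counts_have_settled(samples: Sequence[int], window: int = 3) -> bool:
    """True if the last `window` samples are all identical."""
    if window < 2 or len(samples) < window:
        return False
    recent = samples[-window:]
    return all(value == recent[0] for value in recent)


# Hypothetical master-object counts sampled at fixed intervals after adding nodes.
during_rebalance = [16_000_000, 16_400_000, 17_100_000]
after_rebalance = [17_100_000, 17_800_000, 18_000_000, 18_000_000, 18_000_000]

print(counts_have_settled(during_rebalance))  # False
print(counts_have_settled(after_rebalance))   # True
```

This heuristic breaks down as soon as the application writes or deletes records, since the count then changes for reasons unrelated to migration.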

Is there any chance of losing data when I add new nodes?

Data loss is not expected when adding nodes.

I noticed rare events of NullPointerException in my application, which suggests some old existing objects may have disappeared.

The server does provide error codes, but I cannot really determine much from a NullPointerException. Could these be timeouts? Rebalancing will add some latency to transactions.
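To tell these cases apart in the application, it helps to handle "record not found" and "timeout" separately instead of dereferencing whatever comes back. The sketch below shows the general pattern only: `RecordNotFound`, `TransactionTimeout`, and `get_with_retry` are hypothetical stand-ins, not names from an Aerospike client, so map them to whatever your client actually returns or raises.

```python
import time
from typing import Callable, Optional


class RecordNotFound(Exception):
    """Hypothetical stand-in for a client's 'key not found' result."""


class TransactionTimeout(Exception):
    """Hypothetical stand-in for a client's timeout error."""


def get_with_retry(fetch: Callable[[str], dict], key: str,
                   attempts: int = 3, backoff_s: float = 0.05) -> Optional[dict]:
    """Return the record, or None if it is absent; re-raise the last timeout."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch(key)
        except RecordNotFound:
            return None  # the server answered: there is no such record
        except TransactionTimeout:
            if attempt == attempts:
                raise  # out of retries, surface the timeout to the caller
            time.sleep(backoff_s * attempt)  # linear backoff before retrying


# Usage: a fake fetch that times out twice, then succeeds.
calls = {"n": 0}


def flaky_fetch(key: str) -> dict:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransactionTimeout(key)
    return {"key": key, "value": 42}


result = get_with_retry(flaky_fetch, "user:1", backoff_s=0.0)
print(result)  # {'key': 'user:1', 'value': 42}
```

Logging which branch was taken would show whether the rare failures are timeouts during rebalance or records that are genuinely missing.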