Why replica count might appear to drop suddenly during migrations

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

This behavior is not expected on server versions 3.13 and above running the new clustering protocol (paxos-protocol v5), which improves the clustering algorithm by not dropping replica partitions until they are synchronized.


Issue: During migrations, the replica record count appears to drop when a node is viewed via:

asadm -e info

Detail

When nodes are migrating partitions, the replica record count drops, sometimes severely. An extreme example, in which every node reports 0.000 replica objects, is shown below.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Node    Namespace   Avail%   Evictions    Master    Replica     Repl     Stop    Migrates       Disk    Disk     HWM          Mem     Mem    HWM      Stop   
           .            .        .           .   Objects    Objects   Factor   Writes     (tx,rx)       Used   Used%   Disk%         Used   Used%   Mem%   Writes%   
aero-10:3000   persistent       32   206131141   1.754 G    0.000          2   false    (N/E,N/E)   1.841 TB      48      75   210.635 GB      61     65        90   
aero-11:3000   persistent       37           0   1.646 G    0.000          2   false    (N/E,N/E)   1.724 TB      45      75   197.253 GB      57     65        90   
aero-12:3000   persistent       35           0   1.821 G    0.000          2   false    (N/E,N/E)   1.767 TB      46      75   202.210 GB      58     65        90   
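As a rough sanity check on output like this: with a replication factor of 2, each record has one master copy and one replica copy, so once all partitions are synchronised the cluster-wide replica object total should be roughly equal to the master object total. The short Python sketch below does that arithmetic with the Master Objects figures from the sample output. It assumes the G suffix means billions, and the function name `expected_replica_total` is illustrative only, not part of any Aerospike tool.

```python
def expected_replica_total(master_objects, replication_factor):
    """Return the cluster-wide replica object count expected when all
    partitions are synchronised: each record has (replication_factor - 1)
    replica copies in addition to its master copy."""
    return sum(master_objects) * (replication_factor - 1)


# Master and Replica Objects per node from the sample output,
# assumed to be in billions (asadm's "G" suffix).
master = {"aero-10": 1.754, "aero-11": 1.646, "aero-12": 1.821}
replica_observed = {"aero-10": 0.0, "aero-11": 0.0, "aero-12": 0.0}

expected = expected_replica_total(master.values(), replication_factor=2)
observed = sum(replica_observed.values())
print(f"expected ~{expected:.3f} G replica objects, observed {observed:.3f} G")
```

A gap this large between expected and observed replica objects, while migrations are in progress, is the symptom this article describes rather than a sign of data loss.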

Explanation

This is expected behaviour. The objects of desynchronised replica partitions are not counted until they become synchronised. This means that if a node holds a partition as neither the master nor the acting master, and that partition has an inbound migration pending, the objects in that partition are not counted, so the replica record count can drop sharply. This is particularly noticeable when a specific node has a high number of outbound migrations: the other nodes in the cluster then have a high number of scheduled inbound migrations, and their replica objects are not counted.
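The counting rule described above can be modelled in a few lines. The Python sketch below is a simplified illustration, not Aerospike's actual implementation: each hypothetical `PartitionCopy` records whether the node holds it as master (or acting master), whether an inbound migration is still pending for it, and how many objects it holds. Replica copies with a pending inbound migration are treated as desynchronised and skipped, which is the pre-3.13 behaviour this article explains.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PartitionCopy:
    # Hypothetical model of one node's copy of one partition.
    partition_id: int
    is_master: bool          # master or acting master on this node
    inbound_migration: bool  # an inbound migration is still pending
    objects: int


def master_objects(copies: Iterable[PartitionCopy]) -> int:
    """Objects in partitions this node holds as master or acting master."""
    return sum(c.objects for c in copies if c.is_master)


def replica_objects(copies: Iterable[PartitionCopy]) -> int:
    """Objects in replica partitions, skipping desynchronised ones.

    A replica copy with a pending inbound migration is treated as
    desynchronised and contributes nothing until the migration completes.
    """
    return sum(
        c.objects
        for c in copies
        if not c.is_master and not c.inbound_migration
    )


node = [
    PartitionCopy(0, is_master=True, inbound_migration=False, objects=1000),
    PartitionCopy(1, is_master=False, inbound_migration=False, objects=900),
    PartitionCopy(2, is_master=False, inbound_migration=True, objects=950),
]
print(master_objects(node), replica_objects(node))  # 1000 900

# Once partition 2's inbound migration finishes, its objects are counted again.
node[2].inbound_migration = False
print(replica_objects(node))  # 1850
```

When one node has many outbound migrations, most replica copies on the other nodes fall into the skipped case at the same time, which is why the drop in the reported replica count can look severe.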

The replica record count per node will stabilise once migrations are completed.
