Why replica count might appear to drop suddenly during migrations


#1

##Why replica record count might appear to drop suddenly during migrations

Issue: During migration replica record count appears to change when node is viewed via:

asadm -e info

###Detail

When nodes are migrating partitions, the replica record count drops, sometimes severely, an extreme example is shown below.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Node    Namespace   Avail%   Evictions    Master    Replica     Repl     Stop    Migrates       Disk    Disk     HWM          Mem     Mem    HWM      Stop   
           .            .        .           .   Objects    Objects   Factor   Writes     (tx,rx)       Used   Used%   Disk%         Used   Used%   Mem%   Writes%   
aero-10:3000   persistent       32   206131141   1.754 G    0.000          2   false    (N/E,N/E)   1.841 TB      48      75   210.635 GB      61     65        90   
aero-11:3000   persistent       37           0   1.646 G    0.000          2   false    (N/E,N/E)   1.724 TB      45      75   197.253 GB      57     65        90   
aero-12:3000   persistent       35           0   1.821 G    0.000          2   false    (N/E,N/E)   1.767 TB      46      75   202.210 GB      58     65        90   

The overall number of records, shown in the statistic record_refs will reflect the correct value (correct number of master records + correct number of prole records). Record_refs is defined as the number of index records currently referenced, the total number of records across the cluster.

###Explanation

This is expected behaviour. The objects of desynchronised replica partitions are not counted until they become synchronised. This means that if a partition is not the master or acting master and has an inbound migration, the objects in that partition will not be counted, and so we may see a sharp drop in replica record numbers. This can be particularly obvious when a specific node has a high number of outbound migrations. In that scenario the other nodes in the cluster would have a high number of scheduled inbound migrations and the replica objects would not be counted.

This is normal behaviour and as long as record_refs reflects the total number of master and replica records in the cluster there is no issue. The replica record count per node will stabilise once migrations are completed.


Replicas invalidated after restart