Aerospike migrations issue / data loss

Hi

We have a 10-node cluster running version 3.9 with cold-start-empty false, on which we performed the following steps:

  1. Added node 10.0.29.212 running community build 3.13.0.10.
  2. Waited for migrations to finish (new cluster size 11). Incoming migrations ran only on node 10.0.29.212, as expected.
  3. Added two nodes, 10.0.29.190 and 10.0.29.135, simultaneously, both running community build 3.13.0.10.
  4. Waited for migrations to finish (new cluster size 13). Incoming migrations ran only on these two nodes, as expected.
  5. Added node 10.0.29.214 a few hours later, also running community build 3.13.0.10.
  6. Immediately after this node was added, the total master objects in the cluster dropped, incoming migrations started on all nodes, and we started getting timeouts on the cluster.

Cross posted to Stack Overflow: https://stackoverflow.com/questions/56905315/aerospike-migrations-issue.


Could you provide the output from this command:

asadm -e "show config diff"

Also provide the /etc/aerospike.conf file from one of your nodes.

Also, do you have the server logs from the time of this event?

Could you also share charts for:

  1. objects
  2. master-objects
  3. prole-objects
  4. non-replica-objects
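If it helps with collecting those, the per-namespace object stats can be pulled with asinfo and filtered. A hedged sketch follows: the `sample` text is a stand-in for real `asinfo -v "namespace/userdata" -l` output, and stat names can vary slightly between 3.x builds, so verify against your version.

```shell
# Filter the object counts out of 'asinfo -v namespace/userdata -l'-style output.
# 'sample' is illustrative stand-in data, not captured from a live node.
sample='objects=5000
master_objects=2600
prole_objects=2400
non_replica_objects=0
memory_used_bytes=123456'
printf '%s\n' "$sample" \
  | grep -E '^(objects|master_objects|prole_objects|non_replica_objects)='
```

On a live node, replace the sample with the output of `asinfo -h <host> -v "namespace/userdata" -l` and run the same grep on each node in a loop.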

Output of "show config diff":

Service Configuration~~~~~~~~~~~~
NODE 10.0.23.103 10.0.23.142 10.0.23.169 10.0.23.171 10.0.23.185 10.0.23.62 10.0.23.71 10.0.23.76 10.0.23.8 10.0.29.135 10.0.29.190 10.0.29.212 10.0.29.214
allow-inline-transactions true true true true true true true true true N/E N/E N/E N/E
enable-xdr false false false false false false false false false N/E N/E N/E N/E
fabric-workers 16 16 16 16 16 16 16 16 16 N/E N/E N/E N/E
fabric.address N/E N/E N/E N/E N/E N/E N/E N/E N/E any any any any
fabric.channel-bulk-fds N/E N/E N/E N/E N/E N/E N/E N/E N/E 2 2 2 2
fabric.channel-bulk-recv-threads N/E N/E N/E N/E N/E N/E N/E N/E N/E 4 4 4 4
fabric.channel-ctrl-fds N/E N/E N/E N/E N/E N/E N/E N/E N/E 1 1 1 1
fabric.channel-ctrl-recv-threads N/E N/E N/E N/E N/E N/E N/E N/E N/E 4 4 4 4
fabric.channel-meta-fds N/E N/E N/E N/E N/E N/E N/E N/E N/E 1 1 1 1
fabric.channel-meta-recv-threads N/E N/E N/E N/E N/E N/E N/E N/E N/E 4 4 4 4
fabric.channel-rw-fds N/E N/E N/E N/E N/E N/E N/E N/E N/E 8 8 8 8
fabric.channel-rw-recv-threads N/E N/E N/E N/E N/E N/E N/E N/E N/E 16 16 16 16
fabric.latency-max-ms 0 0 0 0 0 0 0 0 0 5 5 5 5
fabric.recv-rearm-threshold N/E N/E N/E N/E N/E N/E N/E N/E N/E 1024 1024 1024 1024
fabric.send-threads N/E N/E N/E N/E N/E N/E N/E N/E N/E 8 8 8 8
forward-xdr-writes false false false false false false false false false N/E N/E N/E N/E
heartbeat.address N/E N/E N/E N/E N/E N/E N/E N/E N/E any any any any
heartbeat.addresses 10.0.23.103:3002 10.0.23.142:3002 10.0.23.169:3002 10.0.23.171:3002 10.0.23.185:3002 10.0.23.62:3002 10.0.23.71:3002 10.0.23.76:3002 10.0.23.8:3002 N/E N/E N/E N/E
heartbeat.fabric-grace-factor -1 -1 -1 -1 -1 -1 -1 -1 -1 N/E N/E N/E N/E
heartbeat.mesh-seed-address-port 10.0.23.142:3002,10.0.23.169:3002,10.0.23.171:3002,10.0.23.185:3002,10.0.23.237:3002,10.0.23.62:3002,10.0.23.71:3002,10.0.23.76:3002,10.0.23.8:3002 10.0.23.164:3002,10.0.23.185:3002,10.0.23.187:3002,10.0.23.192:3002,10.0.23.89:3002,10.0.23.94:3002,10.0.23.95:3002 10.0.23.142:3002,10.0.23.164:3002,10.0.23.185:3002,10.0.23.187:3002,10.0.23.192:3002,10.0.23.89:3002,10.0.23.94:3002,10.0.23.95:3002 10.0.23.142:3002,10.0.23.164:3002,10.0.23.169:3002,10.0.23.185:3002,10.0.23.187:3002,10.0.23.192:3002,10.0.23.71:3002,10.0.23.89:3002,10.0.23.94:3002,10.0.23.95:3002 10.0.23.142:3002,10.0.23.164:3002,10.0.23.187:3002,10.0.23.192:3002,10.0.23.89:3002,10.0.23.94:3002,10.0.23.95:3002 10.0.23.142:3002,10.0.23.164:3002,10.0.23.169:3002,10.0.23.171:3002,10.0.23.185:3002,10.0.23.71:3002,10.0.23.89:3002 10.0.23.142:3002,10.0.23.164:3002,10.0.23.169:3002,10.0.23.185:3002,10.0.23.187:3002,10.0.23.192:3002,10.0.23.89:3002,10.0.23.94:3002,10.0.23.95:3002 10.0.23.103:3002,10.0.23.142:3002,10.0.23.169:3002,10.0.23.171:3002,10.0.23.185:3002,10.0.23.237:3002,10.0.23.62:3002,10.0.23.71:3002,10.0.23.8:3002 10.0.23.142:3002,10.0.23.164:3002,10.0.23.169:3002,10.0.23.171:3002,10.0.23.185:3002,10.0.23.62:3002,10.0.23.71:3002,10.0.23.89:3002 10.0.23.142:3002,10.0.23.185:3002,10.0.29.190:3002,10.0.29.212:3002,10.0.29.214:3002,10.0.29.251:3002 10.0.23.142:3002,10.0.23.185:3002,10.0.29.135:3002,10.0.29.212:3002,10.0.29.214:3002,10.0.29.251:3002 10.0.23.142:3002,10.0.23.185:3002,10.0.29.135:3002,10.0.29.190:3002,10.0.29.214:3002,10.0.29.251:3002 10.0.23.142:3002,10.0.23.185:3002,10.0.29.135:3002,10.0.29.190:3002,10.0.29.212:3002,10.0.29.251:3002
heartbeat.port N/E N/E N/E N/E N/E N/E N/E N/E N/E 3002 3002 3002 3002
info.address N/E N/E N/E N/E N/E N/E N/E N/E N/E any any any any
memory-accounting false false false false false false false false false N/E N/E N/E N/E
migrate-max-num-incoming 256 256 256 256 256 256 256 256 256 4 4 4 4
migrate-rx-lifetime-ms 60000 60000 60000 60000 60000 60000 60000 60000 60000 N/E N/E N/E N/E
min-cluster-size N/E N/E N/E N/E N/E N/E N/E N/E N/E 1 1 1 1
paxos-recovery-policy auto-reset-master auto-reset-master auto-reset-master auto-reset-master auto-reset-master auto-reset-master auto-reset-master auto-reset-master auto-reset-master N/E N/E N/E N/E
service.access-port N/E N/E N/E N/E N/E N/E N/E N/E N/E 0 0 0 0
service.alternate-access-port N/E N/E N/E N/E N/E N/E N/E N/E N/E 0 0 0 0
service.alternate-port 0 0 0 0 0 0 0 0 0 N/E N/E N/E N/E
service.alternate-tls-port 0 0 0 0 0 0 0 0 0 N/E N/E N/E N/E
service.tls-access-port N/E N/E N/E N/E N/E N/E N/E N/E N/E 0 0 0 0
service.tls-alternate-access-port N/E N/E N/E N/E N/E N/E N/E N/E N/E 0 0 0 0
sindex-data-max-memory ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX N/E N/E N/E N/E
use-queue-per-device false false false false false false false false false N/E N/E N/E N/E
xdr-client-threads 3 3 3 3 3 3 3 3 3 N/E N/E N/E N/E
xdr-compression-threshold 0 0 0 0 0 0 0 0 0 N/E N/E N/E N/E
xdr-delete-shipping-enabled true true true true true true true true true N/E N/E N/E N/E
xdr-digestlog-path NULL NULL NULL NULL NULL NULL NULL NULL NULL N/E N/E N/E N/E
xdr-digestlog-size 0 0 0 0 0 0 0 0 0 N/E N/E N/E N/E
xdr-hotkey-time-ms 100 100 100 100 100 100 100 100 100 N/E N/E N/E N/E
xdr-info-timeout 10000 10000 10000 10000 10000 10000 10000 10000 10000 N/E N/E N/E N/E
xdr-max-ship-bandwidth 0 0 0 0 0 0 0 0 0 N/E N/E N/E N/E
xdr-max-ship-throughput 0 0 0 0 0 0 0 0 0 N/E N/E N/E N/E
xdr-nsup-deletes-enabled false false false false false false false false false N/E N/E N/E N/E
xdr-read-threads 4 4 4 4 4 4 4 4 4 N/E N/E N/E N/E
xdr-ship-bins false false false false false false false false false N/E N/E N/E N/E
xdr-shipping-enabled true true true true true true true true true N/E N/E N/E N/E
xdr-write-timeout 10000 10000 10000 10000 10000 10000 10000 10000 10000 N/E N/E N/E N/E
Network Configuration~~~~~~~~~~~~
NODE 10.0.23.103 10.0.23.142 10.0.23.169 10.0.23.171 10.0.23.185 10.0.23.62 10.0.23.71 10.0.23.76 10.0.23.8 10.0.29.135 10.0.29.190 10.0.29.212 10.0.29.214
fabric.address N/E N/E N/E N/E N/E N/E N/E N/E N/E any any any any
fabric.channel-bulk-fds N/E N/E N/E N/E N/E N/E N/E N/E N/E 2 2 2 2
fabric.channel-bulk-recv-threads N/E N/E N/E N/E N/E N/E N/E N/E N/E 4 4 4 4
fabric.channel-ctrl-fds N/E N/E N/E N/E N/E N/E N/E N/E N/E 1 1 1 1
fabric.channel-ctrl-recv-threads N/E N/E N/E N/E N/E N/E N/E N/E N/E 4 4 4 4
fabric.channel-meta-fds N/E N/E N/E N/E N/E N/E N/E N/E N/E 1 1 1 1
fabric.channel-meta-recv-threads N/E N/E N/E N/E N/E N/E N/E N/E N/E 4 4 4 4
fabric.channel-rw-fds N/E N/E N/E N/E N/E N/E N/E N/E N/E 8 8 8 8
fabric.channel-rw-recv-threads N/E N/E N/E N/E N/E N/E N/E N/E N/E 16 16 16 16
fabric.latency-max-ms 0 0 0 0 0 0 0 0 0 5 5 5 5
fabric.recv-rearm-threshold N/E N/E N/E N/E N/E N/E N/E N/E N/E 1024 1024 1024 1024
fabric.send-threads N/E N/E N/E N/E N/E N/E N/E N/E N/E 8 8 8 8
heartbeat.address N/E N/E N/E N/E N/E N/E N/E N/E N/E any any any any
heartbeat.addresses 10.0.23.103:3002 10.0.23.142:3002 10.0.23.169:3002 10.0.23.171:3002 10.0.23.185:3002 10.0.23.62:3002 10.0.23.71:3002 10.0.23.76:3002 10.0.23.8:3002 N/E N/E N/E N/E
heartbeat.fabric-grace-factor -1 -1 -1 -1 -1 -1 -1 -1 -1 N/E N/E N/E N/E
heartbeat.mesh-seed-address-port 10.0.23.142:3002,10.0.23.169:3002,10.0.23.171:3002,10.0.23.185:3002,10.0.23.237:3002,10.0.23.62:3002,10.0.23.71:3002,10.0.23.76:3002,10.0.23.8:3002 10.0.23.164:3002,10.0.23.185:3002,10.0.23.187:3002,10.0.23.192:3002,10.0.23.89:3002,10.0.23.94:3002,10.0.23.95:3002 10.0.23.142:3002,10.0.23.164:3002,10.0.23.185:3002,10.0.23.187:3002,10.0.23.192:3002,10.0.23.89:3002,10.0.23.94:3002,10.0.23.95:3002 10.0.23.142:3002,10.0.23.164:3002,10.0.23.169:3002,10.0.23.185:3002,10.0.23.187:3002,10.0.23.192:3002,10.0.23.71:3002,10.0.23.89:3002,10.0.23.94:3002,10.0.23.95:3002 10.0.23.142:3002,10.0.23.164:3002,10.0.23.187:3002,10.0.23.192:3002,10.0.23.89:3002,10.0.23.94:3002,10.0.23.95:3002 10.0.23.142:3002,10.0.23.164:3002,10.0.23.169:3002,10.0.23.171:3002,10.0.23.185:3002,10.0.23.71:3002,10.0.23.89:3002 10.0.23.142:3002,10.0.23.164:3002,10.0.23.169:3002,10.0.23.185:3002,10.0.23.187:3002,10.0.23.192:3002,10.0.23.89:3002,10.0.23.94:3002,10.0.23.95:3002 10.0.23.103:3002,10.0.23.142:3002,10.0.23.169:3002,10.0.23.171:3002,10.0.23.185:3002,10.0.23.237:3002,10.0.23.62:3002,10.0.23.71:3002,10.0.23.8:3002 10.0.23.142:3002,10.0.23.164:3002,10.0.23.169:3002,10.0.23.171:3002,10.0.23.185:3002,10.0.23.62:3002,10.0.23.71:3002,10.0.23.89:3002 10.0.23.142:3002,10.0.23.185:3002,10.0.29.190:3002,10.0.29.212:3002,10.0.29.214:3002,10.0.29.251:3002 10.0.23.142:3002,10.0.23.185:3002,10.0.29.135:3002,10.0.29.212:3002,10.0.29.214:3002,10.0.29.251:3002 10.0.23.142:3002,10.0.23.185:3002,10.0.29.135:3002,10.0.29.190:3002,10.0.29.214:3002,10.0.29.251:3002 10.0.23.142:3002,10.0.23.185:3002,10.0.29.135:3002,10.0.29.190:3002,10.0.29.212:3002,10.0.29.251:3002
heartbeat.port N/E N/E N/E N/E N/E N/E N/E N/E N/E 3002 3002 3002 3002
info.address N/E N/E N/E N/E N/E N/E N/E N/E N/E any any any any
service.access-port N/E N/E N/E N/E N/E N/E N/E N/E N/E 0 0 0 0
service.alternate-access-port N/E N/E N/E N/E N/E N/E N/E N/E N/E 0 0 0 0
service.alternate-port 0 0 0 0 0 0 0 0 0 N/E N/E N/E N/E
service.alternate-tls-port 0 0 0 0 0 0 0 0 0 N/E N/E N/E N/E
service.tls-access-port N/E N/E N/E N/E N/E N/E N/E N/E N/E 0 0 0 0
service.tls-alternate-access-port N/E N/E N/E N/E N/E N/E N/E N/E N/E 0 0 0 0
userData Namespace Configuration~~~~~~~~~~~~
NODE 10.0.23.103 10.0.23.142 10.0.23.169 10.0.23.171 10.0.23.185 10.0.23.62 10.0.23.71 10.0.23.76 10.0.23.8 10.0.29.135 10.0.29.190 10.0.29.212 10.0.29.214
migrate-retransmit-ms N/E N/E N/E N/E N/E N/E N/E N/E N/E 1000 1000 1000 1000
obj-size-hist-max N/E N/E N/E N/E N/E N/E N/E N/E N/E 100 100 100 100
partition-tree-locks N/E N/E N/E N/E N/E N/E N/E N/E N/E 8 8 8 8
partition-tree-sprigs N/E N/E N/E N/E N/E N/E N/E N/E N/E 64 64 64 64
rack-id N/E N/E N/E N/E N/E N/E N/E N/E N/E 0 0 0 0
sindex.data-max-memory ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX N/E N/E N/E N/E
storage-engine.cold-start-empty false false false false true false false false false false false false false
storage-engine.enable-benchmarks-storage false true true true true true true false true false false false false
user_config_data Namespace Configuration~~~~~~~~~~~~
NODE 10.0.23.103 10.0.23.142 10.0.23.169 10.0.23.171 10.0.23.185 10.0.23.62 10.0.23.71 10.0.23.76 10.0.23.8 10.0.29.135 10.0.29.190 10.0.29.212 10.0.29.214
migrate-retransmit-ms N/E N/E N/E N/E N/E N/E N/E N/E N/E 5000 5000 5000 5000
obj-size-hist-max N/E N/E N/E N/E N/E N/E N/E N/E N/E 100 100 100 100
partition-tree-locks N/E N/E N/E N/E N/E N/E N/E N/E N/E 8 8 8 8
partition-tree-sprigs N/E N/E N/E N/E N/E N/E N/E N/E N/E 64 64 64 64
rack-id N/E N/E N/E N/E N/E N/E N/E N/E N/E 0 0 0 0
sindex.data-max-memory ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX ULONG_MAX N/E N/E N/E N/E

/etc/aerospike.conf

# Aerospike database configuration file for deployments using mesh heartbeats.

service {

        user root
        group root
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
        pidfile /var/run/aerospike/asd.pid
        
        ## SET TO NUMBER OF CORES ##
        service-threads 4  
        transaction-queues 4
        scan-threads 4
        ###########################
        
        ## DONT CHANGE ##
        transaction-threads-per-queue 3
        proto-fd-idle-ms 600000
        proto-fd-max 100000
        batch-max-requests 10000
        migrate-threads 1
        replication-fire-and-forget true
        ##########################
}

logging {

        file /var/log/aerospike/aerospike.log {
                context any info
        }
}

network {

        service {
                address any
                port 3000
        }

        heartbeat {
                mode mesh
                port 3002

                mesh-seed-address-port 10.0.23.142 3002
                mesh-seed-address-port 10.0.23.185 3002
                mesh-seed-address-port 10.0.29.212 3002
                mesh-seed-address-port 10.0.29.190 3002
                mesh-seed-address-port 10.0.29.135 3002
                mesh-seed-address-port 10.0.29.214 3002
                mesh-seed-address-port 10.0.29.251 3002
                mesh-seed-address-port 10.0.29.190 3002
                mesh-seed-address-port 10.0.29.135 3002

                interval 150
                timeout 20
        }

        fabric {
                port 3001
        }

        info {
                port 3003
        }
}

namespace userdata {

        replication-factor 2
        memory-size 30G
        default-ttl 0 
        storage-engine device {
                cold-start-empty false
                device  /dev/nvme0n1
                write-block-size 1024K
                max-write-cache 1024M
        }
        migrate-sleep 100
}
namespace user_config_data {

        replication-factor 2
        memory-size 1G
        default-ttl 0
        storage-engine device {
                cold-start-empty false
                file /dev/aerospike/user_config_data.dat
                filesize 1G
                write-block-size 1024K
        }
}

Please find the images attached.

One strange observation concerns the replica counts after the addition of the fourth new-version node, 10.0.29.214. The nodes in the 10.0.29.XXX subnet, running Aerospike Community Edition build 3.13.0.10, kept roughly the same replica counts, while on a few of the nodes in the 10.0.23.XXX subnet, running Aerospike Enterprise Edition build 3.9.1-158-g1e8db6e, the replica count dropped to almost zero. This can be seen clearly in the graph below.

PS: The nodes in the 10.0.29.XXX subnet running Aerospike Community Edition build 3.13.0.10 are in partition placement groups with different partitions (racks), while the others are in a cluster/spread placement group.

It is possible that the two newly added nodes formed a cluster before joining the remaining nodes - this would cause them to ‘own’ a copy of all partitions which would need to be resolved via migrations. However, there are heuristics that make this scenario unlikely - though these heuristics require the paxos v5 change.

The output is noisy because the nodes aren’t on the same version - I thought I had read that you had already done the paxos protocol switch, so seeing this was useful.

This is due to a change in the way ‘prole-objects’ is computed. Prior to 3.13, a node receiving migrations didn’t report its objects as master or prole.

Regarding the title,

Aerospike migrations issue/ data loss

I’m not seeing any evidence of data loss. If sub-clusters formed, you would likely have gotten not-found errors from the new nodes that would eventually be taking over as master (which you may have interpreted as data loss?). The sub-cluster caused them to take over immediately. Enabling read-consistency-level-override all would have allowed the clients to continue finding the records, but it can add latency during migrations.

I suggest fully upgrading to 3.13 so you can do the paxos protocol switch procedure. The new protocols would have likely prevented this scenario.
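For reference, once every node is on 3.13 and migrations have settled, the protocol switch is a single dynamic-config call issued on one node. This sketch only echoes the command rather than executing it; the string is based on the 3.13 "jump version" upgrade notes, so verify the exact syntax against the docs for your build before running it.

```shell
# The paxos protocol switch, run via asadm on one node once ALL nodes are on
# 3.13 and migrations are complete. Echoed as a sketch, not executed here.
cmd='set-config:context=service;paxos-protocol=v5'
echo "asadm -e 'asinfo -v \"$cmd\"'"
```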


As mentioned above, the steps taken were:

  1. Added node 10.0.29.212 running community build 3.13.0.10.
  2. Waited for migrations to finish (new cluster size 11). Incoming migrations ran only on node 10.0.29.212, as expected.
  3. Added two nodes, 10.0.29.190 and 10.0.29.135, simultaneously, both running community build 3.13.0.10.
  4. Waited for migrations to finish (new cluster size 13). Incoming migrations ran only on these two nodes, as expected.
  5. Added node 10.0.29.214 a few hours later, also running community build 3.13.0.10.
  6. Immediately after this node was added, the total master objects in the cluster dropped, incoming migrations started on all nodes, and we started getting timeouts on the cluster.

Debugging further, I went through the logs and found that after step 4 (cluster size 13), the moment we added node 10.0.29.214, we expected the cluster size to become 14. Instead, in the logs of 6 nodes we could see the cluster size drop from 13 to 6, then to 11, and finally to 14, all within a second. On the remaining 8 nodes, the cluster size dropped from 13 to 11 and then went to 14.

Please see the logs from a node where the size dropped to 6, followed by 11 and then 14:

Jul 05 2019 04:53:01 GMT: INFO (hb): (hb.c:9665) node arrived bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (fabric): (fabric.c:2421) fabric: node bb9fef8c95b8d06 arrived

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2254) Skip node arrival bb9fef8c95b8d06 cluster principal bb9e244f471cf06 pulse principal bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:259) DISALLOW MIGRATIONS

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:432) cluster key set to 5fe908f890b4

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3139) SUCCESSION [1562302381]@bb9fef8c95b8d06*: bb9fef8c95b8d06 bb9e244f471cf06 bb93632e5e0a606 bb9347b9af89f06 bb90c21b623ec06 bb902d507623f06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3150) node bb9fef8c95b8d06 is now principal pro tempore

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2343) Sent partition sync request to node bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3243) received partition sync message from bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2656) CLUSTER SIZE = 6

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:1211) {userdata} replication factor is 2

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:2568) {userdata} re-balanced, expected migrations - (440 tx, 301 rx)

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:2572) {userdata} fresh-partitions 1471

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:1211) {user_config_data} replication factor is 2

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:2568) {user_config_data} re-balanced, expected migrations - (440 tx, 301 rx)

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:2572) {user_config_data} fresh-partitions 1471

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:245) ALLOW MIGRATIONS

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2865) {1562302381} sending prepare_ack to bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:259) DISALLOW MIGRATIONS

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:432) cluster key set to e1eb5468cca7

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3139) SUCCESSION [1562302382]@bb9fef8c95b8d06*: bb9fef8c95b8d06 bb9e244f471cf06 bb9e03d704b2f06 bb992977e164a06 bb9843f865e7f06 bb976057c354f06 bb95098458cde06 bb93632e5e0a606 bb9347b9af89f06 bb90c21b623ec06 bb902d507623f06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3150) node bb9fef8c95b8d06 is now principal pro tempore

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2343) Sent partition sync request to node bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3243) received partition sync message from bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2656) CLUSTER SIZE = 11

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:1211) {userdata} replication factor is 2

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:2568) {userdata} re-balanced, expected migrations - (653 tx, 379 rx)

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:1211) {user_config_data} replication factor is 2

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:2568) {user_config_data} re-balanced, expected migrations - (653 tx, 379 rx)

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:245) ALLOW MIGRATIONS

Jul 05 2019 04:53:01 GMT: INFO (hb): (hb.c:2668) marking node add for paxos recovery: bb9260a743a1906

Jul 05 2019 04:53:01 GMT: INFO (hb): (hb.c:2668) marking node add for paxos recovery: bb92cff9e8f4006

Jul 05 2019 04:53:01 GMT: INFO (hb): (hb.c:2668) marking node add for paxos recovery: bb9147134689606

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2561) Cluster Integrity Check: Detected succession list discrepancy between node bb95098458cde06 and self bb9347b9af89f06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:260) Paxos List [bb9fef8c95b8d06,bb9e244f471cf06,bb9e03d704b2f06,bb992977e164a06,bb9843f865e7f06,bb976057c354f06,bb95098458cde06,bb93632e5e0a606,bb9347b9af89f06,bb90c21b623ec06,bb902d507623f06]

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:260) Node List [bb9e244f471cf06,bb9e03d704b2f06,bb992977e164a06,bb9843f865e7f06,bb976057c354f06,bb95098458cde06,bb93632e5e0a606,bb9347b9af89f06,bb92cff9e8f4006,bb9260a743a1906,bb9147134689606,bb90c21b623ec06,bb902d507623f06]

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2561) Cluster Integrity Check: Detected succession list discrepancy between node bb93632e5e0a606 and self bb9347b9af89f06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:260) Paxos List [bb9fef8c95b8d06,bb9e244f471cf06,bb9e03d704b2f06,bb992977e164a06,bb9843f865e7f06,bb976057c354f06,bb95098458cde06,bb93632e5e0a606,bb9347b9af89f06,bb90c21b623ec06,bb902d507623f06]

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:260) Node List [bb9fef8c95b8d06,bb9e244f471cf06,bb93632e5e0a606,bb9347b9af89f06,bb90c21b623ec06,bb902d507623f06]

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2561) Cluster Integrity Check: Detected succession list discrepancy between node bb902d507623f06 and self bb9347b9af89f06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:260) Paxos List [bb9fef8c95b8d06,bb9e244f471cf06,bb9e03d704b2f06,bb992977e164a06,bb9843f865e7f06,bb976057c354f06,bb95098458cde06,bb93632e5e0a606,bb9347b9af89f06,bb90c21b623ec06,bb902d507623f06]

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:260) Node List [bb9fef8c95b8d06,bb9e244f471cf06,bb93632e5e0a606,bb9347b9af89f06,bb90c21b623ec06,bb902d507623f06]

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2424) Corrective changes: 3. Integrity fault: true

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2467) Marking node add for paxos recovery: bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2467) Marking node add for paxos recovery: bb9e244f471cf06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2467) Marking node add for paxos recovery: bb9e03d704b2f06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2467) Marking node add for paxos recovery: bb992977e164a06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2467) Marking node add for paxos recovery: bb9843f865e7f06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2467) Marking node add for paxos recovery: bb976057c354f06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2467) Marking node add for paxos recovery: bb95098458cde06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2467) Marking node add for paxos recovery: bb93632e5e0a606

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2467) Marking node add for paxos recovery: bb9347b9af89f06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2467) Marking node add for paxos recovery: bb90c21b623ec06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2467) Marking node add for paxos recovery: bb902d507623f06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2490) Skipping paxos recovery: bb9fef8c95b8d06 will handle the recovery

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2865) {1562302382} sending prepare_ack to bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:259) DISALLOW MIGRATIONS

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:432) cluster key set to e068a1496c6c

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3139) SUCCESSION [1562302383]@bb9fef8c95b8d06*: bb9fef8c95b8d06 bb9e244f471cf06 bb9e03d704b2f06 bb992977e164a06 bb9843f865e7f06 bb976057c354f06 bb95098458cde06 bb93632e5e0a606 bb9347b9af89f06 bb92cff9e8f4006 bb9260a743a1906 bb9147134689606 bb90c21b623ec06 bb902d507623f06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3150) node bb9fef8c95b8d06 is now principal pro tempore

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2343) Sent partition sync request to node bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3243) received partition sync message from bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2656) CLUSTER SIZE = 14

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:1211) {userdata} replication factor is 2

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:2568) {userdata} re-balanced, expected migrations - (657 tx, 391 rx)

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:1211) {user_config_data} replication factor is 2

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:2568) {user_config_data} re-balanced, expected migrations - (661 tx, 391 rx)

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:245) ALLOW MIGRATIONS

Jul 05 2019 04:53:02 GMT: WARNING (rw): (write.c:1795) {userdata} write_master: failed as_bin_cdt_alloc_modify_from_client() :0x1de9b38038df3b41d9505990d0f43579e4708837


Please see the logs from a node where the cluster size dropped to 11 and then 14:

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:137) DISALLOW MIGRATIONS

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:153) cluster_key set to 0xe1eb5468cca7

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3160) SUCCESSION [1562302382]@bb9fef8c95b8d06*: bb9fef8c95b8d06 bb9e244f471cf06 bb9e03d704b2f06 bb992977e164a06 bb9843f865e7f06 bb976057c354f06 bb95098458cde06 bb93632e5e0a606 bb9347b9af89f06 bb90c21b623ec06 bb902d507623f06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3171) node bb9fef8c95b8d06 is now principal pro tempore

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2385) Sent partition sync request to node bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3261) received partition sync message from bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:343) CLUSTER SIZE = 11

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:1032) setting replication factors: cluster size 11, paxos single replica limit 1

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:1047) {userdata} replication factor is 2

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:1047) {user_config_data} replication factor is 2

Jul 05 2019 04:53:01 GMT: INFO (config): (cluster_config.c:421) rack aware is disabled

Jul 05 2019 04:53:01 GMT: INFO (partition): (cluster_config.c:380) rack aware is disabled

Jul 05 2019 04:53:01 GMT: INFO (fabric): (fabric.c:1685) fabric: node bb9fef8c95b8d06 arrived

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:476) {userdata} re-balanced, expected migrations - (554 tx, 760 rx)

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:476) {user_config_data} re-balanced, expected migrations - (554 tx, 760 rx)

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:129) ALLOW MIGRATIONS

Jul 05 2019 04:53:01 GMT: WARNING (smd): (system_metadata.c:1939) failed to get metadata operation from failed transaction response msg (err -2 ; fabric err 0)

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2886) {1562302382} sending prepare_ack to bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:137) DISALLOW MIGRATIONS

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:153) cluster_key set to 0xe068a1496c6c

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3160) SUCCESSION [1562302383]@bb9fef8c95b8d06*: bb9fef8c95b8d06 bb9e244f471cf06 bb9e03d704b2f06 bb992977e164a06 bb9843f865e7f06 bb976057c354f06 bb95098458cde06 bb93632e5e0a606 bb9347b9af89f06 bb92cff9e8f4006 bb9260a743a1906 bb9147134689606 bb90c21b623ec06 bb902d507623f06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3171) node bb9fef8c95b8d06 is now principal pro tempore

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:2385) Sent partition sync request to node bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (paxos): (paxos.c:3261) received partition sync message from bb9fef8c95b8d06

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:343) CLUSTER SIZE = 14

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:1032) setting replication factors: cluster size 14, paxos single replica limit 1

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:1047) {userdata} replication factor is 2

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:1047) {user_config_data} replication factor is 2

Jul 05 2019 04:53:01 GMT: INFO (config): (cluster_config.c:421) rack aware is disabled

Jul 05 2019 04:53:01 GMT: INFO (partition): (cluster_config.c:380) rack aware is disabled

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:476) {userdata} re-balanced, expected migrations - (530 tx, 669 rx)

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:476) {user_config_data} re-balanced, expected migrations - (530 tx, 670 rx)

Jul 05 2019 04:53:01 GMT: INFO (partition): (partition_balance.c:129) ALLOW MIGRATIONS


**Node Info**

Node NodeId Version
10-0-23-103 BB95098458CDE06 E-3.9.1-158-g1e8db6e
10-0-23-142 BB9E244F471CF06 E-3.9.1-158-g1e8db6e
10-0-23-169 BB9E03D704B2F06 E-3.9.1-158-g1e8db6e
10-0-23-171 BB92CFF9E8F4006 E-3.9.1-158-g1e8db6e
10-0-23-185 BB902D507623F06 E-3.9.1-158-g1e8db6e
10-0-23-62 BB9147134689606 E-3.9.1-158-g1e8db6e
10-0-23-71 BB9843F865E7F06 E-3.9.1-158-g1e8db6e
10-0-23-76 BB9260A743A1906 E-3.9.1-158-g1e8db6e
10-0-23-8 BB992977E164A06 E-3.9.1-158-g1e8db6e
10-0-29-135 BB90C21B623EC06 C-3.13.0.10
10-0-29-190 BB93632E5E0A606 C-3.13.0.10
10-0-29-212 BB9347B9AF89F06 C-3.13.0.10
10-0-29-214 *BB9FEF8C95B8D06 C-3.13.0.10

Any updates? We are still waiting for a response.

Our Enterprise customer support team has very short SLA guarantees. If you need faster responses, I recommend reaching out to our Sales Department about an Enterprise contract, as our Community forum doesn’t come with an SLA guarantee.

These logs confirm that the cluster didn’t transition to the final cluster size smoothly, which I suspected and mentioned earlier. The paxos v5 protocols reduce the odds of this happening. This scenario would have caused a lot of migrations. Between Enterprise-versioned nodes, the migrations would have been much faster since they have the ‘Rapid Rebalance’ feature. But between Enterprise and Community, or Community and Community, migrations have to ship every record (often in both directions).
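Since the heavy migrations coincided with client timeouts, migration load can also be throttled dynamically while the cluster settles. This sketch only echoes the 3.x dynamic-config strings involved; the parameter names (`migrate-threads`, `migrate-sleep`) are assumed from the 3.x config reference, so verify them for your build before applying anything.

```shell
# Dynamic-config strings for slowing migrations down (echoed, not executed).
# migrate-threads is a service-context setting; migrate-sleep is per-namespace.
cmds='set-config:context=service;migrate-threads=1
set-config:context=namespace;id=userdata;migrate-sleep=100'
printf '%s\n' "$cmds" | while IFS= read -r c; do
  echo "asinfo -v \"$c\""
done
```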

Are you able to provide a graph showing the total ‘objects’ across all nodes during this period, or a stacked graph of the ‘objects’ metric? Looking at the graphs you have provided, I do not see anything unexpected for the version you are on.


Please find the objects distribution attached.

As a next step, we are planning to upgrade the full cluster (of 10 nodes) to version 3.13.0.10 (new protocol). Given the current situation, with the cluster running in production (13 nodes: 4 nodes on v3.13 and 9 nodes on v3.9), and given that the logs make clear the cluster didn’t transition to the final cluster size smoothly when we added a v3.13 node, what do you suggest? How should we proceed?

Are you able to sum the objects across all nodes with your monitoring tool? At a glance, the graph appears normal. The new nodes took over a set of objects, which reduces the number of objects on the other nodes. You had indicated ‘data loss’, but I still haven’t seen anything that would indicate this.
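If the monitoring tool can’t sum them, a quick shell sketch can. The sample lines below are hypothetical per-node output (e.g. from something like `asadm -e "show statistics like objects"`); the field layout is illustrative only, so adjust the awk columns to match your tool’s actual format.

```shell
# Sum a per-node 'objects' stat; 'sample' is made-up per-node output.
sample='10.0.23.103 objects 1200
10.0.23.142 objects 1800
10.0.29.212 objects 1500'
total=$(printf '%s\n' "$sample" | awk '$2 == "objects" {sum += $3} END {print sum}')
echo "total objects: $total"   # 1200 + 1800 + 1500 = 4500
```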

To reduce the risk of forming intermediate clusters under the old clustering protocol, upgrade one node at a time and wait for migrations to complete before upgrading the next - as you’ve already done; unfortunately, the older protocols still reformed into a smaller cluster.
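The wait step of that rolling upgrade can be scripted. In this hedged sketch, `stats_for` is a stub standing in for `asinfo -h HOST -v statistics -l`, and `migrate_partitions_remaining` is the assumed stat name (it differs across 3.x builds, so verify yours before relying on it):

```shell
# Stub standing in for: asinfo -h "$1" -v statistics -l
# (returns canned output so the sketch is self-contained).
stats_for() { echo 'migrate_partitions_remaining=0'; }

# Block until the node reports no partitions left to migrate.
wait_for_migrations() {
  while :; do
    remaining=$(stats_for "$1" | awk -F= '/^migrate_partitions_remaining=/ {print $2}')
    [ "${remaining:-1}" -eq 0 ] && break
    sleep 30
  done
}

wait_for_migrations 10.0.23.103 && echo "migrations complete"
```

In a real run, replace the stub with the live asinfo call and invoke `wait_for_migrations` between each node’s upgrade.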