The cluster stops working when adding a new node


#1

I have a cluster with 2 nodes (version 3.15.0.2) and need to add 2 new nodes, on which I have installed version 4.1.7. When I start a new node, reads and writes in AMC drop to 0 and my application logs this error:

Timeout: iterations=0 lastNode=127.0.0.1:3000 in src/main/aerospike/as_event.c:467

The configs are the same on all nodes:

service {
    paxos-single-replica-limit 1
    proto-fd-max 15000
}

logging {
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address any
        port 3000
    }

    heartbeat {
        mode mesh
        address [IP of this node]
        port 3002

        mesh-seed-address-port [IP of this node] 3002
        mesh-seed-address-port [IP of node 2] 3002
        mesh-seed-address-port [IP of node 3] 3002
        mesh-seed-address-port [IP of node 4] 3002

        interval 150
        timeout 10
    }

    fabric {
        port 3001
    }

    info {
        port 3003
    }
}

namespace mem {
    replication-factor 2
    memory-size 48G
    default-ttl 1d

    storage-engine device {
        file /data/aerospike/mem.dat
        filesize 64G
        data-in-memory true
    }
}

All TCP communication is allowed between the nodes. In the logs I see that the new node was added (NODE-ID bb9dec1da0e1b90 CLUSTER-SIZE 3) and that migration started. The only warnings are about the connection to node 4 (which is stopped).

Then I stop node 3, and operations appear in AMC again as if nothing had happened.

Why can this happen?


#2

You mentioned that the application stops connecting, but you only said that TCP connectivity works between the nodes. Does TCP connectivity also work between the app and the new nodes? Are you using the Aerospike logging interface? https://www.aerospike.com/docs/client/java/usage/logging.html What happens if you run 'asadm -e health'?
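
One quick way to answer the connectivity question is to try opening a TCP connection from the application host to the service port (3000 in the config above) on each node. The following is only a minimal sketch: the addresses in `nodes` are hypothetical placeholders, and `can_connect` is an illustrative helper, not part of any Aerospike tool or client.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        # create_connection performs the TCP handshake; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical placeholders: replace with the real IPs of nodes 1-4.
    nodes = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
    service_port = 3000  # "network { service { port 3000 } }" from the posted config

    for host in nodes:
        status = "reachable" if can_connect(host, service_port) else "NOT reachable"
        print(f"{host}:{service_port} {status}")
```

Run it on the machine where the application runs, not on a cluster node. A successful connection only shows that the port is open from that host; it does not prove that the node is healthy or that the client can complete requests against it.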