Aerospike tries to connect to dead node

Hello. We are using Aerospike Community Edition build 5.6.0.7, running in cluster mode on AWS. When one of the nodes goes down, we remove it with the tip-clear command, and that works correctly. However, I see in the logs that Aerospike is still trying to connect to this node:

Jul 08 2021 12:53:09 GMT: WARNING (hb): (hb.c:4905) (repeated:20) could not create heartbeat connection to node - 172.31.69.252 {172.31.69.252:3003}
Jul 08 2021 12:53:09 GMT: WARNING (socket): (socket.c:869) (repeated:3) Error while connecting: 113 (No route to host)
Jul 08 2021 12:53:09 GMT: WARNING (socket): (socket.c:860) (repeated:17) Timeout while connecting
Jul 08 2021 12:53:09 GMT: WARNING (socket): (socket.c:928) (repeated:20) Error while connecting socket to 172.31.69.252:3003

I’ve read that this might be due to services-alumni, so I ran the services-alumni-clear command. Afterwards I no longer see this IP in either list (services or services-alumni):

Admin+> asinfo -v 'services-alumni'
ip-172-31-72-75.ec2.internal:3001 (172.31.72.75) returned:
172.31.75.89:3001

172.31.75.89:3001 (172.31.75.89) returned:
172.31.72.75:3001

However, according to the logs, the other nodes are still trying to connect to this instance. How can this be prevented?

Can you please share your config and network interface setup (maybe the node is present as a seed node)?

Note: We add instances to the cluster with the ‘tip’ command.

# Aerospike database configuration file for deployments using mesh heartbeats.

service {
        user root
        group root
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
        service-threads 4
        proto-fd-max 15000
}

logging {
        # Log file must be an absolute path.
        file /var/log/aerospike/aerospike.log {
                context any info
        }
}

network {
        service {
                address any
                port 3001
                access-address 172.31.18.189 virtual
        }

        heartbeat {
                mode mesh
                port 3003 # Heartbeat port for this node.


                interval 250
                timeout 10
        }

        fabric {
                port 3002
        }

        info {
                port 3004
        }
}

You should be aware that the virtual keyword was deprecated a long time ago. I am surprised the server didn’t complain about it at startup, but it should certainly be ignoring it (since version 3.10, I believe). I am not sure whether that would cause tip-clear to not work, though; it is on the service side and shouldn’t impact the heartbeat side.

I suggest checking the server log file from when the node joined the cluster (via the tip command) to make sure the right IP address was used to establish the heartbeat connection, and that the same address is then used for the tip-clear.
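
To make that log check less manual, here is a minimal Python sketch that pulls the endpoints out of the "could not create heartbeat connection" warnings so they can be compared with the address passed to tip-clear. It assumes the log line format shown earlier in this thread; the function names (failed_heartbeat_endpoints, is_still_retried) are made up for illustration and are not part of any Aerospike tool.

```python
import re

# Assumption: heartbeat warnings end with the endpoint in braces,
# e.g. "... to node - 172.31.69.252 {172.31.69.252:3003}".
HB_FAIL = re.compile(
    r"could not create heartbeat connection to node.*"
    r"\{(?P<host>[^:{}]+):(?P<port>\d+)\}"
)


def failed_heartbeat_endpoints(log_lines):
    """Return the set of (host, port) pairs this node failed to reach."""
    endpoints = set()
    for line in log_lines:
        match = HB_FAIL.search(line)
        if match:
            endpoints.add((match.group("host"), int(match.group("port"))))
    return endpoints


def is_still_retried(log_lines, host, port):
    """True if the given endpoint still shows up in heartbeat warnings."""
    return (host, port) in failed_heartbeat_endpoints(log_lines)
```

For example, reading the lines of /var/log/aerospike/aerospike.log (the path from the config above) and calling is_still_retried(lines, "172.31.69.252", 3003) would show whether the node is still retrying exactly the host and port that were passed to tip-clear. If the address in the warnings differs from the one that was cleared, that mismatch is the first thing to look at.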

Thanks for the awesome information.
