Aerospike tries to connect to dead node

Vasilii_Parshkin · July 8, 2021, 12:56pm

Hello. We are using Aerospike Community Edition build 5.6.0.7, running in cluster mode in AWS. When one of the nodes goes down, we remove it with the tip-clear command, and that works correctly. However, I see in the logs that Aerospike is still trying to connect to this node:

Jul 08 2021 12:53:09 GMT: WARNING (hb): (hb.c:4905) (repeated:20) could not create heartbeat connection to node - 172.31.69.252 {172.31.69.252:3003}
Jul 08 2021 12:53:09 GMT: WARNING (socket): (socket.c:869) (repeated:3) Error while connecting: 113 (No route to host)
Jul 08 2021 12:53:09 GMT: WARNING (socket): (socket.c:860) (repeated:17) Timeout while connecting
Jul 08 2021 12:53:09 GMT: WARNING (socket): (socket.c:928) (repeated:20) Error while connecting socket to 172.31.69.252:3003

I’ve read that this might be due to services-alumni, so I ran the services-alumni-clear command, and afterwards I no longer see this IP in either list (neither services nor services-alumni):

Admin+> asinfo -v 'services-alumni'
ip-172-31-72-75.ec2.internal:3001 (172.31.72.75) returned:
172.31.75.89:3001

172.31.75.89:3001 (172.31.75.89) returned:
172.31.72.75:3001

However, according to the logs, the other nodes are still trying to connect to this instance. How can this be prevented?
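
To see exactly which heartbeat endpoints are still being dialed, the warning lines above can be tallied per address. Below is a minimal Python sketch, assuming the log line format shown in this post. The function name `failed_heartbeat_peers` is hypothetical and not part of any Aerospike tool, and treating `(repeated:N)` as N occurrences is an assumption.

```python
import re
from collections import Counter

# Matches the "could not create heartbeat connection" warnings shown above.
# The optional "(repeated:N)" group is assumed to be a count of occurrences.
HB_FAIL = re.compile(
    r"WARNING \(hb\): \(hb\.c:\d+\) (?:\(repeated:(\d+)\) )?"
    r"could not create heartbeat connection to node - \S+ \{([\d.]+):(\d+)\}"
)

def failed_heartbeat_peers(log_lines):
    """Return a Counter mapping 'ip:port' to the number of failed attempts."""
    counts = Counter()
    for line in log_lines:
        m = HB_FAIL.search(line)
        if m:
            repeats = int(m.group(1)) if m.group(1) else 1
            counts[f"{m.group(2)}:{m.group(3)}"] += repeats
    return counts

sample = [
    "Jul 08 2021 12:53:09 GMT: WARNING (hb): (hb.c:4905) (repeated:20) "
    "could not create heartbeat connection to node - 172.31.69.252 "
    "{172.31.69.252:3003}",
    "Jul 08 2021 12:53:09 GMT: WARNING (socket): (socket.c:860) "
    "(repeated:17) Timeout while connecting",
]
peers = failed_heartbeat_peers(sample)
print(dict(peers))
```

Any address reported here that no longer belongs to the cluster is one the heartbeat subsystem still remembers, regardless of what the services lists show.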

neelp · July 12, 2021, 5:30am

Can you please share your config and network interface setup (maybe the node is present as a seed node)?

Vasilii_Parshkin · July 12, 2021, 9:52am

Note: we add instances to the cluster with the ‘tip’ command.

# Aerospike database configuration file for deployments using mesh heartbeats.

service {
        user root
        group root
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
        service-threads 4
        proto-fd-max 15000
}

logging {
        # Log file must be an absolute path.
        file /var/log/aerospike/aerospike.log {
                context any info
        }
}

network {
        service {
                address any
                port 3001
                access-address 172.31.18.189 virtual
        }

        heartbeat {
                mode mesh
                port 3003 # Heartbeat port for this node.


                interval 250
                timeout 10
        }

        fabric {
                port 3002
        }

        info {
                port 3004
        }
}
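
Whether a removed node is still listed as a seed can be checked directly in the configuration file. Below is a minimal Python sketch, assuming seeds are declared with `mesh-seed-address-port` lines inside the heartbeat block (the configuration above contains none). The function name `find_seed_nodes` and the seed entry in the sample text are hypothetical.

```python
def find_seed_nodes(config_text):
    """Return (host, port) pairs from mesh-seed-address-port lines."""
    seeds = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        parts = line.split()
        if len(parts) == 3 and parts[0] == "mesh-seed-address-port":
            seeds.append((parts[1], int(parts[2])))
    return seeds

# Sample text only: the seed line below is a hypothetical entry,
# not something taken from the configuration in this thread.
config = """
heartbeat {
        mode mesh
        port 3003 # Heartbeat port for this node.
        mesh-seed-address-port 172.31.69.252 3003 # hypothetical seed entry
        interval 250
        timeout 10
}
"""
seeds = find_seed_nodes(config)
print(seeds)
```

An empty result for the real file would suggest the stale address is not coming from a configured seed.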

meher · July 12, 2021, 11:25pm

You should be aware that the virtual keyword was deprecated a long time ago (I am surprised the server didn’t complain about it at startup, but it should certainly have been ignoring it since version 3.10, I believe). That said, I am not sure whether it would cause tip-clear to not work (it is also on the service side and shouldn’t affect the heartbeat side).

I suggest checking the server log file from when the node joined the cluster (with the tip command) to make sure the right IP address is being used to establish the heartbeat connection, and that the same address is then used for the tip-clear.
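
One way to keep the two commands consistent is to generate both from a single host and port. Below is a minimal Python sketch, assuming the `tip:host=...;port=...` and `tip-clear:host-port-list=...` forms of the info commands (worth checking against the documentation for your server version). The helper name `tip_commands` is hypothetical.

```python
def tip_commands(host, port):
    """Build matching tip and tip-clear info command strings for one endpoint."""
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return {
        # Assumed syntax of the info commands; verify for your server version.
        "tip": f"tip:host={host};port={port}",
        "tip-clear": f"tip-clear:host-port-list={host}:{port}",
    }

# The heartbeat endpoint from the warnings earlier in this thread.
cmds = tip_commands("172.31.69.252", 3003)
for cmd in cmds.values():
    print(f"asinfo -v '{cmd}'")
```

Because both strings come from the same arguments, a mismatch such as tipping by hostname and clearing by IP address cannot slip in.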

jackyjoy123 · August 19, 2021, 12:11pm

Thanks for the awesome information.

jackyjoy123 · March 10, 2022, 9:29am

Thanks, my issue has been fixed.
