How to remove an offline node in Aerospike? (AER-2757) [Released] [Resolved]


I have a lot of offline nodes. How do I remove them from the cluster? I still see the old nodes when I run “i net” in asadm.

Running cluster dun does not work.



Admin> i net
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Node               Node                Fqdn                  Ip   Client     Current     HB          HB   
           .                 Id                   .                   .    Conns        Time   Self     Foreign   
                000000000000000                                              N/E         N/E    N/E         N/E
                000000000000000                                              N/E         N/E    N/E         N/E
                BB960C588902500                                            17395   174076893      0   143043402

Currently you have to do a rolling restart of the cluster to reset the services-alumni state. The JIRA ticket tracking this issue is AER-2757. It is code complete and pending review at the moment.

When released, you should be able to issue the following command to clear this state:

asinfo -v "services-alumni-reset"
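Before resetting, it can help to see which peers a node still remembers only as alumni. The sketch below is an illustration, not part of the fix: `departed_peers` is a hypothetical helper, and it assumes that the `services` and `services-alumni` info commands each return a semicolon-separated list of host:port entries, which may differ between server versions.

```shell
# departed_peers ALUMNI SERVICES
# Hypothetical helper: prints every entry of the semicolon-separated
# ALUMNI list that does not appear in the semicolon-separated SERVICES list.
departed_peers() {
    alumni_list=$1
    services_list=$2
    printf '%s\n' "$alumni_list" | tr ';' '\n' | while IFS= read -r peer; do
        [ -n "$peer" ] || continue      # skip empty entries
        case ";$services_list;" in
            *";$peer;"*) ;;             # still listed as an active peer
            *) printf '%s\n' "$peer" ;; # known only as an alumnus
        esac
    done
}

# Possible usage on a node (assumes asinfo is on the PATH):
#   departed_peers "$(asinfo -v 'services-alumni')" "$(asinfo -v 'services')"
```

If the list comes back empty after the reset, the node no longer reports any departed peers.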

We have experienced a similar situation: our cluster (10+ nodes) reported serious cluster visibility issues when one of the nodes became unavailable (offline).

Our replication factor is set to 2, so we expected that losing one node would not be an issue.

Is there currently any way to manually tell Aerospike that one of the nodes went down and should be removed from the cluster?

@luk The way to do that will be the “services-alumni-reset” command mentioned above.


What version is this expected to be in?

It will be in the next release, which should be 3.6.2.


@bpaquet, @luk and @naoum,

Good news! We just released Aerospike Server Community Edition 3.6.2, which fixes AER-2757. You can read its release notes and download it here.

Please upgrade to this new version and let us know whether this fixes your issue.

5 posts were split to a new topic: How to remove offline node from Heartbeat’s seed list?

Where do we need to run this command: on the server where AMC is installed, on one of the nodes in the cluster, or on all the nodes in the cluster?

To stop departed peers from being reported to tools such as AMC and asadm, you would need to run services-alumni-reset on each node.

You can run it across the cluster via asadm with:

asadm -h "ip of node in cluster" -e "asinfo -v 'services-alumni-reset'"
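If you would rather target the nodes one at a time with asinfo instead of going through asadm, a loop along these lines should work. This is only a sketch: `reset_alumni` and the `ASINFO_BIN` variable are names made up for this example, the addresses in the usage comment are placeholders, and it assumes each node is reachable on the default service port.

```shell
# ASINFO_BIN is a hypothetical override so the tool path can be changed;
# it defaults to the asinfo binary on the PATH.
ASINFO_BIN="${ASINFO_BIN:-asinfo}"

# reset_alumni HOST...
# Hypothetical helper: sends services-alumni-reset to each listed node
# and stops at the first node where the command fails.
reset_alumni() {
    for host in "$@"; do
        printf 'resetting alumni on %s\n' "$host"
        "$ASINFO_BIN" -h "$host" -v 'services-alumni-reset' || return 1
    done
}

# Possible usage (placeholder addresses, one per live node):
#   reset_alumni 10.0.0.1 10.0.0.2 10.0.0.3
```

Listing the hosts explicitly makes it obvious which nodes were reset, at the cost of having to keep the list up to date yourself.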