One of Three Nodes went down abruptly and writes are not happening


#1

We are running a 3-node cluster with data in memory, on version 4.2.0.4 CE. We recently noticed that writes were not happening and found that one node was down. Ideally, writes should still happen with one node down. Once we started the node that was down, writes resumed.

We found the INFO log line below being printed continuously on two nodes:
INFO (hb): (hb.c:4319) found redundant connections to same node, fds 101 31 - choosing at random

On the other node, no logs are being printed and the asadm stats show no reads or writes. We have also observed that the records are unevenly distributed across the nodes.

Please help.


#2

Check whether the other two nodes are publishing a private IP address that is not accessible to the client, while only one node (the one that went down) is publishing an accessible IP address. (See the network stanza, service sub-context.)
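
A quick way to sanity-check this is to collect the address each node publishes to clients and flag the private ones. The sketch below is only an illustration: the node names and addresses are made up, the helper `nodes_publishing_private` is hypothetical, and it assumes IPv4 `host:port` strings that you have gathered yourself (for example from each node's info output).

```python
import ipaddress


def nodes_publishing_private(published):
    """Return the names of nodes whose published address is private.

    `published` maps a node name to an IPv4 "host:port" string.
    A private address is usually unreachable from a client that
    sits outside the nodes' own network (for example outside the VPC).
    """
    flagged = []
    for node, endpoint in published.items():
        host = endpoint.rsplit(":", 1)[0]  # drop the port
        if ipaddress.ip_address(host).is_private:
            flagged.append(node)
    return flagged


# Hypothetical values, not taken from the cluster in this thread.
published = {
    "node-a": "10.0.1.11:3000",
    "node-b": "10.0.1.12:3000",
    "node-c": "54.210.10.20:3000",
}
print(nodes_publishing_private(published))
```

If two of the three nodes are flagged, a client outside their network can only reach the third node, which would match writes stopping when that node goes down.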


#3

Gupta, thanks for your apt reply. Yes, this is what is happening. But in the configuration file I have provided the public IP addresses. These three nodes are on AWS.

Please help.


#4

Discussion continuing here: https://stackoverflow.com/questions/54326222/aerospike-one-of-three-nodes-went-down-abruptly-and-writes-are-not-happening