Aql client overwhelmed by "WARN AEROSPIKE_ERR_TIMEOUT"


#22

Looking at the netstat options on Linux, I would try: sudo netstat -tulpn | grep :6000


#23

On nodeB:

sudo netstat -tulpn | grep :6000
tcp 0 0 0.0.0.0:6000 0.0.0.0:* LISTEN 104042/asd

also:

netstat -anlp | grep 172.17.42
tcp 0 0 172.17.42.1:5000 10.168.17.12:32162 SYN_RECV -
tcp 0 0 172.17.42.1:5000 10.168.17.12:32947 SYN_RECV -
tcp 0 0 172.17.42.1:5000 10.168.17.12:32558 SYN_RECV -
tcp 0 0 172.17.42.1:6000 10.168.17.12:13871 SYN_RECV -
tcp 0 0 172.17.42.1:6000 10.168.17.12:13089 SYN_RECV -
tcp 0 0 172.17.42.1:6000 10.168.17.12:13475 SYN_RECV -
tcp 0 0 172.17.42.1:6000 10.168.17.12:14251 SYN_RECV -
tcp 0 1 ::ffff:172.17.42.1:33337 ::ffff:172.17.42.1:5000 SYN_SENT -
tcp 0 1 ::ffff:172.17.42.1:14640 ::ffff:172.17.42.1:6000 SYN_SENT -
udp 0 0 172.17.42.1:123 0.0.0.0:* -


#24

Just run sudo netstat -tulpn without the grep and see if you can find the PID associated with this rogue IP.


#25

sudo netstat -tulpn > netstat.log
grep "172.17.42.1" netstat.log
udp 0 0 172.17.42.1:123 0.0.0.0:* 92374/ntpd


#26

Which process is showing the SYN_RECV state with this rogue IP? What is in the PID/Program Name column?


#27

It seems that no program name is displayed!

sudo netstat -anlp | grep 172.17.42
tcp 0 0 172.17.42.1:5000 10.168.17.12:58841 SYN_RECV -
tcp 0 0 172.17.42.1:5000 10.168.17.12:59256 SYN_RECV -
tcp 0 0 172.17.42.1:5000 10.168.17.12:58444 SYN_RECV -
tcp 0 0 172.17.42.1:5000 10.168.17.12:59664 SYN_RECV -
tcp 0 0 172.17.42.1:6000 10.168.17.12:40102 SYN_RECV -
tcp 0 0 172.17.42.1:6000 10.168.17.12:40936 SYN_RECV -
tcp 0 0 172.17.42.1:6000 10.168.17.12:39717 SYN_RECV -
tcp 0 0 172.17.42.1:6000 10.168.17.12:40515 SYN_RECV -
udp 0 0 172.17.42.1:123 0.0.0.0:* 92374/ntpd
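
To make the pattern above easier to see, the half-open connections can be tallied per local address and port. This is only a minimal sketch: the helper name count_syn_recv is made up for illustration, and it assumes the usual netstat -an column layout (local address in field 4, TCP state in field 6).

```shell
# Hypothetical helper: count SYN_RECV sockets per local address:port.
# Reads netstat-style lines on stdin and prints "<count> <local address>".
count_syn_recv() {
    awk '$1 ~ /^tcp/ && $6 == "SYN_RECV" { n[$4]++ }
         END { for (a in n) print n[a], a }' | sort -k2
}
```

It could be used as sudo netstat -an | count_syn_recv. If every SYN_RECV line sits on the 172.17.42.1 address, that suggests peers are trying to reach this node on the Docker bridge address rather than its real one.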


#28

I recall that before the abnormal aql exceptions happened, I deployed Docker on nodeB.

Could that have affected the Aerospike cluster?


#29

I don't know Docker, but that IP address is tied to Docker! Google it. I think it is somehow interfering with the Aerospike cluster when it binds 172.17.42.1 to port 6000, though I don't quite know how it is doing that, because asd is listening on any/6000. It may be related to Aerospike's "reuse-address" setting, which defaults to true. I will research more and let you know if I have other ideas.

The initial discussion at this link may help you shut down Docker and clear the IP address: https://support.zenoss.com/hc/en-us/articles/203582809-How-to-Change-the-Default-Docker-Subnet


#30

Docker does create virtual Ethernet interfaces, and that is likely where nodeB learned the address. There isn't a way to purge the advertised address without restarting the node. Obviously this assumes the interface is gone; run ip addr to verify.

If you do not need the clients to connect over both interfaces, consider setting the access-address configuration I mentioned, before restarting.
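
As a rough illustration of the access-address suggestion, the network service stanza in aerospike.conf might look something like the sketch below. Both values are assumptions: port 6000 is taken from the netstat output earlier in the thread, and <nodeB-LAN-IP> is a placeholder for nodeB's real address, which is not given here. Keep whatever else your existing stanza already contains.

```
network {
    service {
        # asd keeps listening on all interfaces
        address any
        # assumed from the netstat output in this thread
        port 6000
        # placeholder: the single address the node advertises to clients
        access-address <nodeB-LAN-IP>
    }
}
```

With this set, the node should advertise only that address instead of every interface address it finds, so a stray Docker bridge address would not be published to clients.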


#31

@pgupta @kporter

OK, the problem is solved by just following the instructions in https://support.zenoss.com/hc/en-us/articles/203582809-How-to-Change-the-Default-Docker-Subnet

Only 3 commands:

iptables -t nat -F POSTROUTING
ip link set dev docker0 down
ip addr del 172.17.42.1/16 dev docker0

Thank you so much for your help!