Why do I see this error in AQL for a new node that I recently added even though my cluster is running fine?
2019-09-12 17:31:51 WARN Failed to connect to peer 10.xxx.yy.zz 3000. AEROSPIKE_ERR_CONNECTION Socket write error: 111 from 10.xxx.yy.zz:3000
If AQL works from cluster node B to cluster nodes A and B, but doesn’t work from cluster node A to any other cluster node, it could be a firewall issue (iptables, routing, a security group in AWS or another cloud environment, a physical firewall, etc.). By default, Aerospike cluster nodes communicate with each other only on ports 3001 (fabric) and 3002 (heartbeat), while port 3000 is the service port used by clients such as AQL. Port 3000 is therefore not required between the nodes, and the cluster works fine even if it is not open between them. Refer to Network configuration for details on Aerospike default ports.
Here are some troubleshooting steps to help narrow down the cause:
When connecting from a node to itself (A to A), try both the external and internal (localhost) IPs.
Run a telnet test to port 3000 between the nodes, trying different source and destination combinations (including from a jump server outside the cluster).
If telnet works in most cases but fails in some, it is most likely a firewall issue.
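
If telnet is not installed on a node, the same reachability test can be approximated with a few lines of Python. The following is only a minimal sketch, not an Aerospike tool: the function name check_port and the host list are hypothetical placeholders, and the addresses must be replaced with your own node IPs. Run it from each node (and from the jump server) and compare the results.

```python
import socket


def check_port(host, port, timeout=3.0):
    """Attempt a TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error".
    """
    try:
        # create_connection performs the same TCP handshake that telnet would.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer (or a rejecting firewall rule) actively refused the connection.
        return "refused"
    except socket.timeout:
        # No reply within the timeout; packets may be silently dropped.
        return "timeout"
    except OSError:
        # Anything else, e.g. name resolution failure or no route to host.
        return "error"


if __name__ == "__main__":
    # Hypothetical placeholders: replace with the internal and external
    # addresses of your own cluster nodes.
    hosts = ["127.0.0.1", "10.0.0.1", "10.0.0.2"]
    for host in hosts:
        print(f"{host}:3000 -> {check_port(host, 3000)}")
```

As a rough guide, "refused" usually means the host is reachable but nothing is accepting connections on that address and port (or a firewall is rejecting them), while "timeout" often points to traffic being dropped along the way. Treat both as hints rather than a definitive diagnosis.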