Why do I see this error in AQL for a new node that I recently added even though my cluster is running fine?

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

 2019-09-12 17:31:51 WARN Failed to connect to peer 10.xxx.yy.zz 3000. AEROSPIKE_ERR_CONNECTION Socket write error: 111 from 10.xxx.yy.zz:3000

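If AQL run on cluster node B can connect to nodes A and B, but AQL run on node A cannot connect to any other node, the likely cause is a firewall between the nodes (iptables or routing rules, a security group on AWS or another cloud provider, a physical firewall, and so on). By default, Aerospike nodes talk to each other only on port 3001 (fabric) and port 3002 (heartbeat), so port 3000 does not need to be open between them. The cluster forms and runs normally even when port 3000 is blocked between nodes. AQL, however, is a client: after connecting to a node it discovers the other nodes in the cluster (its peers) and connects to each of them on the service port, 3000. That is where the "Failed to connect to peer" warning comes from. Refer to Network configuration for details on Aerospike default ports.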
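
As a rough way to tell these situations apart, the Python sketch below tries a plain TCP connection from the current host to each default port on one peer. It assumes the default port numbers and mesh-mode heartbeat (multicast heartbeat uses UDP, which a TCP connect cannot test). The script and its function names are illustrative only and are not part of Aerospike's tooling.

```python
import socket
import sys

# Assumption: default Aerospike ports and mesh heartbeat (TCP).
DEFAULT_PORTS = {
    3000: "service (clients such as AQL)",
    3001: "fabric (intra-cluster)",
    3002: "heartbeat, mesh mode (intra-cluster)",
}


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False


def check_peer(host: str, ports=DEFAULT_PORTS) -> dict:
    """Try each port on one peer and return {port: reachable}."""
    return {port: tcp_port_open(host, port) for port in ports}


if __name__ == "__main__":
    peer = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    results = check_peer(peer)
    for port, ok in results.items():
        print(f"{peer}:{port} {DEFAULT_PORTS[port]:<40} {'open' if ok else 'BLOCKED'}")
    # The pattern described above: cluster ports open, service port blocked.
    if results[3001] and results[3002] and not results[3000]:
        print("Cluster ports reachable but port 3000 is not: "
              "the cluster can form, but AQL cannot reach this peer.")
```

Run it on node A with node B's IP as the argument, then the other way round, and compare the output.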

Here are some troubleshooting steps to help narrow down the cause:

  1. When connecting from a node to itself (A to A), try both the node's network (external) IP and localhost (127.0.0.1).

  2. Run a telnet test to port 3000 between the nodes, trying different source and destination combinations (including from a jump server outside the cluster).

If telnet succeeds for most combinations but fails for some, a firewall is the most likely cause.
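
When there are many nodes, the steps above can be scripted. The sketch below is a Python stand-in for the telnet test (it is not an Aerospike tool, and its names are illustrative): it attempts a TCP connection to port 3000 on localhost and on every address given on the command line, and reports whether each attempt succeeded, was refused, or timed out. Run it on each node and on the jump server, passing the same list of node IPs.

```python
import socket
import sys

SERVICE_PORT = 3000  # Aerospike default service port; adjust if changed


def probe(host: str, port: int = SERVICE_PORT, timeout: float = 2.0) -> str:
    """Return 'ok', 'refused', 'timeout' or 'error: ...' for one TCP connect."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:  # e.g. no route to host, unresolvable name
        return f"error: {exc}"


def probe_all(hosts, port: int = SERVICE_PORT) -> dict:
    """Probe every host on the same port and return {host: status}."""
    return {host: probe(host, port) for host in hosts}


if __name__ == "__main__":
    # Example: python probe.py 10.0.0.1 10.0.0.2 10.0.0.3
    targets = ["127.0.0.1"] + sys.argv[1:]
    for host, status in probe_all(targets).items():
        print(f"{host}:{SERVICE_PORT} -> {status}")
```

A refused connection usually means nothing is listening on that address or a firewall rule actively rejects the packet; a timeout is typical of rules that silently drop traffic, which is how AWS security groups behave.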

Timestamp

September 2019