Using iptables to simulate network issues in an Aerospike cluster

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

Context

While testing how Aerospike reacts to fault conditions, you may wish to simulate network issues, either between your clients and your Aerospike cluster, or in the intra-cluster communication between one or more of the nodes in your Aerospike cluster.

Before setting up this test scenario, it is wise to take a moment to consider what a machine (whether client or server) will see during a network failure, and to replicate that as closely as we can by using a local firewall (in this case iptables) on the machines in question.

Ports

The best way to simulate a network failure is to block incoming/outgoing traffic based on the ports Aerospike uses. This allows us to limit only Aerospike traffic, and keep other services like SSH available. By default, Aerospike makes use of the following ports:

Name | Port (TCP) | TLS Port (TCP) | Description
Service | 3000 | 4333 | Used on Aerospike cluster nodes for incoming traffic from client machines. It is also used for incoming XDR connections from other clusters.
Fabric | 3001 | 3011 | Used by the Aerospike cluster nodes to share data between themselves. This carries traffic such as replication and, when the cluster changes, migrations.
Heartbeat (mesh) | 3002 | 3012 | Used between Aerospike cluster nodes to check that other nodes are still there. By default a heartbeat is sent every 150ms.

Note: For the rest of this guide we assume our setup is not using TLS. If you are using TLS for any or all of your communications, you will need to substitute the TLS port numbers above for the ones used here.

Based on the above, to simulate client-server network issues we need to block port 3000, and to simulate intra-cluster network issues we need to block ports 3001 and 3002 between server nodes.
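As a quick reference, the mapping above can be captured in a small shell helper. This is only a sketch: the function name `aerospike_ports` and its arguments are hypothetical, and the port numbers are the defaults from the table, so they will need adjusting if your configuration differs.

```shell
#!/bin/sh
# Hypothetical helper: print the default port spec to block for a scenario.
# Usage: aerospike_ports <client|cluster> [tls]
aerospike_ports() {
    scenario=$1
    tls=${2:-}
    case "$scenario:$tls" in
        client:)     echo "3000" ;;        # service port
        client:tls)  echo "4333" ;;        # TLS service port
        cluster:)    echo "3001:3002" ;;   # fabric and heartbeat
        cluster:tls) echo "3011:3012" ;;   # TLS fabric and heartbeat
        *) echo "usage: aerospike_ports <client|cluster> [tls]" >&2
           return 1 ;;
    esac
}

aerospike_ports client        # prints 3000
aerospike_ports cluster tls   # prints 3011:3012
```

The `first:last` form is the range syntax accepted by the iptables `--dport` option, so the output can be passed to it directly.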

DROP or REJECT?

iptables allows us to either DROP or REJECT a packet. Both options prevent the packet from reaching the destination specified in the rule, but they differ in what is sent back to the source. With a DROP rule, the packet is silently discarded and no reply is sent back to the source. With a REJECT rule, the packet is still discarded, but an error is also returned to the source in the form of an ICMP error packet. These packets are designed to let networked devices understand what is happening on the network, and receiving an ICMP error back is what we would expect to see in this scenario. We can ask iptables to return various ICMP error packets, such as icmp-net-unreachable, icmp-host-unreachable, icmp-port-unreachable or icmp-proto-unreachable. For our testing it does not matter which ICMP error packet is returned (as long as we receive one), so we can leave it as the default, which is icmp-port-unreachable.
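To make the difference concrete, the sketch below builds the client-side rule for each variant. The function name `block_service_port` and the `APPLY` variable are hypothetical; by default the script only prints the command, and it runs it only when `APPLY=1` is set (as root, on a test machine). The `--reject-with` option is how the REJECT target selects a non-default ICMP error type.

```shell
#!/bin/sh
# Hypothetical helper: build a rule that blocks outgoing traffic to the
# service port (3000) using the requested action.
block_service_port() {
    action=$1   # DROP, REJECT, or an ICMP type name such as icmp-host-unreachable
    case "$action" in
        DROP|REJECT) target="-j $action" ;;
        icmp-*)      target="-j REJECT --reject-with $action" ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
    cmd="iptables -A OUTPUT -p tcp --dport 3000 $target"
    echo "$cmd"
    # Only touch the firewall when explicitly asked to (requires root).
    if [ "${APPLY:-0}" = "1" ]; then $cmd; fi
}

# Printed for comparison; when applying for real, pick just one variant.
block_service_port DROP
block_service_port REJECT
block_service_port icmp-host-unreachable
```

With DROP the client typically waits until its own timeout expires, while with REJECT it is told immediately that the connection attempt failed, so the two variants exercise different client code paths.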

Dealing with Established connections

It is not uncommon for iptables to already be configured to allow previously established connections through, especially on the INPUT chain. This means the remaining iptables rules only need to dictate whether a new connection may be established; once it is established, it is trusted and allowed through. Because the vast majority of packets relate to a previously established connection, the rule that allows these packets through is often the very first rule, for efficiency.

When implementing our new rules, we need to consider that Aerospike will re-use existing connections wherever possible, so it is likely that there will be existing connections already open and being reused. This means we need to make sure that our blocks come before any rule that allows established connections through. Because such a rule is less likely to be configured on the OUTPUT chain of the source machine, where possible it is easier to do the block there rather than on the INPUT chain of the destination machine.
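To find out whether such a rule exists and where it sits, the chain can be listed with `iptables -L INPUT -n --line-numbers` or `iptables -S INPUT`. The sketch below parses `iptables -S` style output and prints the position of the first rule that accepts ESTABLISHED traffic, or 0 if there is none. The function name `first_established_rule` is hypothetical, and the sample ruleset is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `iptables -S <chain>` output on stdin and print
# the rule number of the first ACCEPT rule that matches ESTABLISHED traffic.
# Prints 0 when no such rule is found.
first_established_rule() {
    awk '
        /^-A / { n++ }                      # only -A lines are numbered rules
        /^-A / && /ESTABLISHED/ && /-j ACCEPT/ { print n; found = 1; exit }
        END { if (!found) print 0 }
    '
}

# Example with a made-up ruleset; on a real node (as root) you would run:
#   iptables -S INPUT | first_established_rule
first_established_rule <<'EOF'
-P INPUT ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
EOF
# prints 2
```

A block rule then has to be inserted at that position or earlier; inserting at position 1 always satisfies this.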

Examples

Blocking Client to Cluster communication

On the client machine(s), add the following rule:

iptables -A OUTPUT -p tcp --dport 3000 -j REJECT

Blocking Client to Single Cluster Node communication

On the client machine(s), add the following rule, replacing 10.10.10.10 with the address of the cluster node you wish to block:

iptables -A OUTPUT -p tcp -d 10.10.10.10 --dport 3000 -j REJECT

Blocking Intra-cluster communication to a Single Cluster Node

On the cluster node you wish to block communication to, add the following rule:

iptables -A INPUT -p tcp --dport 3001:3002 -j REJECT

Note: If this machine already has a rule to allow ESTABLISHED connections on its INPUT chain, the above rule must be placed before that rule. Explaining iptables configuration is beyond the scope of this guide, but in this particular instance you will most likely want something similar to the following rule instead, which inserts the block at position 1 of the chain:

iptables -I INPUT 1 -p tcp --dport 3001:3002 -j REJECT
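If you only want to cut this node off from some of its peers, rather than from the whole cluster, the same rule can be narrowed with the `-s` (source address) option. The sketch below prints one such rule per peer for review and does not change the firewall. The function name `peer_block_cmds` and the peer addresses are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one INPUT rule per peer address given as an
# argument, blocking fabric and heartbeat traffic (3001:3002) from that peer.
peer_block_cmds() {
    for peer in "$@"; do
        # -I INPUT 1 places the block ahead of any ESTABLISHED accept rule.
        echo "iptables -I INPUT 1 -p tcp -s $peer --dport 3001:3002 -j REJECT"
    done
}

# Example addresses only; replace them with the peers to block.
peer_block_cmds 10.10.10.11 10.10.10.12
```

Once the printed commands look right, they can be run as root on the node in question.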

Removing rules afterwards

Once you have finished your testing, you can remove each rule you added by running the same command with the -D option in place of the -A option. For a rule added with -I, use -D and leave out the position number. For example, to remove our Client to Cluster communication block you would run the following:

iptables -D OUTPUT -p tcp --dport 3000 -j REJECT
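If you keep a note of the commands you ran, the matching removal commands can be derived mechanically. The following is a minimal sketch of that idea: the function name `undo_cmd` is hypothetical, and it only handles commands written in the `iptables -A ...` or `iptables -I CHAIN [position] ...` forms used in this article.

```shell
#!/bin/sh
# Hypothetical helper: turn an "add" command into the matching "delete"
# command by swapping -A for -D, or -I CHAIN [position] for -D CHAIN.
undo_cmd() {
    echo "$1" | sed -E \
        -e 's/^iptables -A /iptables -D /' \
        -e 's/^iptables -I ([A-Z]+)( [0-9]+)? /iptables -D \1 /'
}

undo_cmd "iptables -A OUTPUT -p tcp --dport 3000 -j REJECT"
undo_cmd "iptables -I INPUT 1 -p tcp --dport 3001:3002 -j REJECT"
```

After running the removal commands, listing the chain again with `iptables -S OUTPUT` or `iptables -S INPUT` confirms that the test rules are gone.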

Notes

  • WARNING: It is strongly recommended that the above is only run in a test environment. Adding iptables rules without fully understanding them could result in either exposing more of your server than intended, or blocking more than you intended (including the SSH session you are using to connect to the server). If in any doubt, please check with your security/network teams first.

Keywords

IPTABLES CLIENT SERVER BLOCK

Timestamp

March 2021