How to divide a mesh cluster into two parts

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

FAQ: How to divide a single mesh cluster into two separate clusters

Context

Two clusters using mesh heartbeats may accidentally join into one if a node in one cluster ends up with a node from the other in its list of seed nodes and they do not have a cluster-name set. This can happen due to a misconfiguration in aerospike.conf, an error in dynamic use of the tip info command, or reuse of an IP address that formerly belonged to a node in one cluster for a node in the other. This article describes how to separate the clusters again.

WARNING: These instructions only apply when the two original clusters did not have any namespace names in common. If they do share a namespace, data loss may occur. Aerospike Enterprise customers should contact Aerospike Support in that case.

WARNING: These instructions only apply when the two merged clusters use mesh heartbeats. If they use multicast heartbeats, Aerospike Enterprise customers should contact Aerospike Support.

Solution

There are two methods of separating the clusters: at the networking level, by blocking internode communication with iptables, or at the application level, by setting different cluster-name values on the nodes. We recommend the latter when it fits the use case.

Step 0:

  • Check the config files of all nodes in both clusters and delete any mesh-seed-address-port entry that points to a node in the other cluster. (The mesh seed list of a node in cluster A should not refer to any node in cluster B, and vice versa.) This prevents the clusters from rejoining the next time the nodes are started.
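
The check in Step 0 can be sketched as a small shell helper. This is only an illustration: the function name find_foreign_seeds is hypothetical, and it assumes each seed entry is written on one line as "mesh-seed-address-port ADDRESS PORT". Commented-out entries are reported as well, so review the output by hand.

```shell
#!/bin/sh
# Print every mesh-seed-address-port line (with its line number) that names
# one of the given addresses, i.e. addresses belonging to the OTHER cluster.
# Usage: find_foreign_seeds CONF_FILE ADDRESS...
find_foreign_seeds() {
    conf_file=$1
    shift
    for addr in "$@"; do
        # -F: treat the address as a literal string, not a regex.
        # -w: whole-word match, so 172.3.0.1 does not match 172.3.0.10.
        grep -n 'mesh-seed-address-port' "$conf_file" | grep -wF -- "$addr" || true
    done
}
```

For example, on a node of cluster A, find_foreign_seeds /etc/aerospike/aerospike.conf 172.3.0.1 172.3.0.2 would list the entries to delete, assuming the config file is at that path and those are cluster B's addresses.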

Step 1:

  • Use tip-clear to dynamically remove the unneeded nodes from every node’s mesh seed list. For example:
asinfo -v 'tip-clear:host-port-list=172.3.0.1:3002,172.3.0.2:3002, ...'
  • This command must include the mesh seed IP address and port for every node in the other cluster, not just the ones in the config file (run on each node of cluster A with all addresses for cluster B, and then on each node of cluster B with all addresses for cluster A).

  • Refer to the knowledge-base article on Tip-Clear not working as expected if you are using DNS in your mesh seed list.

  • Verify that the new configuration is correct by executing the following command on each node and looking for the Heartbeat Dump information in the aerospike.log file:

asinfo -v 'dump-hb:verbose=true'
  • Each node should only have the addresses of the other nodes in its cluster; if it has any nodes from the other cluster, modify and rerun the tip-clear command until the clusters no longer have references to each other in “HB Mesh Nodes”.
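
Because the tip-clear list must name every node of the other cluster, it can help to build it from a list of addresses rather than typing it by hand. The following is a minimal sketch: build_tip_clear_list is a hypothetical helper, the addresses are placeholders, and the script only prints the asinfo command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Join addresses into the comma-separated host:port list passed to tip-clear.
# Usage: build_tip_clear_list PORT ADDRESS...
build_tip_clear_list() {
    port=$1
    shift
    list=""
    for addr in "$@"; do
        # Prepend a comma only when the list already has an entry.
        list="${list:+$list,}${addr}:${port}"
    done
    printf '%s\n' "$list"
}

# Placeholder addresses standing in for every node of cluster B.
CLUSTER_B_LIST=$(build_tip_clear_list 3002 172.3.0.1 172.3.0.2)

# Print the command for review instead of executing it.
printf "asinfo -v 'tip-clear:host-port-list=%s'\n" "$CLUSTER_B_LIST"
```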

Step 2a (if using iptables):

  • Using the same list of cluster B’s IP addresses as in Step 1, run these commands on every node of cluster A, and then run them on every node of cluster B with cluster A’s IP addresses. Note that the list after “SOURCEIP in” is separated by spaces, and does not include the port numbers, unlike the list passed to tip-clear.
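# 3001 and 3002 are the default fabric and mesh heartbeat ports; adjust them if your configuration differs.
for SOURCEIP in 172.3.0.1 172.3.0.2 ... ; do
    for PORT in 3001 3002 ; do
        # Reject traffic arriving from, and leaving for, each node of the other cluster.
        iptables -I INPUT -p tcp -m tcp --source "${SOURCEIP}" --dport "${PORT}" -j REJECT --reject-with icmp-host-unreachable
        iptables -I OUTPUT -p tcp -m tcp --destination "${SOURCEIP}" --dport "${PORT}" -j REJECT --reject-with icmp-host-unreachable
    done
done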
  • More information about using iptables to separate nodes from a cluster is available in a separate KB article.

  • After a few seconds, the clusters should now be separate. We recommend setting cluster-name on all nodes of both clusters to prevent this from happening again.
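
As noted in Step 2a, the iptables loop takes space-separated addresses without ports, while tip-clear takes a comma-separated host:port list. A small conversion helper avoids maintaining two lists. This is a sketch only: hostports_to_ips is a hypothetical name, and it assumes IPv4 addresses or hostnames (an IPv6 address contains colons and would need different handling).

```shell
#!/bin/sh
# Convert "ip:port,ip:port" (tip-clear format) into "ip ip" (for-loop format).
# Usage: hostports_to_ips HOST_PORT_LIST
hostports_to_ips() {
    printf '%s\n' "$1" |
        tr ',' '\n' |          # one host:port entry per line
        sed 's/:[0-9]*$//' |   # drop the trailing :port
        paste -s -d ' ' -      # join the lines with single spaces
}
```

The result can then be used directly in the loop header, for example: for SOURCEIP in $(hostports_to_ips "$CLUSTER_B_LIST") ; do, where CLUSTER_B_LIST is an assumed variable holding the host:port list.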

Step 2b (if using cluster-name):

  • Choose names for the two clusters. These instructions use “ClusterA” and “ClusterB”, but any two distinct names will work. On every node of cluster A, execute the command:
asinfo -v 'set-config:context=service;cluster-name=ClusterA'
  • Then, on every node of cluster B:
asinfo -v 'set-config:context=service;cluster-name=ClusterB'
  • After a few seconds, the clusters should now be separate. Add the cluster-name settings to aerospike.conf to make them persistent and prevent this from happening again.
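
When a cluster has many nodes, the per-node commands in Step 2b can be generated from a node list. The sketch below only prints the commands so they can be reviewed first. The function name print_cluster_name_cmds and the node addresses are hypothetical, and it assumes asinfo's -h option is used to address each node from a single admin host.

```shell
#!/bin/sh
# Print (do not run) the set-config command for each node of one cluster.
# Usage: print_cluster_name_cmds CLUSTER_NAME NODE...
print_cluster_name_cmds() {
    name=$1
    shift
    for node in "$@"; do
        printf "asinfo -h %s -v 'set-config:context=service;cluster-name=%s'\n" "$node" "$name"
    done
}

# Placeholder node addresses for cluster A.
print_cluster_name_cmds ClusterA 10.0.0.1 10.0.0.2
```

Repeat with ClusterB and the addresses of cluster B's nodes, and make sure no node appears in both lists.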

Note on using cluster-name

  • When a cluster-name is set on an Aerospike server node, the node returns that cluster-name to the client. If the client has a cluster-name set that does not match, the transaction fails. If the client has a matching cluster-name, or has no cluster-name set, the transaction proceeds.
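
The rule in the note above can be written out as a small decision function. This only models the behavior described here for illustration; it is not client library code, and the name cluster_name_check is hypothetical.

```shell
#!/bin/sh
# Model of the client-side check described in the note.
# Prints "proceed" or "fail".
# Usage: cluster_name_check CLIENT_CLUSTER_NAME SERVER_CLUSTER_NAME
cluster_name_check() {
    client_name=$1
    server_name=$2
    if [ -z "$client_name" ]; then
        # The client has no cluster-name set, so nothing is compared.
        echo proceed
    elif [ "$client_name" = "$server_name" ]; then
        echo proceed
    else
        echo fail
    fi
}
```

In practice this means that clients configured with a cluster-name must be updated to the new name of the cluster they connect to once the names in Step 2b are applied.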

Keywords:

CLUSTER DIVIDE MERGE TIP-CLEAR TIP IPTABLES CLUSTER-NAME MESH

Timestamp

September 2019