Changing netmask on a running cluster


How to change the netmask of a cluster while the nodes stay running

Task

Change the network subnet mask of the nodes in a cluster (but not their IP addresses) while the cluster and all nodes stay up and running. Production traffic must keep flowing.

Method

The way to achieve this without restarting any nodes is to temporarily shut down the paxos protocol and the heartbeats between nodes. The process has four discrete stages.

  1. Check current configuration
  2. Shut down paxos protocol
  3. Make network subnet mask changes
  4. Reset paxos protocol so that nodes start communicating again

1. Check current configuration

To do this, we start asadm and run ‘show config’ at its prompt. The two parameters we need to note are paxos-protocol and heartbeat-protocol. Truncated output looks something like this:

Admin> show config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Configuration~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                               :   10.0.0.100                   10.0.0.101          

paxos-protocol                     :   v3                           v3                           

~~~~~~~~~~~~~~~~Network Configuration~~~~~~~~~~~~~~~~~~~
NODE                               :   10.0.0.100                   10.0.0.101

heartbeat-protocol                 :   v2                           v2

We make a note of these two values. Notice this is shown for both nodes.
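
If the values need to be captured in a script rather than read off the screen, something like the following could work. It is a minimal sketch that assumes the configuration is available as a semicolon-separated name=value string, which is the general shape of an asinfo ‘get-config’ reply. The helper name config_value and the sample string are hypothetical, and the exact parameter names returned for the heartbeat context may differ between server versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one parameter from a
# semicolon-separated "name=value" config string.
config_value() {
  local config="$1" name="$2" pair
  local IFS=';'
  for pair in $config; do          # split the string on ';'
    if [ "${pair%%=*}" = "$name" ]; then
      printf '%s\n' "${pair#*=}"   # everything after the first '='
      return 0
    fi
  done
  return 1                         # parameter not present
}

# Illustrative sample only, not real server output.
sample='transaction-queues=4;paxos-protocol=v3;proto-fd-max=15000'
config_value "$sample" paxos-protocol
```

The function returns a non-zero status when the parameter is missing, so a calling script can stop before step 2 if it has nothing to restore later.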

2. Shut down paxos protocol

We shut down the heartbeats and the paxos protocol from the principal node, again using asadm. With ‘asadm -e’ we can issue an ‘asinfo’ command to all nodes in the cluster at once.

[vagrant@dc1xdr1 ~]$ asadm -e 'asinfo -v "set-config:context=network.heartbeat;protocol=none"'
10.0.0.101 (10.0.0.101) returned:
ok
10.0.0.100 (10.0.0.100) returned:
ok
[vagrant@dc1xdr1 ~]$ asadm -e 'asinfo -v "set-config:context=service;paxos-protocol=none"'
10.0.0.101 (10.0.0.101) returned:
ok
10.0.0.100 (10.0.0.100) returned:
ok

3. Make network subnet mask changes

We use ifconfig to change the netmask. The variables shown here should be replaced with the applicable network interface, the new netmask and the gateway address. We have to re-add the route to the default gateway, as changing the netmask will remove it.

ifconfig $INTERNAL_INTERFACE netmask $NEW_MASK

route add default gw $GATEWAYIP 
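
Because the default route is re-added after the mask changes, it may be worth confirming beforehand that the gateway still falls inside the node's network under the new mask. The sketch below is one way to check this in plain shell arithmetic. The function names ip_to_int and same_subnet are hypothetical, the addresses are example values, and octets are assumed to be written without leading zeros.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1                        # split the address into four octets
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if two addresses share a network under the given mask.
same_subnet() {
  local ip1 ip2 mask
  ip1=$(ip_to_int "$1")
  ip2=$(ip_to_int "$2")
  mask=$(ip_to_int "$3")
  [ $(( ip1 & mask )) -eq $(( ip2 & mask )) ]
}

# Example values only: node address, gateway, proposed new mask.
if same_subnet 10.0.0.100 10.0.0.1 255.255.0.0; then
  echo "gateway is on-link with the new mask"
else
  echo "gateway would be off-link with the new mask"
fi
```

The same check can be run between node addresses to confirm that all nodes remain in one network once the new mask is applied.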

4. Reset paxos protocol and heartbeat so the nodes start to communicate again

Here we simply reverse the process from step 2 using the values we noted down in step 1.

[vagrant@dc1xdr1 ~]$ asadm -e 'asinfo -v "set-config:context=service;paxos-protocol=v3"'
10.0.0.101 (10.0.0.101) returned:
ok
10.0.0.100 (10.0.0.100) returned:
ok
Config files location: /home/vagrant/.aerospike/
[vagrant@dc1xdr1 benchmarks]$ asadm -e 'asinfo -v "set-config:context=network.heartbeat;protocol=v2"'
10.0.0.101 (10.0.0.101) returned:
ok
10.0.0.100 (10.0.0.100) returned:
ok
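
Steps 2 and 4 issue the same two commands with different values, so the command lines can be generated from the values noted in step 1 instead of being typed by hand. The sketch below only prints the commands in the form used above and does not run them. The helper name set_config_cmd and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the asadm command that sets one parameter
# cluster-wide, in the same form as the commands shown above.
set_config_cmd() {
  printf "asadm -e 'asinfo -v \"set-config:context=%s;%s=%s\"'\n" "$1" "$2" "$3"
}

# Values noted in step 1 (taken from the example output above).
PAXOS_SAVED=v3
HEARTBEAT_SAVED=v2

# Step 2: heartbeat first, then paxos.
set_config_cmd network.heartbeat protocol none
set_config_cmd service paxos-protocol none

# Step 4: reverse order, restoring the saved values.
set_config_cmd service paxos-protocol "$PAXOS_SAVED"
set_config_cmd network.heartbeat protocol "$HEARTBEAT_SAVED"
```

Printing rather than executing keeps the operator in control: the output can be reviewed and pasted one line at a time, checking for ‘ok’ from every node before moving on.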

We then check the cluster using the asadm ‘get-config’ and ‘info’ commands.