FAQ - Increasing maximum cluster size in an Aerospike cluster



Detail

This article covers frequently asked questions about increasing the cluster size in an Aerospike cluster that runs a version prior to 3.14 or runs with the heartbeat protocol set to v2.

NOTE: For clusters running server version 3.14 and above (or 3.10 with heartbeat v3), increasing the number of nodes in the cluster does not require any specific configuration change. The maximum number of nodes in an Aerospike Enterprise Edition Server cluster is 128.

NOTE: For Aerospike Community Edition Server, the maximum number of nodes in a cluster is limited to 31 for versions prior to 4.0 and limited to 8 for versions 4.0 and above.

What is the default maximum number of nodes possible in an Aerospike cluster?

The default maximum number of nodes that you can have in a cluster is 31. It depends on the paxos-max-cluster-size configuration, which has a value of 32 if not specified.

Configuration reference

Note: The paxos-max-cluster-size configuration is deprecated for version 3.10 and above (heartbeat v3 only). With heartbeat protocol v3 (available from 3.10 onwards), you can increase the cluster size dynamically for both mesh and multicast setups.

What if I add more than the default configured maximum number of nodes?

If a cluster has 31 nodes, and you try to add a 32nd node with a paxos-max-cluster-size configuration of 32, you see the following error in aerospike.log:

Aug 20 2015 10:13:38 GMT: CRITICAL (paxos): (paxos.c:as_paxos_transaction_apply:1556) succession list full

You see the error because the number of nodes in the cluster cannot exceed the limit set in paxos-max-cluster-size.
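
To check whether a node has hit this limit, you can search its log for the message. The following is a minimal sketch; the function name count_succession_full and the default log path are assumptions for illustration, so pass the path that matches your installation.

```shell
#!/bin/sh
# Count "succession list full" errors in an Aerospike log file.
# The default path below is an assumption; pass your own log path as $1.
count_succession_full() {
    log_file="${1:-/var/log/aerospike/aerospike.log}"
    # grep -c prints the number of matching lines. It exits non-zero
    # when the count is 0, so "|| true" keeps the function from failing.
    grep -c 'succession list full' "$log_file" || true
}
```

A result greater than 0 suggests that a node was refused because the cluster had already reached the configured limit.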

Why is the default paxos-max-cluster-size not set to 128?

The paxos-max-cluster-size is not set to 128 by default because Paxos and heartbeat use extra network bandwidth for each extra node allowed by this configuration.

How do I increase the number of nodes beyond 31?

The steps differ depending on whether you have a mesh or a multicast configuration.

IMPORTANT NOTE: Ensure that no clients connect to the cluster during the maintenance window to avoid any unexpected behavior.

Mesh

  1. Bring the cluster down.

  2. Update the configuration to set paxos-max-cluster-size to 128 under the service stanza. http://www.aerospike.com/docs/reference/configuration/#paxos-max-cluster-size

  3. Bring the nodes back up one at a time, waiting for each one to form a cluster with the nodes already started before bringing up the next node.

  4. You can confirm the Paxos and heartbeat protocols before and after the restart.

asadm> show config like protocol
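
When bringing nodes back one at a time, it can help to confirm the cluster size that a node reports before starting the next one. The following is a minimal sketch; the helper name cluster_size_from_stats is hypothetical, and it assumes that the statistics output is a semicolon-separated list of name=value pairs that includes cluster_size.

```shell
#!/bin/sh
# Extract the cluster_size value from semicolon-separated statistics text
# read on stdin. Assumes the input looks like "name=value;name=value;...".
cluster_size_from_stats() {
    # Put each name=value pair on its own line, then print only the
    # value of the pair whose name is exactly cluster_size.
    tr ';' '\n' | sed -n 's/^cluster_size=//p'
}
```

For example, on a node you could run asinfo -v "statistics" and pipe the output into cluster_size_from_stats, then start the next node once the reported size matches the number of nodes started so far.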

Multicast

For a multicast configured cluster, you can increase the number of nodes in the cluster following these steps:

  1. Ensure that no clients connect to the cluster during the maintenance window to avoid any unexpected behavior.

  2. Verify the current protocols for Heartbeat and Paxos. Run the following command:

    asadm > show config like protocol
    

    Note the values for the following entries:

    • heartbeat-protocol
    • paxos-protocol
  3. Change the heartbeat-protocol and paxos-protocol to none. You can use asadm -e to change it on all the nodes, but any method works, provided that all nodes are changed within about one second of each other. For example, you can put the commands in a script or make the change with Puppet.

    Change the heartbeat protocol first, and then change paxos-protocol. The following examples show this change using asadm:

    asadm -e 'asinfo -v "config-set:context=network;heartbeat.protocol=none"'
    asadm -e 'asinfo -v "config-set:context=service;paxos-protocol=none"'
    

    For versions prior to 3.9:

    asadm -e 'asinfo -v "config-set:context=network.heartbeat;protocol=none"'
    asadm -e 'asinfo -v "config-set:context=service;paxos-protocol=none"'
    
  4. Run the following command to dynamically change the paxos-max-cluster-size to 60:

    asadm -e 'asinfo -v "set-config:context=service;paxos-max-cluster-size=60"'
    
  5. Modify the aerospike.conf files so that the cluster size change persists even if the nodes are restarted. Add paxos-max-cluster-size 60 to the service stanza of every aerospike.conf file in the cluster. The following example illustrates the change:

    service {
      user root
      group root
      paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
      pidfile /var/run/aerospike/asd.pid
      service-threads 4
      transaction-queues 4
      transaction-threads-per-queue 4
      proto-fd-max 15000
      paxos-max-cluster-size 60
    }
    
  6. Change paxos-protocol back to the initial value noted in step 2. It is probably v3, but use whatever the show config output reported. Again, you can make the change by any method, provided that the change is made on all nodes in the cluster within about one second. The following example illustrates the asadm syntax:

    asadm -e 'asinfo -v "config-set:context=service;paxos-protocol=v3"'
    
  7. Change the heartbeat protocol back to the initial value noted in step 2. It is probably v2, but use whatever the show config output reported. The following example illustrates the asadm syntax for making the change:

    asadm -e 'asinfo -v "config-set:context=network;heartbeat.protocol=v2"'
    

    For versions prior to 3.9:

    asadm -e 'asinfo -v "config-set:context=network.heartbeat;protocol=v2"'
    
  8. Verify that the heartbeat and Paxos protocols are now set to the versions noted in step 2. Run the following command to verify the current values for these configurations:

    asadm > show config like protocol
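
The dynamic part of the multicast procedure (steps 3, 4, 6 and 7) can be collected into one script so that the commands are issued in the right order. The following is a sketch only, reusing the asadm commands shown in the steps for version 3.9 and later. The names run and resize_multicast and the APPLY variable are hypothetical; by default the script only prints the commands so that you can review them.

```shell
#!/bin/sh
# Sketch of the dynamic multicast steps, using the 3.9+ command syntax.
# By default each command is only printed; set APPLY=1 to run it via asadm.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        asadm -e "$1"
    else
        printf 'asadm -e %s\n' "'$1'"
    fi
}

# Usage: resize_multicast <new-size> <original-heartbeat> <original-paxos>
resize_multicast() {
    size="$1"; hb="$2"; paxos="$3"
    # Step 3: disable heartbeat first, then Paxos.
    run "asinfo -v \"config-set:context=network;heartbeat.protocol=none\""
    run "asinfo -v \"config-set:context=service;paxos-protocol=none\""
    # Step 4: raise the maximum cluster size.
    run "asinfo -v \"set-config:context=service;paxos-max-cluster-size=$size\""
    # Steps 6 and 7: restore Paxos first, then heartbeat.
    run "asinfo -v \"config-set:context=service;paxos-protocol=$paxos\""
    run "asinfo -v \"config-set:context=network;heartbeat.protocol=$hb\""
}
```

For example, resize_multicast 60 v2 v3 prints the five commands for a cluster whose original protocols were heartbeat v2 and Paxos v3. The script does not edit aerospike.conf (step 5) and does not verify the result (step 8).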
    

Any other factors to keep in mind?

  • Ensure that no clients connect to the cluster during the maintenance window to avoid any unexpected behavior.
  • Confirm whether the cluster uses a mesh or multicast configuration. If the cluster uses mesh for heartbeats, you cannot change paxos-max-cluster-size dynamically, so you need cluster downtime to increase the cluster size beyond the currently configured maximum.
  • Confirm that the cluster is not using ‘rack aware’ mode (versions prior to 3.9).
  • You cannot change node IP addresses.
  • The commands must be run on all nodes in the cluster within about one second of each other. The method of making the change is less important than making the change as close to simultaneously as possible.
  • If you need to make these changes for nodes that are on 2.x (2.1 and above), use clinfo in place of asinfo.
  • The ability to increase the cluster size dynamically is available for Aerospike server version 3.10 and above after switching to heartbeat version v3. For details, see the article: How do I change my heartbeat protocol from v2 to v3?
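
To confirm whether a node is configured for mesh or multicast heartbeats, you can read the mode setting from the heartbeat stanza of its configuration file. The following is a minimal sketch; the function name heartbeat_mode and the default file path are assumptions for illustration.

```shell
#!/bin/sh
# Print the heartbeat mode (for example mesh or multicast) found in the
# heartbeat stanza of an aerospike.conf file.
# The default path below is an assumption; pass your own path as $1.
heartbeat_mode() {
    conf="${1:-/etc/aerospike/aerospike.conf}"
    awk '
        $1 == "heartbeat" && $2 == "{" { in_hb = 1; next }  # entering the heartbeat stanza
        in_hb && $1 == "}"             { in_hb = 0 }        # leaving the heartbeat stanza
        in_hb && $1 == "mode"          { print $2; exit }   # first mode line inside it
    ' "$conf"
}
```

Run it on each node, since the procedure to follow depends on the mode that the cluster actually uses.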

Keywords

cluster size change mesh multicast

Timestamp

07/27/2017

