Explanation of replication-factor config parameter


We are starting to roll out our Aerospike servers into production. For budgetary reasons we are currently restricted to 2 nodes in our cluster. I have read the documentation on replication found here: http://www.aerospike.com/docs/operations/configure/namespace/durability/

However, I find it a bit lacking in explaining what “replication-factor” actually does apart from, well, setting the replication “factor”. So I have a couple of questions about this configuration setting:

  1. What happens when you set this configuration to 1? Does it disable replication?

  2. With the current setting of 2, if we want to increase the data storage capacity of our cluster, how many nodes should we add to the cluster?



You can add just one node to increase the capacity. There is no minimum number of nodes required for parity. Generally speaking, though, the number of nodes should be >= the replication factor. If the number of nodes is less than the replication factor, the effective replication factor drops to the number of nodes; with a single node that means 1, i.e., no replication (I think).
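To make the capacity arithmetic concrete, here is a rough sketch. It assumes the effective replication factor is capped at the node count and that data spreads evenly across identical nodes. The function names and the 100 GB per-node figure are made up for illustration, and the numbers ignore high water marks and storage overhead.

```python
def effective_replication_factor(configured_rf: int, node_count: int) -> int:
    """Assumption: a record cannot have more copies than there are nodes."""
    if configured_rf < 1 or node_count < 1:
        raise ValueError("replication factor and node count must be >= 1")
    return min(configured_rf, node_count)


def usable_capacity_gb(node_count: int, per_node_gb: float, configured_rf: int) -> float:
    """Unique data the cluster can hold when every record is stored rf times."""
    rf = effective_replication_factor(configured_rf, node_count)
    return node_count * per_node_gb / rf


if __name__ == "__main__":
    # Hypothetical 100 GB nodes with replication-factor 2.
    print(usable_capacity_gb(2, 100, 2))  # 100.0
    print(usable_capacity_gb(3, 100, 2))  # 150.0
```

Under these assumptions, going from 2 to 3 nodes adds half a node's worth of unique data, since each record still occupies two copies.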

When increasing the number of nodes, keep in mind that the new node should have the same storage capacity as the existing nodes. This is because various capacity thresholds are expressed as percentages (for example, high water marks). Generally, nodes in a cluster should be homogeneous in terms of hardware capacity.
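As a rough illustration of why mixed node sizes hurt, the sketch below assumes data is spread evenly across nodes and that each node has a percentage-based high water mark. The 50% figure and the node sizes are hypothetical values chosen for the example, not recommended settings.

```python
def cluster_limit_gb(node_capacities_gb: list[float], high_water_pct: float) -> float:
    """Total stored data (including replicas) the cluster can hold before
    the first node crosses its high water mark.

    Assumption: every node receives the same amount of data, so the
    smallest node hits its percentage threshold first.
    """
    smallest = min(node_capacities_gb)
    return len(node_capacities_gb) * smallest * high_water_pct / 100


if __name__ == "__main__":
    print(cluster_limit_gb([100, 100], 50))       # 100.0
    print(cluster_limit_gb([100, 100, 100], 50))  # 150.0
    print(cluster_limit_gb([100, 100, 50], 50))   # 75.0
```

In this simplified model, adding a node half the size of the others lowers the point at which the first node hits its threshold, which is the reason for keeping the hardware uniform.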

For RAM size planning and more details, you can refer to http://www.aerospike.com/docs/operations/plan/capacity/


So if replication-factor is 2 and we have 2 nodes in our cluster, do both nodes basically hold mirrored data? And if we add only 1 node to increase capacity in the cluster, is the data on this new node not mirrored on either of the previous 2 nodes until we add another node to the cluster?

Lastly, what if we decide later to change from replication-factor 2 to replication-factor 1 across all nodes in the cluster? What happens then to the data? Is there a documented procedure to do this?



Wrong. Any change in cluster state (the addition or removal of a node, whether on purpose or for unwanted reasons such as a node dying or network issues) triggers data rebalancing or migration.


So once you add or remove a node, data migration will move data around across all nodes, creating new copies and rebalancing existing data.

So, when you add a new node, some partitions of the existing data will be replicated between nodes 1-2, others between nodes 2-3, and others between nodes 1-3.
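Here is a toy model of that redistribution. It is not Aerospike's actual partition-assignment algorithm: it simply ranks nodes per partition by a hash and takes the first two as master and replica. The node names and the `succession` and `replica_pairs` helpers are hypothetical; the 4096 figure is the number of partitions Aerospike uses per namespace.

```python
import hashlib
from collections import Counter

PARTITIONS = 4096  # Aerospike divides each namespace into 4096 partitions


def succession(partition_id: int, nodes: list[str]) -> list[str]:
    """Order nodes for one partition by a deterministic hash (toy stand-in)."""
    def score(node: str) -> bytes:
        return hashlib.sha256(f"{node}:{partition_id}".encode()).digest()
    return sorted(nodes, key=score)


def replica_pairs(nodes: list[str], rf: int = 2) -> Counter:
    """Count how many partitions are held by each set of nodes."""
    copies = min(rf, len(nodes))
    counts: Counter = Counter()
    for pid in range(PARTITIONS):
        holders = tuple(sorted(succession(pid, nodes)[:copies]))
        counts[holders] += 1
    return counts


if __name__ == "__main__":
    # With 2 nodes every partition lives on both nodes (a full mirror).
    print(replica_pairs(["node1", "node2"]))
    # With 3 nodes the partitions are split across the pairs 1-2, 2-3 and 1-3.
    print(replica_pairs(["node1", "node2", "node3"]))
```

In this model every record still has two copies after the third node joins; what changes is which pair of nodes holds each partition.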



Thanks. Everything is much clearer to me now.