Explanation of replication-factor config parameter

rudeb0t · September 24, 2015, 3:03am

We are starting to roll out our Aerospike servers into production. For budgetary reasons we are currently restricted to 2 nodes in our cluster. I have read up on the documentation with regards to replication as found here: http://www.aerospike.com/docs/operations/configure/namespace/durability/

However, I find it a bit lacking in explaining what “replication-factor” actually does apart from, well, setting the replication “factor”. So I have a couple of questions about this configuration setting:

1. What happens when you set this configuration to 1? Does it disable replication?
2. In the current setup of 2 nodes, if we want to increase the data storage capacity of our cluster, how many nodes should we add to the cluster?

anshu · September 24, 2015, 4:09am

1. Correct.

2. You can add just one node to increase the capacity. There is no minimum number of nodes required for parity, though generally speaking the number of nodes should be >= the replication factor. If the number of nodes < the replication factor, then the replication factor becomes 1, i.e., no replication (I think).
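To put rough numbers on the capacity gain from adding a single node, here is a minimal sketch. It assumes equally sized nodes, a node count at least equal to the replication factor, and it ignores index overhead and high-water marks. The function name and the 500 GB figure are made up for illustration.

```python
def unique_capacity_gb(nodes: int, per_node_gb: float, replication_factor: int) -> float:
    """Approximate room for unique data: raw storage divided by the number of copies.

    Assumption: every record is stored replication_factor times, and the
    cluster has at least that many nodes.
    """
    if replication_factor < 1 or nodes < replication_factor:
        raise ValueError("this sketch only covers nodes >= replication_factor >= 1")
    return nodes * per_node_gb / replication_factor


if __name__ == "__main__":
    # Hypothetical 500 GB nodes with replication-factor 2.
    for n in (2, 3, 4):
        print(f"{n} nodes: {unique_capacity_gb(n, 500, 2):.0f} GB of unique data")
```

Under these assumptions, going from 2 to 3 nodes raises the room for unique data from one node's worth to one and a half nodes' worth.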

When increasing the number of nodes, keep in mind that the new node should have the same storage capacity as the existing nodes. This is because various capacity settings work on a percentage basis (for example, high-water marks). Generally, nodes in a cluster should be homogeneous in terms of hardware capacity.
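A small sketch of why mixed node sizes interact badly with percentage-based thresholds. It assumes data spreads evenly across nodes regardless of their size. The 50% threshold, the data volume, and the node sizes are hypothetical values chosen for illustration, not defaults taken from the documentation.

```python
HIGH_WATER_PCT = 50  # hypothetical percentage threshold


def disk_used_pct(total_data_gb, node_capacities_gb):
    """Per-node usage in percent, assuming each node holds an equal share of the data."""
    per_node_gb = total_data_gb / len(node_capacities_gb)
    return [100.0 * per_node_gb / cap for cap in node_capacities_gb]


if __name__ == "__main__":
    # Same total data (copies included) on a homogeneous and a mixed cluster.
    for caps in ([500, 500, 500], [500, 500, 250]):
        usage = disk_used_pct(600, caps)
        over = [pct > HIGH_WATER_PCT for pct in usage]
        print(caps, usage, over)
```

In the mixed cluster the smaller node crosses the threshold while the larger ones still have plenty of room, which is the situation a homogeneous cluster avoids.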

For RAM size planning and more details, you can refer to the Capacity Planning Guide in the Aerospike documentation.

rudeb0t · September 24, 2015, 4:37am

So if replication-factor is 2 and we have 2 nodes in our cluster, both nodes basically hold mirrored data? And if we add only 1 node to increase capacity in the cluster, the data on this new node is not mirrored on either of the previous 2 nodes until we add another node to the cluster?

Lastly, what if we decide later to change from replication-factor 2 to replication-factor 1 across all nodes in the cluster? What happens then to the data? Is there a documented procedure to do this?

anshu · September 24, 2015, 6:30am

Correct.

Wrong. Any change in cluster state (addition or removal of a node, whether on purpose or for unwanted reasons such as a node dying or network issues) triggers data rebalancing or migration.

So once you add or remove a node, data migration will move data around across all nodes, creating new copies and rebalancing existing data.

So, when you add a new node, some partitions of the existing data will now be replicated between nodes 1-2, other partitions between nodes 2-3, and some others between nodes 1-3.
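The spread of partitions across node pairs can be illustrated with a toy simulation. This is only a sketch: the hash-ranking placement below is a stand-in chosen for illustration and is not Aerospike's actual partition assignment algorithm, and the node names are made up. The count of 4096 partitions per namespace does match Aerospike.

```python
import hashlib
from collections import Counter

N_PARTITIONS = 4096  # Aerospike splits each namespace into 4096 partitions


def replica_nodes(partition_id, nodes, replication_factor=2):
    """Hypothetical placement: rank nodes by a per-partition hash and keep the first RF."""
    def score(node):
        digest = hashlib.sha256(f"{node}:{partition_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(nodes, key=score)[:replication_factor]


def pair_counts(nodes, replication_factor=2):
    """Count how many partitions each set of nodes holds copies of."""
    counts = Counter()
    for pid in range(N_PARTITIONS):
        holders = replica_nodes(pid, nodes, replication_factor)
        counts[tuple(sorted(holders))] += 1
    return counts


if __name__ == "__main__":
    print(pair_counts(["node1", "node2"]))           # every partition on both nodes
    print(pair_counts(["node1", "node2", "node3"]))  # spread over pairs 1-2, 1-3, 2-3
```

With two nodes every partition lives on both. With three, each partition still has two copies, but which pair holds them varies from partition to partition, so the new node shares replicas with both of the older ones.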

rudeb0t · September 24, 2015, 9:08am

Thanks. Everything is much clearer to me now.
