FAQ - What are the potential drawbacks of using paxos-single-replica-limit?

The configuration parameter paxos-single-replica-limit defines a number of nodes below which a cluster reverts to keeping a single copy of all partitions, regardless of the configured value for replication-factor. On the face of it, the parameter has solid logic: it defines a point at which cluster availability is prioritised over data resilience. However, there are some serious drawbacks which mean its usage is not generally recommended. What are these?
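For context, paxos-single-replica-limit is set in the service context of aerospike.conf, alongside the per-namespace replication-factor. A minimal sketch (the values shown are illustrative only, not a recommendation):

```
service {
    # At or below this cluster size, keep a single copy of all partitions
    paxos-single-replica-limit 2
}

namespace test {
    # Desired number of copies when the cluster is above the limit
    replication-factor 2
}
```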
The first, and most obvious, issue with paxos-single-replica-limit is that, at the time of writing this article, it is both static and unanimous, meaning that a full cluster shutdown is required to set it. This could be tolerable if the parameter could be set once and left alone; however, as the cluster expands and scales out, the value of paxos-single-replica-limit will likely need to be increased. This means that scaling the cluster would, in turn, require full cluster stops to adjust paxos-single-replica-limit. This is not acceptable in most use cases and is not good operational practice.
The second and more subtle drawback to paxos-single-replica-limit is the way in which it can interact with multiple node shutdowns. In a rack-aware cluster it is quite common to shut down whole racks of nodes at once for maintenance. In itself this is not an issue, as the remaining rack will have a copy of all partitions and will migrate to create replicas according to the desired replication-factor. It is a solid operational technique.
To illustrate the potential danger, consider an 8-node cluster split equally into 2 racks with replication-factor 2. In sizing the cluster it has been determined that the minimum number of nodes required to keep 2 copies of the data is 6, and so paxos-single-replica-limit is set to 6. To perform maintenance, one rack is shut down. During the operational procedure, as is often the case, there is a gap of a few seconds between nodes shutting down. Unfortunately this means that the cluster size drops from 8 to 7, then to 6, 5 and finally 4 nodes. Due to paxos-single-replica-limit, as soon as the cluster size hits 6, any replica copies of data are immediately dropped. So when the final 2 nodes in the rack shut down, they take with them data that does not exist anywhere else in the cluster. Thus, the application is impacted.
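The scenario above can be sketched as a small simulation. This is not Aerospike code, just a hypothetical model of the rule "at or below the limit, keep one copy" applied to the cluster sizes seen during the rolling rack shutdown:

```python
# Model: 8-node cluster, 2 racks of 4, replication-factor 2,
# paxos-single-replica-limit set to 6 (values from the example above).
REPLICATION_FACTOR = 2
PAXOS_SINGLE_REPLICA_LIMIT = 6

def effective_replication_factor(cluster_size: int) -> int:
    """Copies of each partition the cluster keeps at this size."""
    if cluster_size <= PAXOS_SINGLE_REPLICA_LIMIT:
        return 1  # replica copies are dropped at or below the limit
    return REPLICATION_FACTOR

# One rack's 4 nodes shut down a few seconds apart, so the cluster
# passes through each intermediate size rather than jumping 8 -> 4.
sizes = [8, 7, 6, 5, 4]
history = [(n, effective_replication_factor(n)) for n in sizes]
print(history)

# Replicas are already down to a single copy while 2 nodes of the
# doomed rack are still up; when they stop, that data is gone.
dropped_early = any(rf == 1 for n, rf in history if n > 4)
print("replicas dropped before rack fully down:", dropped_early)
```

The simulation shows the trap: the single-copy rule triggers on an intermediate cluster size (6 nodes) that exists only transiently during the shutdown, not on the intended final state.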
This scenario indicates why paxos-single-replica-limit is not recommended for most use cases. Not only does it impede the horizontal scaling for which Aerospike is well renowned, but its unpredictable behaviour in combination with rack-aware deployments hampers operational agility.
- Dynamically increasing heartbeat.timeout prior to maintenance when shutting down multiple nodes can be used to delay cluster re-formation and make sure intermediate clusters are not formed (not necessarily a recommended operational procedure, though).
- The migrate-fill-delay configuration can be used to control capacity and prevent migrations from replicating missing copies of partitions when one or more nodes leave a cluster.
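Both parameters mentioned above are dynamic, so they could be adjusted at runtime via asinfo's set-config command. A sketch, assuming the usual context names (verify the exact syntax and contexts against the configuration reference for your server version):

```
# Delay cluster re-formation before taking a rack down (seconds)
asinfo -v 'set-config:context=network;heartbeat.timeout=20'

# Hold off fill migrations after nodes leave the cluster (seconds)
asinfo -v 'set-config:context=service;migrate-fill-delay=300'
```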
Keywords: RACK-AWARE, PAXOS-SINGLE-REPLICA-LIMIT, DRAWBACK, LIMITATION