FAQ - What are the potential drawbacks of using paxos-single-replica-limit

Detail

The configuration parameter paxos-single-replica-limit defines a cluster size at or below which the cluster reverts to keeping a single copy of every partition, regardless of the configured replication-factor. On the face of it, the parameter has solid logic: it defines a point at which cluster availability is prioritised over data resilience. However, it has some serious drawbacks, which is why its use is not generally recommended. What are these drawbacks?
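As a rough illustration of the rule described above, the sketch below models how many copies per partition a cluster aims to keep at a given size. It is a simplified model written for this article, not Aerospike code: the function name effective_replication_factor is hypothetical, and the sketch assumes the limit applies at or below the configured value and that a cluster can never keep more copies than it has nodes.

```python
def effective_replication_factor(cluster_size: int,
                                 replication_factor: int,
                                 single_replica_limit: int) -> int:
    """Copies per partition the cluster targets (hypothetical, simplified model)."""
    if cluster_size <= single_replica_limit:
        # At or below paxos-single-replica-limit: keep one copy only.
        return 1
    # Otherwise honour replication-factor, capped at the number of nodes.
    return min(replication_factor, cluster_size)


# replication-factor 2 with paxos-single-replica-limit 6, as the cluster shrinks
for size in range(8, 3, -1):
    print(size, effective_replication_factor(size, 2, 6))
# 8 2
# 7 2
# 6 1
# 5 1
# 4 1
```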

Answer

The first, and most obvious, issue with paxos-single-replica-limit is that, at the time of writing, it is both static and unanimous, meaning that changing it requires a full cluster shutdown. This might be tolerable if the parameter could be set once and left alone. However, as the cluster expands and scales out, the value of paxos-single-replica-limit will likely need to be increased, so every scaling step that changes the limit would require a full cluster stop. This is likely to be unacceptable in most use cases and is not good operational practice.
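To make the scaling problem concrete, the sketch below assumes a hypothetical sizing rule (the minimum node count is the total size of all copies divided by usable capacity per node, rounded up) and counts how often the limit would change as data grows. The function names, data volumes and per-node capacity are illustrative assumptions, not Aerospike guidance.

```python
import math


def min_nodes_for_copies(data_tb: float, replication_factor: int,
                         usable_tb_per_node: float) -> int:
    """Hypothetical sizing rule: nodes needed to hold every copy of the data."""
    return math.ceil(replication_factor * data_tb / usable_tb_per_node)


def limit_changes(data_growth_tb, replication_factor, usable_tb_per_node):
    """Return the limit at each growth stage and how many times it changes.

    Each change to paxos-single-replica-limit would need a full cluster stop,
    because the parameter is static and unanimous.
    """
    limits = [min_nodes_for_copies(d, replication_factor, usable_tb_per_node)
              for d in data_growth_tb]
    changes = sum(1 for prev, cur in zip(limits, limits[1:]) if cur != prev)
    return limits, changes


limits, changes = limit_changes([2.5, 3.0, 3.5, 4.0, 5.0], 2, 1.0)
print(limits, changes)  # [5, 6, 7, 8, 10] 4
```

Under these assumptions, four growth steps would mean four full cluster stops just to keep the limit in line with the sizing.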

The second, more subtle, drawback of paxos-single-replica-limit is the way it can interact with multiple node shutdowns. In a rack-aware cluster it is quite common to shut down a whole rack of nodes at once for maintenance. Normally this is not an issue, as the remaining rack holds a copy of every partition and will migrate data to create replicas according to the configured replication-factor. It is a solid operational technique.

To illustrate the potential danger, consider an 8-node cluster split equally into 2 racks, with replication-factor 2. When sizing the cluster, it was determined that the minimum number of nodes required to hold 2 copies of the data is 6, so paxos-single-replica-limit is set to 6. To perform maintenance, one rack is shut down. During the procedure, as is often the case, there is a gap of a few seconds between nodes shutting down. Unfortunately, this means the cluster size drops from 8 to 7, then to 6, 5 and finally 4 nodes. Because of paxos-single-replica-limit, as soon as the cluster size reaches 6, any replica copies of data are immediately dropped. When the final 2 nodes in the rack then shut down, they take with them data that no longer exists anywhere else in the cluster, and the application is impacted.
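The scenario above can be reproduced with a toy model. This is a hypothetical simulation, not Aerospike's partition-assignment algorithm: node names are invented, masters are assumed to alternate evenly between the two racks, no migration is assumed to complete during the few seconds between shutdowns, and in single-replica mode only the master copy is assumed to survive. Only the figure of 4096 partitions per namespace comes from Aerospike itself.

```python
N_PARTITIONS = 4096  # Aerospike splits each namespace into 4096 partitions


def initial_placement(rack_a, rack_b):
    """Rack-aware placement for replication-factor 2: one copy on each rack.

    Returns {partition_id: [master, replica]}, with masters alternating between
    the racks so that each rack owns roughly half of the masters.
    """
    placement = {}
    for p in range(N_PARTITIONS):
        a = rack_a[p % len(rack_a)]
        b = rack_b[(p // len(rack_a)) % len(rack_b)]
        placement[p] = [a, b] if p % 2 == 0 else [b, a]
    return placement


def rolling_shutdown(placement, cluster_size, shutdown_order, limit):
    """Stop nodes one at a time and count partitions left with no copy at all.

    Simplifying assumptions (not Aerospike's actual algorithm):
    - the gaps between shutdowns are too short for migrations to finish,
      so copies are only ever lost, never re-created;
    - whenever the cluster size is at or below `limit`, only the first surviving
      copy (the master) is kept and all other copies are dropped.
    """
    copies = {p: list(nodes) for p, nodes in placement.items()}
    lost = set()
    for node in shutdown_order:
        cluster_size -= 1
        for p, nodes in copies.items():
            if node in nodes:
                nodes.remove(node)  # the departing node takes its copy with it
            if not nodes:
                lost.add(p)
            elif cluster_size <= limit:
                del nodes[1:]  # single-replica mode: drop non-master copies
    return len(lost)


rack_a = ["A1", "A2", "A3", "A4"]
rack_b = ["B1", "B2", "B3", "B4"]
placement = initial_placement(rack_a, rack_b)
for limit in (1, 6):
    lost = rolling_shutdown(placement, 8, rack_b, limit)
    print(f"paxos-single-replica-limit {limit}: {lost} of {N_PARTITIONS} partitions lost")
# paxos-single-replica-limit 1: 0 of 4096 partitions lost
# paxos-single-replica-limit 6: 1024 of 4096 partitions lost
```

In this toy model a quarter of the partitions are lost: those whose master copy sat on one of the last two nodes of the rack when the cluster reached 6 nodes. The exact fraction in a real cluster would depend on the actual partition map and migration timing.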

This scenario shows why paxos-single-replica-limit is not recommended for most use cases. Not only does it impede the horizontal scaling for which Aerospike is well known, but its unpredictable behaviour when combined with rack awareness also hampers operational agility.

Notes:

  • Dynamically increasing heartbeat.interval and heartbeat.timeout before maintenance that shuts down multiple nodes can delay cluster re-formation and ensure that intermediate clusters are not formed (though this is not necessarily a recommended operational procedure).
  • The migrate-fill-delay configuration should be used to control capacity and prevent migrations from replicating missing copies of partitions when one or more nodes leave a cluster.
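For the first note, a rough way to size the temporary heartbeat settings is sketched below. It assumes a node is declared departed after about heartbeat.interval (in milliseconds) multiplied by heartbeat.timeout (a count of missed heartbeats). The function name, the 150 ms interval, the 3-second gap between node stops and the 2-second safety margin are hypothetical values chosen for illustration.

```python
import math


def min_heartbeat_timeout(interval_ms: int, nodes_to_stop: int,
                          gap_s: float, margin_s: float = 2.0) -> int:
    """Smallest heartbeat.timeout (missed heartbeats) that keeps the first
    stopped node from being declared departed before the last one stops,
    plus a safety margin (simplified model)."""
    window_ms = ((nodes_to_stop - 1) * gap_s + margin_s) * 1000
    return math.ceil(window_ms / interval_ms)


# Stopping a 4-node rack with ~3 s between nodes, heartbeat.interval 150 ms
timeout = min_heartbeat_timeout(150, 4, 3.0)
print(timeout, timeout * 150 / 1000)  # 74 11.1
```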

Keywords

RACK AWARE PAXOS SINGLE REPLICA LIMIT DRAWBACK LIMITATION

Timestamp

May 2021

© 2021 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.