Rack-Aware Feature is not currently available for Replication Factor greater than 2


#1

When setting up a 3-node cluster with a replication factor of 3 and rack-awareness enabled, the service will not start and reports this critical error:

Rack-Aware Feature is not currently available for Replication Factor greater than 2

I don’t understand why this limitation is in place. Could you explain the reasoning? Thank you for your time.


#2

Thanks for posting on our forum!

As you found out, rack-aware is currently available only for replication factor 2; it is simply not implemented for higher replication factors. The feature covers only that simpler case, where we have to make sure that, for each partition, the master and replica copies are on 2 nodes in different racks.
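To make the replication factor 2 constraint concrete, here is a minimal Python sketch of the placement rule described above. It is only an illustration, not the server's actual algorithm: the function `pick_replica`, the `succession` list, and the `rack_of` mapping are hypothetical names introduced for this example.

```python
def pick_replica(master, succession, rack_of):
    """Return the first node in `succession` that sits on a different rack
    than `master`.

    `succession` is a hypothetical ordered list of candidate nodes for one
    partition; `rack_of` maps each node name to its rack id.
    """
    for node in succession:
        if node != master and rack_of[node] != rack_of[master]:
            return node
    # With all nodes on one rack, the constraint cannot be satisfied.
    raise ValueError("no node available on a different rack")


# Hypothetical 3-node layout: nodes "a" and "b" share rack r1, "c" is on r2.
rack_of = {"a": "r1", "b": "r1", "c": "r2"}
replica = pick_replica("a", ["a", "b", "c"], rack_of)
```

In this layout the replica for a partition mastered on "a" skips "b" (same rack) and lands on "c". With only two copies the rule is a single comparison per partition, which is why this case is the simple one.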

In general, though, we recommend setting up clusters with nodes as close to each other as possible, to ensure the highest stability for the vital heartbeat traffic between the nodes. Failure cases would then be covered by setting up a second cluster and using our XDR component to replicate data to it.


#3

Thank you for your response. Our use case is an AWS VPC with 3 availability zones, so we’re going without rack-aware and with a replication factor of 3. Each node in the cluster is configured with the option mesh-seed-address-port. In this case, I believe we do not need a cluster {} config, correct?

Also, if we were to add a node in the future, assuming we do not have a cluster { mode dynamic } config, would we need to restart all nodes?


#4

You don’t need the cluster {} config if you are not using rack-aware.

Adding nodes is straightforward. You do not need to restart any other node. Just add a new node with the correct configuration; it will join the cluster, and data will automatically rebalance (what we call migrations).

Thanks, –meher


#5

May I follow up with an additional question?

When creating a cluster on Amazon, would you recommend treating Availability Zones as if they were different regions? In other words, would you create all the nodes for your cluster in the same Availability Zone, as opposed to spreading the nodes across Availability Zones within the same region?

The problem with this approach is that it is expensive.

Thank you


#6

It depends on your goals. We have users who deploy a single cluster across availability zones. By doing so, they sacrifice performance because intra-cluster latency increases. The users willing to make that trade treat each availability zone as a rack in a rack-aware configuration. Using rack-aware allows you to lose an entire AZ without losing data, but it also guarantees that every write will be penalized with a cross-AZ latency hit.
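The trade-off can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of how the server places data: the node names, AZ names, and both placement maps are made up for illustration, and each partition is assumed to have exactly one master and one replica.

```python
def lost_partitions(placement, az_of, failed_az):
    """Partition ids whose every copy lives in the failed AZ."""
    return sorted(
        pid
        for pid, nodes in placement.items()
        if all(az_of[n] == failed_az for n in nodes)
    )


def cross_az_write_fraction(placement, az_of):
    """Fraction of partitions whose master and replica are in different AZs,
    i.e. the share of writes that must cross an AZ boundary to replicate."""
    crossing = sum(
        1 for nodes in placement.values() if len({az_of[n] for n in nodes}) > 1
    )
    return crossing / len(placement)


# Hypothetical 4-node cluster spread over two AZs.
az_of = {"n1": "az-a", "n2": "az-a", "n3": "az-b", "n4": "az-b"}

# partition id -> (master, replica); both maps are illustrative only.
rack_aware = {0: ("n1", "n3"), 1: ("n2", "n4"), 2: ("n3", "n1"), 3: ("n4", "n2")}
same_az = {0: ("n1", "n2"), 1: ("n2", "n1"), 2: ("n3", "n4"), 3: ("n4", "n3")}

for name, placement in (("rack-aware", rack_aware), ("same-AZ", same_az)):
    print(
        name,
        "lost if az-a fails:", lost_partitions(placement, az_of, "az-a"),
        "cross-AZ write fraction:", cross_az_write_fraction(placement, az_of),
    )
```

In this toy layout the rack-aware map loses no partitions when az-a fails, but every write replicates across AZs. The same-AZ map avoids the cross-AZ hop entirely and loses both partitions held in az-a.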