When a node is removed from a cluster configured with the default
prefer-uniform-balance setting (
true), there appears to be additional migrations occurring. Is this expected?
The use of
prefer-uniform-balance has major advantages as it ensures partitions are evenly distributed across nodes in the cluster. Prior to the introduction of the
prefer-uniform-balance algorithm to balance the partition distribution, a variation of around 10% in the number of partitions owned by nodes would often be observed, sometimes even much higher.
The optimized uniform distribution comes at a small cost: there may be additional migrations needed to keep this uniformity among nodes in a cluster. This usually comes to light when a node is permanently removed from a cluster.
In non-uniform-balance, when a node is removed, all migrations will be one-way migrations for partitions that were owned by that node. With
prefer-uniform-balance, some of the adjusted partitions will cause additional one-way or two-way migrations.
Rack-aware with prefer-uniform-balance does add more migrations because the rack-aware constraint restricts which nodes can own replica copies of partitions (as rack aware prevents more than one copy of a partition to be in the same rack). This restriction required us to allow more head-room to achieve near-optimal balance.
- With rack aware and
prefer-uniform-balancethere can be much more migrations that will occur compared to when not using those features.
- In rack aware, there are adjustments being made to the partition distribution when there are about 1024 partitions remaining to assign (or balance – for non rack aware those adjustment happen when there are only 128 partitions remaining to assign).
- The worst case scenario for rack aware with
prefer-uniform-balanceand RF (replication factor) 2 is 2048 partition migrations (RF * 1024) .
- In a non uniform balance scenario, there would be no migrations adjustment and migrations would only be one-way when taking down a node.
A node will not drop a partition until it has sucessfully migrated to another node.
MIGRATIONS PREFER-UNIFORM-BALANCE NODE REMOVAL