Amount of data that is not available when M nodes go down



Synopsis:

How to estimate the fraction of data that becomes unavailable when m nodes of an n-node cluster go down.

Use the following formulas to estimate how much data becomes unavailable when at least as many nodes as the replication factor leave the cluster:

Given:
n = number of nodes in the cluster
m = number of nodes lost
r = replication factor

Formulas

What portion of data is not available?

fraction unavailable = [ m! / (m-r)! ] × [ (n-r)! / n! ]

For a replication factor of 2, this simplifies to:

fraction unavailable = m(m-1) / ( n(n-1) )


If r > m, no data loss is expected, because at least one replica of every record is still on a surviving node.
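The formula above can be evaluated with a few lines of Python. This is a minimal sketch, not part of any product tooling: the function name `unavailable_fraction` and its argument checks are assumptions made for illustration, and `math.perm` requires Python 3.8 or later.

```python
from math import perm


def unavailable_fraction(n: int, m: int, r: int) -> float:
    """Fraction of data with all r replicas on the m lost nodes (hypothetical helper).

    n = number of nodes in the cluster
    m = number of nodes lost
    r = replication factor
    """
    if not (1 <= r <= n and 0 <= m <= n):
        raise ValueError("expected 1 <= r <= n and 0 <= m <= n")
    if m < r:
        # Fewer nodes lost than replicas: some replica always survives.
        return 0.0
    # perm(k, r) = k! / (k-r)!, so this is [m!/(m-r)!] / [n!/(n-r)!].
    return perm(m, r) / perm(n, r)


if __name__ == "__main__":
    print(f"{unavailable_fraction(10, 2, 2):.4f}")  # 0.0222
```

Using integer permutation counts and dividing once at the end avoids computing large factorials as floats.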

Explanation

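The formula is a ratio of two permutation counts. The numerator, m!/(m-r)!, is the number of ordered ways to place all r replicas of a record on the m nodes that go down. The denominator, n!/(n-r)!, is the number of ordered ways to place the r replicas across all n nodes in the cluster. The ratio is therefore the fraction of placements in which every replica is on a lost node, which assumes replicas are spread evenly across the nodes.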
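Written out, the ratio of permutations looks as follows. The notation P(k, r) for the permutation count is introduced here only for illustration. Because the factor r! cancels, the same value is obtained from combinations, and setting r = 2 gives the simplified form.

```latex
\[
\frac{P(m,r)}{P(n,r)}
  = \frac{m!/(m-r)!}{n!/(n-r)!}
  = \frac{m!\,(n-r)!}{(m-r)!\,n!}
  = \frac{\binom{m}{r}}{\binom{n}{r}}
\]
\[
r = 2:\qquad \frac{m!\,(n-2)!}{(m-2)!\,n!} = \frac{m\,(m-1)}{n\,(n-1)}
\]
```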

Example

If two nodes leave a 10-node cluster with a replication factor of 2:

2(1)/10(9) = 2/90 = 0.0222 = 2.2%

About 2.2% of the data is unavailable.
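The 2.2% figure can be cross-checked by brute force for small clusters. The sketch below counts every ordered placement of replicas directly rather than using the formula. The function name is hypothetical, and the code assumes every placement on distinct nodes is equally likely, which is the same assumption the formula makes.

```python
from itertools import permutations


def unavailable_fraction_by_enumeration(n: int, m: int, r: int) -> float:
    """Count ordered replica placements that land entirely on lost nodes."""
    # By symmetry it does not matter which m nodes are lost; take the first m.
    lost_nodes = set(range(m))
    total = 0
    all_lost = 0
    for placement in permutations(range(n), r):
        total += 1
        if lost_nodes.issuperset(placement):
            all_lost += 1
    return all_lost / total


if __name__ == "__main__":
    # 10-node cluster, 2 nodes lost, replication factor 2
    print(f"{unavailable_fraction_by_enumeration(10, 2, 2):.4f}")  # 0.0222
```

Enumeration grows quickly with n and r, so it is only practical as a sanity check on small inputs.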