Amount of data that is not available when M nodes go down
Use the following formulas to estimate the fraction of data that becomes unavailable when at least as many nodes as the replication factor leave the cluster:
Given:
n = number of nodes in the cluster
m = number of nodes lost
r = replication factor
What portion of data is not available?

$$\frac{m!}{(m-r)!} \cdot \frac{(n-r)!}{n!}$$

For a replication factor of 2, this simplifies to:

$$\frac{m(m-1)}{n(n-1)}$$

If r > m, no data loss is expected.
The formula is the number of permutations that place all r replicas on the m nodes that go down, m!/(m-r)!, divided by the number of permutations that place r replicas across all n nodes in the cluster, n!/(n-r)!.
If two nodes leave a 10-node cluster with a replication factor of 2:
(2 × 1) / (10 × 9) = 2/90 ≈ 0.0222
About 2.2% of data is lost.
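The calculation can also be sketched in a few lines of Python. This is a minimal illustration, not part of any product tooling: the function name `unavailable_fraction` is hypothetical, and it assumes replicas are spread uniformly at random across nodes, which the permutation formula implies.

```python
from math import perm  # available in Python 3.8+


def unavailable_fraction(n: int, m: int, r: int) -> float:
    """Estimate the fraction of data with all r replicas on the m lost nodes.

    n = nodes in the cluster, m = nodes lost, r = replication factor.
    Assumes uniform random replica placement (an assumption of this sketch).
    """
    if not (0 <= m <= n and 1 <= r <= n):
        raise ValueError("expected 0 <= m <= n and 1 <= r <= n")
    # perm(m, r) = m!/(m-r)! and is 0 when r > m, so the result is 0.0
    # whenever fewer nodes are lost than the replication factor.
    return perm(m, r) / perm(n, r)


# Two nodes lost from a 10-node cluster, replication factor 2
print(f"{unavailable_fraction(10, 2, 2):.4f}")  # 0.0222

# One node lost with replication factor 2: no data loss expected
print(unavailable_fraction(10, 1, 2))  # 0.0
```

Using `math.perm` keeps the code term-by-term aligned with the formula and covers the r > m case without a separate branch.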