Split Brain detection and impact


How to detect whether a cluster went through a split brain situation

Split brain is a state where a cluster splits into multiple clusters of smaller sizes.

Detecting a split brain situation

Here are the main symptoms of a cluster that has split:

  1. Check the cluster_size statistic on each node. If it is lower than the expected cluster size on some nodes, it indicates that some nodes have left the cluster or formed a sub-cluster, potentially with other nodes.

Note: A single node departing the cluster is not really considered a split brain. Even if the node is still alive and takes ownership of all the partitions, clients will recognize this situation and will not submit transactions to that ‘orphaned’ node.
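The cluster_size statistic can be pulled per node with the asinfo tool. In the sketch below, the host name is a placeholder for your environment, and the parsing step is run against a sample statistics string so the pipeline is illustrated end to end:

```shell
# On a live node, the statistics info command returns a semicolon-separated
# list that includes cluster_size:
#   asinfo -h node1.example.com -p 3000 -v 'statistics' | tr ';' '\n' | grep '^cluster_size'
# Sample output used here for illustration:
stats='cluster_size=2;cluster_key=8FB1A7022DB90E4;uptime=14400'
echo "$stats" | tr ';' '\n' | grep '^cluster_size'
# cluster_size=2
```

Running this against every node and comparing the values will reveal a split: for example, on a 5-node cluster, one node reporting cluster_size=2 while others report 3 indicates two sub-clusters.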

  2. Grep for the keywords “departed” and/or “applied cluster size” in the logs. These entries indicate that a node has departed the cluster and that a new cluster size has been applied.

For example:

Sep 26 2018 06:51:20 GMT: INFO (fabric): (fabric.c:2486) fabric: node bb9f2054e2ac362 departed
Sep 26 2018 06:51:21 GMT: INFO (clustering): (clustering.c:5808) applied cluster size 2

This indicates which node departed and the effective cluster_size after that.
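This check can be scripted as below. The sample file reproduces the log lines shown above; on a real node, point grep at the actual server log (commonly /var/log/aerospike/aerospike.log, though the path varies by install):

```shell
# Build a sample log containing the lines shown above, then filter for
# the two keywords. On a live node, replace the sample file with the
# actual server log path.
cat > /tmp/aerospike_sample.log <<'EOF'
Sep 26 2018 06:51:20 GMT: INFO (fabric): (fabric.c:2486) fabric: node bb9f2054e2ac362 departed
Sep 26 2018 06:51:21 GMT: INFO (clustering): (clustering.c:5808) applied cluster size 2
Sep 26 2018 06:51:22 GMT: INFO (info): (ticker.c:160) NODE-ID bb9020011ac4202 CLUSTER-SIZE 2
EOF
grep -E 'departed|applied cluster size' /tmp/aerospike_sample.log
# prints the first two lines: the departure and the new cluster size
```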

  3. Check for rebalance and migrations. Any cluster change triggers a rebalance followed by migrations (redistribution of the partitions across the nodes in the cluster). For example:
{ns_name} rebalanced: expected-migrations (1215,1224) expected-signals 1215 fresh-partitions 397
{ns_name} migrations: remaining (654,289,254) active (1,1,0) complete-pct 88.49

Or, for strong-consistency enabled namespaces:

{ns_name} rebalanced: regime 295 expected-migrations (826,826) expected-signals 826 expected-appeals 0 unavailable-partitions 425

Refer to the monitoring migrations doc and knowledge base article on this topic.
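When triaging, the completion percentage can be pulled straight out of the migrations ticker line. The snippet below parses the sample line shown above (awk simply prints the last whitespace-separated field); on a live cluster, the same line would come from the server log:

```shell
# Extract complete-pct from the migrations ticker line (sample inlined).
line='{ns_name} migrations: remaining (654,289,254) active (1,1,0) complete-pct 88.49'
echo "$line" | awk '{print $NF}'
# 88.49
```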

After effects of split brain for AP namespaces (without strong-consistency enabled)

One of the potential issues with a split brain situation in namespaces without strong-consistency enabled is the creation of fresh partitions in sub-clusters. A fresh partition is instantiated when a partition is missing in a sub-cluster (neither the master nor the replica node(s) owning that partition are present in the sub-cluster). This is an issue since it causes inconsistencies when the cluster fully reforms. This can be checked through this log line:

{ns_name} rebalanced: expected-migrations (1215,1224) expected-signals 1215 fresh-partitions 397

The writes (updates turning into inserts, for example) on such fresh partitions will be ‘conflict resolved’ based on the configured conflict-resolution-policy when the cluster reforms.

If the conflict-resolution-policy is set to generation, which is the default, the record with the higher generation wins. This may not be the most recent version of the record, but simply the version that has been updated the most times. To keep the most recent version of the record, the conflict-resolution-policy must be set to last-update-time.

The use case / application will dictate the best value to use. If historical data (a record with multiple bins updated over time) is more important than the most recent update, then generation should be used, as the most-updated copy of the record wins.
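As a sketch, the policy is set per namespace in aerospike.conf (the namespace name below is illustrative):

```
namespace test {
    replication-factor 2
    # keep the most recently written version of a record on conflict
    conflict-resolution-policy last-update-time
}
```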

XDR considerations

If a cluster serving as an XDR destination goes through a split brain situation, the same logic applies: writes shipped from the source cluster may land on fresh partitions in a sub-cluster and will be conflict resolved against the pre-split records when the destination cluster reforms.

It is recommended to consider configuring namespaces with strong-consistency for any use case sensitive to such situations. For namespaces running in strong-consistency mode, a split brain never creates fresh partitions; instead, some partitions may become unavailable, causing XDR to re-log those records and retry at a later time.
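A minimal namespace stanza is sketched below. Note that strong-consistency is an Enterprise feature and also requires a roster to be set (for example via asadm) before partitions become available; the namespace name is illustrative:

```
namespace test {
    replication-factor 2
    # Enterprise feature; a roster must be set before
    # partitions become available.
    strong-consistency true
}
```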




26 Sep 2018