Data inconsistencies in reviving a dead node

Let’s say there is a Community Edition AS cluster with disk-based persistence enabled and a replication factor of 2.

Now one of the nodes dies and is later brought back. What are the risks associated with data consistency after the node is back up? Does the revived node immediately start serving requests, or does it wait for replication to complete from the other partition owners?

Since server version 3.13, post “jump”, the incoming node waits for migrations to complete to it before taking back over as master. However, in AP mode, if a split brain were to occur where the replicas find themselves on opposite sides of the split, they will each become master. When they come back together, the node that is expected to be the final master (the original master) will continue acting as master while the other node will cease to act as master.

In both situations, these partition copies with potentially divergent histories are considered “duplicates”, and by default, again in AP, we resolve duplicates to find the “best” version as long as the disable-write-dup-res configuration is set to false (the default). So even if the master doesn’t have all the latest data, it will resolve what it doesn’t have against the nodes holding unique duplicate copies of the partition. There is also a read-consistency-level policy on the clients that is set to “one” by default but can be set to “all” if read consistency is moderately important; this can also be overridden by the server using the read-consistency-level-override configuration. I say moderately important because you can still read while the cluster doesn’t have all the data or may contain stale data. If you need a stricter consistency policy, then strong-consistency (Enterprise only) is the only way to go.
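For what it’s worth, here is a minimal Java sketch of that client-side read policy, assuming a recent Aerospike Java client (4.4+ renamed the older consistencyLevel / CONSISTENCY_ALL policy field to readModeAP); the host, namespace, and set names below are placeholders:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.Policy;
import com.aerospike.client.policy.ReadModeAP;

public class DupResReadExample {
    public static void main(String[] args) {
        // Placeholder seed node, namespace, and set names.
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        try {
            Policy readPolicy = new Policy();
            // ReadModeAP.ALL asks the server to duplicate-resolve reads while
            // migrations are ongoing (the "read-consistency-level all" behavior).
            // The default is ReadModeAP.ONE, which may return stale records.
            readPolicy.readModeAP = ReadModeAP.ALL;

            Key key = new Key("test", "demo", "user-1");
            Record record = client.get(readPolicy, key);
            System.out.println(record);
        } finally {
            client.close();
        }
    }
}
```

This only affects reads during migrations in AP; the write side is governed separately by disable-write-dup-res on the server and the commit level on the write policy.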

This is hopefully still mostly relevant: General questions on rolling restart.

@meher I did go through the doc, but it did not give me a conclusive answer about reading stale data. If a node was down for 1 hour, is it still considered a rolling restart?

I had read this doc as well:

  • Read transactions - For read transactions in an AP namespace, the client’s policy drives whether to duplicate resolve or not and the server can enforce the behavior through the read-consistency-level-override configuration parameter. Since read duplicate resolution is off by default, there could be stale reads until migrations have completed. As for the previous point, though, waiting for migrations (delta or lead migrations only if migrate-fill-delay is in use) to complete before taking the subsequent node down would not require any duplicate resolution to take place.

That suggested we can be served stale records, which has gotten me confused.

You definitely can have stale reads in AP. If you absolutely must avoid stale records, then you would need to use linearized reads with the servers configured with strong-consistency = true. There isn’t a way to guarantee no stale reads in AP, which by definition gives up consistency. The write-commit-level, read-consistency-level, and disable-write-dup-res settings allow you to tighten consistency in AP but cannot totally eliminate the possibility of write conflicts and stale reads.
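As a concrete illustration of those knobs (a sketch, not a recommendation): with an Enterprise namespace configured with strong-consistency true, recent Java clients expose linearized reads through readModeSC (older clients used a linearizeRead flag), while in AP the write-side counterpart is the commit level on the write policy. The namespace names below are placeholders.

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.CommitLevel;
import com.aerospike.client.policy.Policy;
import com.aerospike.client.policy.ReadModeSC;
import com.aerospike.client.policy.WritePolicy;

public class ConsistencyKnobs {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000); // placeholder seed node
        try {
            // Strong-consistency (Enterprise) namespace: linearized reads.
            Policy scRead = new Policy();
            scRead.readModeSC = ReadModeSC.LINEARIZE; // older clients: policy.linearizeRead = true
            Record rec = client.get(scRead, new Key("sc_ns", "demo", "user-1")); // "sc_ns" is a placeholder SC namespace

            // AP namespace: wait for replica acks on writes. This tightens, but
            // does not guarantee, consistency in AP.
            WritePolicy apWrite = new WritePolicy();
            apWrite.commitLevel = CommitLevel.COMMIT_ALL; // default; COMMIT_MASTER relaxes it
            client.put(apWrite, new Key("test", "demo", "user-1"), new Bin("status", "ok"));

            System.out.println(rec);
        } finally {
            client.close();
        }
    }
}
```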

Thanks @kporter. But for a rolling restart (@helper_bro: it wouldn’t matter whether nodes are down for seconds, minutes, or hours; what is important, for the simple case, is taking one node at a time), if one has enabled duplicate resolution on reads (via read-consistency-level-override or the client policy), there shouldn’t be any stale reads, right? Of course, any unexpected node leaving/joining the cluster, during a rolling restart or not, can certainly cause stale reads, and strong consistency would need to be enabled to prevent those.
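And since the doc excerpt above recommends waiting for migrations to complete before taking the subsequent node down, here is a rough sketch (not an official procedure) of how a rolling-restart script could poll for that using the Java client’s info call. The statistic names migrate_rx_partitions_remaining and migrate_tx_partitions_remaining are what recent server versions report per namespace; verify them against your server version.

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Info;
import com.aerospike.client.cluster.Node;

public class WaitForMigrations {
    // Returns true once every node reports zero inbound and outbound partition
    // migrations for the given namespace.
    static boolean migrationsComplete(AerospikeClient client, String namespace) {
        for (Node node : client.getNodes()) {
            // "namespace/<name>" returns semicolon-separated stat=value pairs.
            String stats = Info.request(node, "namespace/" + namespace);
            long remaining = 0;
            for (String pair : stats.split(";")) {
                // Stat names assumed from recent server versions; adjust if yours differ.
                if (pair.startsWith("migrate_rx_partitions_remaining=")
                        || pair.startsWith("migrate_tx_partitions_remaining=")) {
                    remaining += Long.parseLong(pair.split("=")[1]);
                }
            }
            if (remaining > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000); // placeholder seed node
        try {
            while (!migrationsComplete(client, "test")) { // "test" is a placeholder namespace
                Thread.sleep(5000); // poll every 5 seconds
            }
            System.out.println("Migrations complete; safe to restart the next node.");
        } finally {
            client.close();
        }
    }
}
```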