Use of Scan to fetch Critical data

To start off: I'm not an expert on this cluster-consistency topic. One of the AS folks and/or the docs might be able to confirm or correct what I say.

But to give you an idea…

  1. A single record always has exactly one node as its master… You can configure reads to go to the master, or use AS_POLICY_CONSISTENCY_LEVEL_ALL, which (if I remember correctly) also consults the old master during migrations, plus “commit to all replicas” for writes. See http://www.aerospike.com/docs/client/c/usage/consistency.html for your options. If you use the AS generation numbers within your list, you only need that higher consistency level by default on the super record. For reading updated records, you can first query the master or any replica and, only if that version is behind, re-request with AS_POLICY_CONSISTENCY_LEVEL_ALL (a rough sketch follows below). I think this will give you a pretty solid consistency level in 99.999% of cases… It won’t get any better with any other clustered solution out there.
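
A minimal C-client sketch of that read-then-retry idea, assuming you know the generation you expect (e.g. from the super record’s list). The function name and the `expected_gen` parameter are my own placeholders; the policy enum is the one from the docs linked above:

```c
#include <stdbool.h>
#include <aerospike/aerospike.h>
#include <aerospike/aerospike_key.h>
#include <aerospike/as_policy.h>
#include <aerospike/as_record.h>

// Read a record; if the generation we got back is older than the one we
// expect, re-request it with consistency level ALL so the master (and,
// during migrations, the old master) is consulted as well.
static bool read_with_gen_check(aerospike* as, const as_key* key,
                                uint32_t expected_gen, as_record** rec)
{
    as_error err;

    // First attempt: default read policy (may be served by a replica
    // that is behind during a migration).
    if (aerospike_key_get(as, &err, NULL, key, rec) != AEROSPIKE_OK) {
        return false;
    }

    if ((*rec)->gen >= expected_gen) {
        return true; // version is current enough
    }

    // Stale generation: retry with AS_POLICY_CONSISTENCY_LEVEL_ALL.
    as_record_destroy(*rec);
    *rec = NULL;

    as_policy_read policy;
    as_policy_read_init(&policy);
    policy.consistency_level = AS_POLICY_CONSISTENCY_LEVEL_ALL;

    return aerospike_key_get(as, &err, &policy, key, rec) == AEROSPIKE_OK;
}
```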

However, higher consistency might interfere with availability in the case of migrations or unreachable nodes… This is where I’m leaving familiar terrain, which is why I would rather leave 2) entirely to the docs or the AS staff. It might be faster to create a Stack Overflow question or a new thread if you want a quick answer on this.

But as you have noticed, the scan functionality suffers from inconsistency during migrations. With this solution you can reach better consistency, but I’m afraid you might have to trade away some availability to achieve it. With writes committed to multiple nodes (in-memory), as sketched below, this might be an unnecessary worry, since they all know the most recent version.

Split brain causes me headaches… but I see no way to solve it other than not accepting writes in such a situation (that’s on their roadmap, see here: Stop Writes). Would that be OK? Anyway, from here on you’ll most likely find more truth in the docs or the source code of AS than I can provide on that topic. To be honest, we don’t care about split brain over here: if it happens, we’ll ignore it and intervene manually if necessary. I haven’t heard of anybody who has experienced it, because AS clusters tend to be much smaller than what Cassandra and others require, which makes this case rather unlikely when deployed with redundant network connectivity. Sorry, but I can’t solve CAP…
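
For completeness, here is a hedged sketch of the “commit to all replicas” write mentioned above, using the C client’s write policy (the wrapper function is again just a placeholder name):

```c
#include <stdbool.h>
#include <aerospike/aerospike.h>
#include <aerospike/aerospike_key.h>
#include <aerospike/as_policy.h>
#include <aerospike/as_record.h>

// Write a record and only treat it as successful once the master and
// its replica(s) have acknowledged the write.
static bool put_commit_all(aerospike* as, const as_key* key, as_record* rec)
{
    as_error err;

    as_policy_write policy;
    as_policy_write_init(&policy);
    policy.commit_level = AS_POLICY_COMMIT_LEVEL_ALL;

    return aerospike_key_put(as, &err, &policy, key, rec) == AEROSPIKE_OK;
}
```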

Cheers, Manuel
