I hope you guys are reading https://aphyr.com/posts/324-call-me-maybe-aerospike and will post a technical response (not a marketing one) about the deeply serious issues Jepsen has discovered, how you plan to address them, and when. I for one could not trust any data store with such serious issues, and I suspect a lot of people won’t either, and await an answer.
First, we apologize for overstepping the bounds with our marketing pitch and not presenting our consistency claims within the proper technical context. Aerospike chose to be an AP system (out of CAP). Aerospike’s goal is to make the best effort for providing consistency at extremely high performance; however, our documentation did a poor job of surfacing the tradeoffs we made, thus failing to give crisp visibility to the user community on the choices they need to make. We will work on doing a better job on this in the future.
Regarding Jepsen, Aerospike engineers were actively involved in helping Kyle (@aphyr on Twitter) run the tests that he reported on this week. Furthermore, Kyle presented his results in person to our engineering team this week. This was followed by a deep discussion on these issues between Kyle and Aerospike engineers. We came up with a preliminary plan of action that we will refine further and share more widely at the appropriate time.
Basically, Aerospike has implemented heuristics that enable the system to perform at very high throughput and low latency while maintaining consistency across many common sorts of failures (e.g., node failures and certain types of networking failures short of split-brain). Aerospike, however, failed the Jepsen tests because it chooses availability over consistency in the presence of “split-brain” partitions. Needless to say, this should have been documented better.
So far, Aerospike has focused on 24x7 applications that care more about availability, extremely high throughput, and predictable low latency than about perfect consistency (there is a tradeoff between performance and consistency). These applications are prepared to trade off arguably rare consistency glitches for extremely high performance and continually uninterrupted operation of their service.
At Aerospike, we will keep working on providing a better consistency guarantee with our AP system. And we will also continue to improve the product so it supports stronger consistency for applications in the future. We will share a roadmap as and when it becomes available. Thanks.
Specific answer to your question. I do not believe that the higher timeouts should have any effect on the conflict resolution issues. The system is still AP, so writes will be lost in case of split-brain scenarios. The fact that it was not seen at certain timeout settings may be an artifact of the test itself.
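To illustrate why an AP system loses writes under split-brain regardless of timeout settings, here is a toy sketch. It is not Aerospike's actual conflict-resolution code: the `Record`, `write`, and `heal` names are hypothetical, and the "highest generation wins" merge rule is an assumption chosen only to show the general effect. Both sides of the partition acknowledge writes to the same key, and when the partition heals only one version can survive.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: int
    generation: int  # bumped on every local write (hypothetical field)


def write(replica: dict, key: str, value: int) -> None:
    """Accept a write locally and acknowledge it, as an AP node would."""
    old = replica.get(key)
    generation = old.generation + 1 if old else 1
    replica[key] = Record(value, generation)


def heal(side_a: dict, side_b: dict) -> dict:
    """Merge two partitioned replicas, keeping the higher generation per key.

    Assumed rule for illustration only; ties go to side_a.
    """
    merged = dict(side_b)
    for key, rec in side_a.items():
        other = merged.get(key)
        if other is None or rec.generation >= other.generation:
            merged[key] = rec
    return merged


# During the partition, each side accepts writes independently.
side_a, side_b = {}, {}
write(side_a, "k", 1)    # acknowledged to a client on side A
write(side_b, "k", 10)   # acknowledged to a client on side B
write(side_b, "k", 11)   # acknowledged to a client on side B

# After the partition heals, one version wins and the other is discarded.
merged = heal(side_a, side_b)
print(merged["k"])
```

In this sketch the write of `1` on side A was acknowledged but does not survive the merge. Client timeouts never enter the picture, because the loss happens at merge time.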
Data loss was seen at all timeout settings; the difference between “ok” and “info” in the latency graphs is what the Aerospike client returned, not whether that write was later lost. In many of these tests we can’t pinpoint exactly which operations were “lost”; we only know that some fraction of operations didn’t take effect.
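As a rough sketch of why only a fraction can be reported: assume a counter-style test where every operation is a unit increment, "ok" means the client saw an acknowledgement, and "info" means the outcome is unknown. The `lost_bounds` function and the numbers below are made up for illustration and are not the actual Jepsen checker. Comparing the final counter value with those tallies bounds how many acknowledged increments vanished, without identifying which ones.

```python
from typing import Tuple


def lost_bounds(ok: int, info: int, final_value: int) -> Tuple[int, int]:
    """Bound the number of acknowledged unit increments that were lost.

    ok          -- increments the client reported as successful
    info        -- increments with an indeterminate outcome (e.g. timeouts)
    final_value -- counter value read after the cluster has healed
    """
    # Best case: none of the indeterminate increments took effect,
    # so every unit in final_value comes from an acknowledged write.
    min_lost = max(0, ok - final_value)
    # Worst case: every indeterminate increment took effect,
    # so acknowledged writes account for as little of final_value as possible.
    max_lost = max(0, min(ok, ok + info - final_value))
    return min_lost, max_lost


# Hypothetical numbers, not taken from the Jepsen report.
low, high = lost_bounds(ok=1000, info=50, final_value=900)
print(f"between {low} and {high} acknowledged increments were lost")
```

Because increments are interchangeable, the final value says how many are missing but not which client operations they were.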
@aphyr - Thanks for clarifying the timeouts issue.
Also, was the “repeatable read” setting used in these tests and, if so, did it have any effect on the actual loss percentage? Just curious. I don’t expect that it would make a huge difference in split-brain scenarios, but it definitely is required to provide consistent reads in cases where old or new nodes are rejoining a single unpartitioned cluster.