How does Aerospike hold up in partition tolerance benchmarks (in AP state)? Has it been tested using an open-source tool like Jepsen?
We have indeed run the Jepsen test, and Aerospike fails it.
Aerospike is a full-availability system designed to minimize the likelihood of partitioning and to provide the maximum consistency possible in an AP system. In the rare case where partitioning happens, Aerospike’s default duplicate resolution scheme looks at all copies of the record and picks the most recent copy as a whole.
For an AP system, Jepsen’s test requires that different partial updates of a record (such as different updates to list elements) be successfully merged. Aerospike fails in this case, because a complete merge requires keeping track of the ordering of all partial updates, which is too impractical to maintain.
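To make the failure mode concrete, here is a minimal sketch of whole-record resolution losing a partial update. It is only an illustration, not Aerospike's actual code: the `RecordCopy` class, the `resolve_whole_record` function, and the timestamps are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RecordCopy:
    """One side's copy of a record (hypothetical model, not Aerospike's format)."""
    last_update_time: float
    bins: dict = field(default_factory=dict)


def resolve_whole_record(copies):
    """Pick the most recent copy as a whole; the other copies are discarded."""
    return max(copies, key=lambda c: c.last_update_time)


# Before the partition, both sides hold the same list.
base = ["a", "b"]

# During the partition, each side applies a different partial update.
side_a = RecordCopy(last_update_time=100.0, bins={"items": base + ["x"]})
side_b = RecordCopy(last_update_time=101.0, bins={"items": base + ["y"]})

# After the partition heals, one copy wins and the other side's append is lost.
winner = resolve_whole_record([side_a, side_b])

# What a full merge of both partial updates would have to contain.
merged_ideal = base + ["x", "y"]

print(winner.bins["items"])
print(merged_ideal)
```

The winning copy contains only the later side's append, so the earlier side's update to the list is dropped. This is the kind of lost partial update that the test detects.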
The more viable alternative is to simply give up availability for the duration of a partition, so that only one copy can be in the system. This is limiting in most operationally successful deployments, but we do understand that there are use cases where unavailability is preferred over full availability. We plan to add such features to Aerospike.
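As a rough sketch of what giving up availability could look like, one common approach is to let only the side that can still reach a strict majority of nodes accept writes. This is an assumption made for illustration, not a description of how Aerospike does or will implement it; `can_accept_writes`, `write`, and `Unavailable` are hypothetical names.

```python
class Unavailable(Exception):
    """Raised when this side of the partition refuses to serve the request."""


def can_accept_writes(reachable_nodes, cluster_size):
    # Strict majority: at most one side of any split can satisfy this,
    # so at most one copy of a record keeps changing during the partition.
    return reachable_nodes > cluster_size // 2


def write(store, key, value, reachable_nodes, cluster_size):
    """Apply the write only if this node is on the majority side."""
    if not can_accept_writes(reachable_nodes, cluster_size):
        raise Unavailable("minority side of partition; write rejected")
    store[key] = value


# A 5-node cluster split 3/2: only the 3-node side keeps accepting writes.
majority_store, minority_store = {}, {}
write(majority_store, "k", "v1", reachable_nodes=3, cluster_size=5)

try:
    write(minority_store, "k", "v2", reachable_nodes=2, cluster_size=5)
except Unavailable as err:
    print("rejected:", err)
```

With an even split (for example 2/2 in a 4-node cluster), neither side has a strict majority under this rule, so both sides become unavailable. That is the availability cost being traded for consistency.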
For details, please see http://www.aerospike.com/docs/architecture/acid.html
Thanks for the quick response. I am curious to know more about the test, for instance the failure rate, since the docs here http://www.aerospike.com/docs/architecture/assets/AerospikeACIDSupport.pdf (page 8) mention an auto-merge strategy once the partitions are healed.
Do you have a roadmap for when an application merge policy would be available?
Thanks!
We currently don’t have a set time for this. We would love to discuss your use case one-on-one, which will help us prioritize the work.
Thanks!