My use case involves handling ~500 GB of data that will be mutated a lot, while ensuring strong consistency. I wanted to know if Aerospike would be overkill?
Aerospike would be a perfect fit, rather than overkill.
First, it’s one of the only distributed databases that can provide strong consistency, and has since version 4.0 (March 2018). This has been verified with Jepsen testing and backed by a long track record at financial institutions (actual financial use cases), instant payment systems, and so on. It’s the only database that can deliver strong consistency at high performance, regardless of scale.
One main advantage is being able to start with a small dataset and grow it many orders of magnitude, from GiBs to PiBs, without modifying your application. There’s no need to add caches or complex retrieve-from-storage logic. Sure, the cluster will need to grow vertically (bigger machines), horizontally (more machines), or both, but that can be done on a live cluster, without taking down your application. “Write once, scale to any size.” That’s not an official slogan here, but it’s the reality of using Aerospike (with the caveat, of course, that you should always do proper capacity planning).
By the way, I wrote an article about a different use case that explains why performance doesn’t degrade as the dataset scales up (again, as long as the cluster has adequate capacity).
Does strong consistency also ensure high availability? If not, then how is high availability impacted?
This is basically asking whether Strong Consistency can be Consistent, Available, and Partition-tolerant all at once. No database can claim all three at the same time. Once you have a distributed system, you have already chosen that the system needs to be partition-tolerant, so you really have a choice between consistency and availability (see the “CAP theorem”). Aerospike allows you to choose either ‘available’ or ‘consistent’ mode. If `strong-consistency` is not enabled, Aerospike operates in an AP (Available and Partition-tolerant) mode. If `strong-consistency` is enabled, Aerospike operates in a CP (Consistent and Partition-tolerant) mode.
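As a rough sketch, `strong-consistency` is set per namespace in `aerospike.conf` (it’s an Enterprise Edition feature; the namespace name and the other settings here are just illustrative placeholders):

```
namespace payments {
    replication-factor 2
    strong-consistency true    # this namespace runs in CP mode
    storage-engine memory
}
```

Note that an SC namespace also requires a roster to be set (via `asadm`) before it will serve reads and writes.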
In Strong Consistency mode, your availability is going to depend on a number of factors. You can lose up to `replication-factor - 1` nodes and still be fully available in SC. If you lose more than that, availability will degrade incrementally with each additional node lost. Availability can be enhanced further with `rack-aware`, which gives you full availability unless you lose nodes from more than `replication-factor - 1` racks (rather than losing any nodes at all), or more than half the cluster.
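To make that arithmetic concrete, here is a minimal sketch of the availability rules as stated above. These are my own illustrative functions, not an Aerospike API, and they simplify the real roster/quorum logic:

```python
def fully_available(nodes_lost: int, replication_factor: int) -> bool:
    """SC without rack-awareness: fully available while fewer than
    replication-factor nodes are down."""
    return nodes_lost <= replication_factor - 1

def fully_available_rack_aware(racks_with_losses: int, nodes_lost: int,
                               cluster_size: int,
                               replication_factor: int) -> bool:
    """SC with rack-aware: fully available unless nodes are lost from
    more than replication-factor - 1 racks, or more than half the
    cluster is lost."""
    return (racks_with_losses <= replication_factor - 1
            and nodes_lost <= cluster_size // 2)

# replication-factor 2: losing one node keeps full availability,
# losing two does not.
print(fully_available(1, 2))   # True
print(fully_available(2, 2))   # False

# replication-factor 2, two racks of three nodes each: an entire rack
# (3 of 6 nodes) can go down and, per the rule above, the namespace
# stays fully available.
print(fully_available_rack_aware(1, 3, 6, 2))   # True
```

The rack-aware gain is the key point: instead of tolerating only `replication-factor - 1` individual node failures, you can tolerate the loss of whole racks, as long as the failures span no more than `replication-factor - 1` racks.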
There are two blog posts, from when Strong Consistency was announced in 4.0 and when Relaxed Consistency mode followed in 4.5.2, that should help with your question about performance and availability: