Is Aerospike good for handling less than 500 GB of data?

My use case involves handling ~500 GB of data, which will be mutated a lot while ensuring strong consistency. I wanted to know whether Aerospike is overkill.

Aerospike would be a perfect fit, rather than overkill.

First, it's one of the few distributed databases that can do strong consistency, and it has done so for years, since version 4.0 (March 2018). This has been verified with Jepsen testing and backed by a long track record at financial institutions (in actual financial use cases), instant payment systems, and so on. It's the only database that can do strong consistency at high performance, regardless of scale.

One main advantage is being able to start with a small dataset and grow it many orders of magnitude, from GiBs to PiBs, without modifying your application. No need to add caches or complex retrieve-from-storage logic. Sure, the cluster will need to grow vertically (bigger machines) or horizontally (more machines) or both, but that can be done in a live cluster, without taking down your application. "Write once, scale to any size". That's not an official slogan here, but it's the reality of using Aerospike (of course with the caveat that you should always do proper capacity planning).

By the way, I wrote an article about a different use case that contains the explanation of why performance doesn't degrade as the dataset scales up (again, as long as the cluster has adequate capacity).

Does strong consistency also ensure high availability? If not, then how is high availability impacted?

This is basically asking if Strong Consistency is Consistent, Available, and Partition-tolerant. No database can claim all three at the same time. Once you have a distributed system, you have already decided that the system needs to be Partition-tolerant, so you really have a choice between consistency and availability (see the "CAP theorem"). Aerospike allows you to choose either 'available' or 'consistent' mode. If strong-consistency is not enabled, Aerospike will operate in an AP (Available & Partition-tolerant) mode. If strong-consistency is enabled, Aerospike will operate in a CP (Consistent & Partition-tolerant) mode.

In Strong Consistency (SC) mode, your availability is going to depend on a number of factors. You can lose up to replication-factor - 1 nodes and still be fully Available in SC. If you lose more than that, Availability will degrade incrementally with each additional node lost. Availability can be enhanced further if you use rack-aware, which means that you have full availability unless you lose nodes from more than replication-factor - 1 racks (instead of any nodes), or lose more than half the cluster.
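To make that arithmetic concrete, here is a minimal Python sketch of the rule of thumb described above. It is not Aerospike code and does not call any Aerospike API; the function names and parameters are hypothetical, and the real availability rules (roster, partition ownership, clean versus unclean shutdowns) are more detailed than this.

```python
def fully_available(nodes_lost: int, replication_factor: int) -> bool:
    """Rule of thumb from the post: without rack-aware, SC stays fully
    available while at most replication_factor - 1 nodes are down."""
    if nodes_lost < 0 or replication_factor < 1:
        raise ValueError("nodes_lost must be >= 0 and replication_factor >= 1")
    return nodes_lost <= replication_factor - 1


def fully_available_rack_aware(
    racks_with_lost_nodes: int,
    nodes_lost: int,
    replication_factor: int,
    cluster_size: int,
) -> bool:
    """Rack-aware variant as stated in the post (a simplification):
    fully available unless nodes are lost from more than
    replication_factor - 1 racks, or more than half the cluster is lost."""
    if min(racks_with_lost_nodes, nodes_lost) < 0 or cluster_size < 1:
        raise ValueError("counts must be >= 0 and cluster_size >= 1")
    within_rack_budget = racks_with_lost_nodes <= replication_factor - 1
    not_more_than_half = 2 * nodes_lost <= cluster_size
    return within_rack_budget and not_more_than_half


if __name__ == "__main__":
    # replication-factor 2: one node down is tolerated, two is not.
    print(fully_available(1, 2))   # True
    print(fully_available(2, 2))   # False
    # 3 racks x 2 nodes, replication-factor 2, one whole rack down.
    print(fully_available_rack_aware(1, 2, 2, 6))  # True
```

For example, with replication-factor 2 on six nodes spread over three racks, losing both nodes of a single rack still counts as one rack, so under this rule the cluster stays fully available, whereas losing two arbitrary nodes without rack-aware would not.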

There are two blog posts from when Strong Consistency was announced in 4.0, and later Relaxed Consistency mode in 4.5.2, that would be helpful for your question about performance and availability: