My use case involves handling ~500 GB of data that will be mutated a lot, while ensuring strong consistency. I wanted to know if Aerospike would be overkill?
Aerospike would be a perfect fit, rather than overkill.
First, it’s one of the only distributed databases that can provide strong consistency, and has since version 4.0 (March 2018). This has been verified with Jepsen testing and backed by a long track record at financial institutions (actual financial use cases), instant payment systems, and so on. It’s the only database that can deliver strong consistency at high performance, regardless of scale.
One main advantage is being able to start with a small dataset and grow it many orders of magnitude, from GiBs to PiBs, without modifying your application. There’s no need to add caches or complex retrieve-from-storage logic. Sure, the cluster will need to grow vertically (bigger machines), horizontally (more machines), or both, but that can be done on a live cluster, without taking down your application. “Write once, scale to any size.” That’s not an official slogan here, but it’s the reality of using Aerospike (with the caveat, of course, that you should always do proper capacity planning).
By the way, I wrote an article about a different use case that explains why performance doesn’t degrade as the dataset scales up (again, as long as the cluster has adequate capacity).
Does strong consistency also ensure high availability? If not, then how is high availability impacted?
This is basically asking whether Strong Consistency can be Consistent, Available, and Partition-tolerant all at once. No database can claim all three at the same time. Once you have a distributed system, you have already chosen that the system needs to be partition-tolerant, so you really have a choice between consistency and availability (see the “CAP theorem”). Aerospike allows you to choose either ‘available’ or ‘consistent’ mode. If `strong-consistency` is not enabled, Aerospike operates in an AP (Available and Partition-tolerant) mode. If `strong-consistency` is enabled, Aerospike operates in a CP (Consistent and Partition-tolerant) mode.
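As a rough sketch, `strong-consistency` is set per namespace in `aerospike.conf` (it’s an Enterprise Edition feature; the namespace name and the other settings here are just illustrative placeholders):

```
namespace payments {
    replication-factor 2
    strong-consistency true    # this namespace runs in CP mode
    storage-engine memory
}
```

Note that an SC namespace also requires a roster to be set (via `asadm`) before it will serve reads and writes.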
In Strong Consistency mode, your availability is going to depend on a number of factors. You can lose up to `replication-factor - 1` nodes and still be fully available in SC. If you lose more than that, availability will degrade incrementally with each additional node lost. Availability can be enhanced further with `rack-aware`, which gives you full availability unless you lose nodes from more than `replication-factor - 1` racks (rather than losing any nodes at all), or more than half the cluster.
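To make that arithmetic concrete, here is a minimal sketch of the availability rules as stated above. These are my own illustrative functions, not an Aerospike API, and they simplify the real roster/quorum logic:

```python
def fully_available(nodes_lost: int, replication_factor: int) -> bool:
    """SC without rack-awareness: fully available while fewer than
    replication-factor nodes are down."""
    return nodes_lost <= replication_factor - 1

def fully_available_rack_aware(racks_with_losses: int, nodes_lost: int,
                               cluster_size: int,
                               replication_factor: int) -> bool:
    """SC with rack-aware: fully available unless nodes are lost from
    more than replication-factor - 1 racks, or more than half the
    cluster is lost."""
    return (racks_with_losses <= replication_factor - 1
            and nodes_lost <= cluster_size // 2)

# replication-factor 2: losing one node keeps full availability,
# losing two does not.
print(fully_available(1, 2))   # True
print(fully_available(2, 2))   # False

# replication-factor 2, two racks of three nodes each: an entire rack
# (3 of 6 nodes) can go down and, per the rule above, the namespace
# stays fully available.
print(fully_available_rack_aware(1, 3, 6, 2))   # True
```

The rack-aware gain is the key point: instead of tolerating only `replication-factor - 1` individual node failures, you can tolerate the loss of whole racks, as long as the failures span no more than `replication-factor - 1` racks.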
There are two blog posts, from when Strong Consistency was announced in 4.0 and when Relaxed Consistency mode followed in 4.5.2, that should help with your question about performance and availability: