Aerospike over in-memory Cassandra?


#1

I’m a bit of a noob with NoSQL and cannot decide between Aerospike and a fully in-memory Cassandra.

Use case: the DB will be used for multiple services in our university (from a social platform to internal financial analytics to network logging to real-time messaging). Our number of daily active users is also constant (~5000). So my primary requirement is not to get 1M+ TPS but to reduce latency and maintain consistency, serving the user data as fast as possible. The DB would be running on 3 bare-metal servers, each with 32 vcores, 128 GB RAM and a 256 GB SSD, connected over 10 Gbit. The data won’t exceed RAM, as most of it will be archived (to a separate Elasticsearch server) every 6 months.
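As a rough back-of-the-envelope check of the "data won’t exceed RAM" claim, here is a minimal sketch. The node count and RAM size come from the hardware above; the replication factor of 2 and the 60% usable-memory ceiling are assumptions for illustration, not settings of either database.

```python
# Rough capacity check for a small in-memory cluster.
# From the post: 3 nodes, 128 GB RAM each.
# ASSUMED (not from the post): replication factor 2, and only 60% of RAM
# usable for data, leaving headroom for indexes, the OS and rebalancing.

NODES = 3
RAM_PER_NODE_GB = 128
REPLICATION_FACTOR = 2   # assumption
USABLE_FRACTION = 0.6    # assumption


def max_unique_data_gb(nodes, ram_per_node_gb, replication_factor, usable_fraction):
    """Return how many GB of unique (pre-replication) data fit in RAM."""
    total_usable_gb = nodes * ram_per_node_gb * usable_fraction
    return total_usable_gb / replication_factor


def survives_node_loss(unique_gb, nodes, ram_per_node_gb,
                       replication_factor, usable_fraction):
    """Check whether the data still fits in RAM after losing one node."""
    remaining = max_unique_data_gb(
        nodes - 1, ram_per_node_gb, replication_factor, usable_fraction
    )
    return unique_gb <= remaining


if __name__ == "__main__":
    full = max_unique_data_gb(
        NODES, RAM_PER_NODE_GB, REPLICATION_FACTOR, USABLE_FRACTION
    )
    degraded = max_unique_data_gb(
        NODES - 1, RAM_PER_NODE_GB, REPLICATION_FACTOR, USABLE_FRACTION
    )
    print(f"Unique data that fits with all nodes up: {full:.1f} GB")
    print(f"Unique data that fits with one node down: {degraded:.1f} GB")
```

Under these assumptions, the figure to plan against is the one-node-down number, since the cluster should keep serving from memory during a failure; six months of data between archive runs would need to stay below it.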

Also, I don’t mind taking on the challenge and doing a bit of over-engineering, and it’s fine if the cluster is hard to set up, but it should require little or no maintenance for years.

So, looking over in-memory DBs, Aerospike seemed a great choice. I was very excited to go blazingly fast, but then I looked at “Aerospike total garbage?” and “We use Aerospike heavily. It works just fine.” Now this has got me thinking: is this the best fit for me?

Or should I go for a fully in-memory Cassandra, which is not optimised for fully in-memory tables and is still less performant than Aerospike, but has a data model that fits me better, does not have consistency issues, and is tried and tested? (I am intrigued by ScyllaDB, but it doesn’t have in-memory tables.)

I would like to have answers from people with production experience with Aerospike & Cassandra. Also please tell me if I am completely wrong.


#2

Cassandra was not designed to be a key-value system, whereas Aerospike was a distributed key-value database from the ground up. It makes sense to choose the right tool for the job.

If you’re storing all your data fully in memory, take a look at “What’s New in Aerospike 3.12?” and “What’s New in Aerospike 3.11?”, as they include optimizations for such a use case. Specifically, see sprigs and CPU pinning.


#3

I can’t imagine how such use cases can be achieved with a key-value store. Cassandra is a good choice. But don’t mix analytics with other live data, because jobs (e.g. Spark) can be very resource-intensive.

EDIT: I forgot to mention that Apache Cassandra doesn’t have in-memory tables (it’s a DataStax feature). But the Linux page cache will cache your data for reads if it fits in memory.