Aerospike for frequent updates and 22TB+ storage

Hello. I’m currently trying to assess whether Aerospike is a good fit for our project. We have 22TB+ of data at the start (probably with a replication factor of 3 on top of that), around 540B records, and each record contains 3 counters. The main activity is either inserting a record or updating counters (probably mostly updating counters). The write request rate is 75k per second. We also want to be able to select records by 3 fields. I wonder whether Aerospike would fit and how I can estimate the approximate cost. Thanks in advance!

Hi @rustamg. Yes, that sounds like a very good fit for Aerospike. There are a lot of Aerospike use cases at this scale and larger that store counters, for example for frequency capping in ad-tech. It’s designed for very high transaction rates and data volumes, as well as resiliency.

You mentioned selecting records by 3 fields. Aerospike does support secondary indexes to look up records by fields, for example selecting records falling into a particular date range or finding a field with a certain string value.

One note on this though: I see this a lot in ad-tech where the fields they want to select by are external-ids being mapped to an internal id. A secondary index could be used for this, but it’s typically better to use a reverse-mapping table as it’s significantly faster and lighter weight.
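
A rough sketch of the reverse-mapping idea, using plain Python dictionaries as stand-ins for two Aerospike sets. It is not Aerospike client code, and every name in it (`records`, `external_to_internal`, `register`, `increment`, the counter names `c1` to `c3`) is hypothetical. The point is only that each external id becomes the key of a small mapping record, so finding the counters takes two key lookups rather than an index query:

```python
# In-memory stand-ins for two Aerospike sets (assumption: a real deployment
# would issue key-based reads and writes against the database instead).
records = {}               # internal_id -> {"c1": int, "c2": int, "c3": int}
external_to_internal = {}  # (id_type, external_id) -> internal_id


def register(internal_id, external_ids):
    """Create the counter record and one reverse-mapping entry per external id."""
    records.setdefault(internal_id, {"c1": 0, "c2": 0, "c3": 0})
    for id_type, external_id in external_ids.items():
        external_to_internal[(id_type, external_id)] = internal_id


def increment(id_type, external_id, counter, delta=1):
    """Resolve the external id with one key lookup, then bump one counter."""
    internal_id = external_to_internal[(id_type, external_id)]
    records[internal_id][counter] += delta
    return records[internal_id][counter]
```

For example, after `register("u1", {"cookie": "abc", "device": "xyz"})`, both `increment("cookie", "abc", "c1")` and `increment("device", "xyz", "c1")` update the same record. The trade-off is one extra small record per external id in exchange for purely key-based access.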

As an aside, O’Reilly just released the Aerospike Up and Running book, and I wrote several chapters, including the one on data modeling which covers these scenarios. If you would like to chat with me directly about what you’re trying to achieve, one of the easiest ways is to join our Discord forum at Aerospike Developers and ping me directly on there. Otherwise you can go to our download page for the Free (Community) Edition, where there is a link at the bottom to book some time with one of our engineers (which may well be me… 🙂)

In terms of cost – it depends on where you’re trying to run the software. Is this bare metal, or cloud, and if the cloud which one?

Hello Tim, thank you for your response. We’re planning to use bare metal.

Hi @rustamg. Great, thank you. Do you have the hardware, or would you be acquiring it for this project? Aerospike can scale nicely both horizontally and vertically and its use of hardware is very efficient, typically reducing the hardware requirements at scale by over 80% compared with other NoSQL databases like Cassandra. This can lead to substantial cost savings.

There are a number of other considerations to look at too. For example:

  • How many copies of the data do you need to store? You mentioned 3 above, which in Aerospike would allow you to lose 2 copies of the data simultaneously. In most other systems, 3 copies would only allow you to lose 1 copy of the data. If you only need to support losing 1 copy of the data, you might be able to get away with 2 copies of the data in Aerospike, offering further savings.

  • If you’re familiar with the CAP Theorem, do you prefer to run in AP mode or CP mode? Aerospike supports both, and the same cluster can even have some data being in AP mode and other data in CP mode.

  • Do you need replication of the data to other data centers? Aerospike supports both synchronous replication and asynchronous replication even across global distances, though synchronous replication typically incurs additional write latency due to the speed of light being finite.
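
To put rough numbers on the first question, here is a minimal sketch of how the replication factor affects raw storage for the 22TB figure quoted in this thread. The function names are hypothetical, and the calculation deliberately ignores per-record overhead, free-space headroom, and anything else a real sizing exercise would include:

```python
def raw_storage_tb(unique_data_tb, replication_factor):
    """Raw storage needed to hold every copy of the data (overheads ignored)."""
    return unique_data_tb * replication_factor


def tolerated_copy_losses(replication_factor):
    """Copies that can be lost while one copy survives, as described above."""
    return replication_factor - 1


UNIQUE_DATA_TB = 22  # from the original question

for rf in (2, 3):
    print(
        f"RF={rf}: {raw_storage_tb(UNIQUE_DATA_TB, rf)} TB raw, "
        f"tolerates losing {tolerated_copy_losses(rf)} copy/copies"
    )
```

Under these assumptions, dropping from 3 copies to 2 removes one full copy (22TB) of raw storage while still tolerating the loss of one copy.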

From the numbers you gave above, your objects appear to be quite small (~44 bytes). This isn’t a problem, but does lead to some design discussions to make sure your memory requirements aren’t too large. Again, this is something we have helped many companies with.
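
As a back-of-the-envelope check on that estimate, and on why memory comes up at all, the sketch below divides the stated data size by the stated record count, then estimates primary index memory. The 64 bytes per index entry is an assumption taken from Aerospike's documented primary index entry size, not from this thread, and the function names are hypothetical:

```python
INDEX_ENTRY_BYTES = 64  # assumption: size of one primary index entry per record copy


def bytes_per_record(data_tb, record_count, tib=False):
    """Average record size, reading 'TB' as either 10**12 or 2**40 bytes."""
    unit = 2**40 if tib else 10**12
    return data_tb * unit / record_count


def index_memory_tb(record_count, replication_factor):
    """Primary index size in TB (10**12 bytes) across all copies."""
    return record_count * INDEX_ENTRY_BYTES * replication_factor / 10**12


RECORDS = 540e9  # from the original question

print(round(bytes_per_record(22, RECORDS), 1))            # decimal terabytes
print(round(bytes_per_record(22, RECORDS, tib=True), 1))  # binary tebibytes
print(round(index_memory_tb(RECORDS, 1), 2))              # index per copy, in TB
```

With records this small, the index entry under this assumption is larger than the record itself, which is presumably why the data model deserves a closer look before sizing hardware.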

I do find, however, that people coming from other technologies, in particular columnar systems like Cassandra, tend to model the data in ways they are familiar with, whereas there may be far more efficient ways of modeling the data in Aerospike. Audience segmentation use cases are a prime example of this. Again, I’d be happy to discuss further if you would like.