How to properly plan huge storage

data modeling

#1

Hello. For one of our new projects we need to place a large amount of data (about 800,000,000 keys) into fast storage. We would like to use Aerospike Community Edition as the storage. The data is string key-value pairs, so for every key only one bin will be created.

The data will be overwritten very rarely, so we can assume that 100% of requests will be reads.

How should we organize such storage to get maximal performance? Should we create one set for all the data, or divide it into N different sets? (We can do this because every key has a prefix by which the data can easily be divided into 80 sets.)

Thanks!


#2

Hi. Your use case would be best served by a single-bin, data-in-memory namespace, or single-bin with the data on SSD. Please read the knowledge-base article on the topic when making that decision.
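For illustration, a single-bin, data-in-memory namespace might look something like this in `aerospike.conf` (the namespace name, memory size, and TTL here are placeholder assumptions, and `single-bin` applies to server versions that still support that option):

```
namespace kvcache {
    replication-factor 2
    memory-size 60G          # primary index plus in-memory data
    single-bin true          # exactly one (unnamed) bin per record
    default-ttl 0            # never expire; the data is rarely overwritten

    storage-engine memory    # pure data-in-memory; use a device block for SSD
}
```

For the SSD variant you would replace `storage-engine memory` with a `storage-engine device` block pointing at your flash devices.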

Depending on the namespace configuration you decide on, you’ll want to do capacity planning to figure out the size of the nodes and how many of them you need in your cluster. There’s also an article specific to Amazon EC2 capacity planning. If you run into problems you can ask the community for help in the Planning section of the discussion forum.
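As a rough back-of-the-envelope sketch of that capacity planning, assuming the commonly documented 64 bytes of primary-index RAM per record (data sizes and per-node RAM are placeholder assumptions):

```python
# Rough cluster-wide primary-index estimate for 800M single-bin records.
# 64 bytes/record is the documented primary-index overhead; everything
# else here (replication factor, per-node RAM) is an assumption.
records = 800_000_000
index_bytes_per_record = 64     # primary index entry, kept in RAM
replication_factor = 2          # master + one replica

index_gib = records * index_bytes_per_record * replication_factor / 2**30
print(f"primary index across the cluster: ~{index_gib:.0f} GiB")

# With, say, 60 GiB of RAM per node budgeted for the index alone:
nodes_for_index = index_gib / 60
```

The data itself (and any SSD sizing) comes on top of this, which is why the capacity-planning article is worth working through before choosing node sizes.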

As long as the data is related, I would keep it all in a single set.
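With the Aerospike clients, keeping everything in one set just means every key tuple uses the same set name, with the existing prefix kept inside the user key itself; a minimal sketch (namespace and set names are hypothetical):

```python
# Aerospike keys are (namespace, set, user_key) tuples. Keeping one set
# and leaving the prefix inside the user key avoids 80 per-prefix sets.
# "kvcache" and "kv" are hypothetical names for this example.
def make_key(user_key: str):
    return ("kvcache", "kv", user_key)

assert make_key("pfx01:some-id") == ("kvcache", "kv", "pfx01:some-id")
```

The same tuple shape is what you would pass to the client's `put` and `get` calls.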