Aerospike schema: which one is the better design?


#1

Our database requirements are as follows:

  1. Huge number of records: 10-100 million per set.
  2. Huge number of bins: around 100 bins per set.
  3. Some point queries need to be run within milliseconds.

Which of the following two schema design philosophies is better for Aerospike?

  1. Have one set of each type with all possible bins (in the hundreds). Would this degrade Aerospike performance?
  2. Categorize the bins and use multiple sets, each with around 10 bins at most. This means redundant keys in each set, higher space usage, and more work to combine data for the same key from different sets.

#2

That really depends on a number of factors. How big are the records (the combination of all the bins)? When you read a record, Aerospike reads the whole record on the server and returns just the selected bins to the client. If it’s a very large record and you’re returning only a small subset of the data, this may result in a performance penalty, because you’re reading more data from the SSDs than you need. Obviously, if your data is memory-based, this isn’t an issue.
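
To get a rough feel for that penalty, the sketch below compares the bytes read from storage with the bytes actually returned when only a few bins are selected. The bin sizes are made-up illustrative numbers, and `read_amplification` is a hypothetical helper, not part of any Aerospike API. It also ignores per-record and per-bin overhead.

```python
def read_amplification(bin_sizes, selected):
    """Ratio of bytes read on the server to bytes returned to the client.

    Assumes the server reads the whole record (all bins) from storage and
    returns only the selected bins; per-record overhead is ignored.
    """
    record_bytes = sum(bin_sizes.values())
    returned_bytes = sum(bin_sizes[name] for name in selected)
    return record_bytes / returned_bytes


# Hypothetical record: 100 bins of 200 bytes each (about 20 KB in total).
bins = {f"bin{i}": 200 for i in range(100)}

# Selecting 5 bins still reads all 100 from storage.
ratio = read_amplification(bins, ["bin0", "bin1", "bin2", "bin3", "bin4"])
print(f"read amplification: {ratio:.0f}x")  # prints: read amplification: 20x
```

With small bins like these the absolute cost is still only about 20 KB per read, which is why the total record size matters more than the bin count.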

If the bins are small (say, integers or short strings) and the total record size is small, or you need the whole record in one hit, you won’t see much of a performance penalty with the first approach.
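
The redundant keys of the second approach, mentioned in the question, can also be estimated up front. Below is a back-of-envelope sketch, assuming 64 bytes of primary index per record copy (the figure commonly documented for Aerospike's primary index; verify it for your version), a replication factor of 2, and 10 sets of 10 bins each. All of these numbers are assumptions to replace with your own.

```python
INDEX_ENTRY_BYTES = 64  # assumed primary index cost per record copy


def index_memory_gib(records, sets_per_key=1, replication_factor=2):
    """Estimated cluster-wide primary index memory in GiB."""
    total_entries = records * sets_per_key * replication_factor
    return total_entries * INDEX_ENTRY_BYTES / 2**30


records = 100_000_000  # upper end of the 10-100 million range

one_wide_set = index_memory_gib(records)                      # design 1
ten_narrow_sets = index_memory_gib(records, sets_per_key=10)  # design 2

print(f"design 1: {one_wide_set:.1f} GiB, design 2: {ten_narrow_sets:.1f} GiB")
```

Under these assumptions the index grows from roughly 12 GiB to roughly 119 GiB, since every key is indexed once per set.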

Personally, I would set up your cluster to use the first approach and use the Aerospike Benchmark Tool to performance test it and see if it meets your requirements.


#3

@TimF

Thank you for the answer. It cleared up some concepts for me. However, we have decided to go with design 2 because of a limitation on expiry: Aerospike doesn’t support bin-level expiry; instead, the whole record expires.
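
The reasoning can be illustrated with a toy in-memory model (plain Python, not the Aerospike client API; `ToyStore`, the set names, and the TTL values are all hypothetical). Expiry is tracked per record, so giving a group of bins its own lifetime means storing that group as a separate record under the same key in another set.

```python
class ToyStore:
    """Toy model of record-level expiry; not the Aerospike API."""

    def __init__(self):
        self._records = {}  # (set_name, key) -> (bins, expires_at)

    def put(self, set_name, key, bins, ttl, now):
        # The TTL applies to the whole record, never to a single bin.
        self._records[(set_name, key)] = (dict(bins), now + ttl)

    def get(self, set_name, key, now):
        entry = self._records.get((set_name, key))
        if entry is None or entry[1] <= now:
            return None  # missing or expired
        return entry[0]


store = ToyStore()
# Same user key in two sets, each with its own lifetime (in seconds).
store.put("session", "user42", {"token": "abc"}, ttl=60, now=0)
store.put("profile", "user42", {"name": "Ann"}, ttl=3600, now=0)

# After 2 minutes the session record has expired; the profile has not.
session = store.get("session", "user42", now=120)
profile = store.get("profile", "user42", now=120)
print(session, profile)  # prints: None {'name': 'Ann'}
```

Reading everything for one key now takes one lookup per set, which is the combining cost raised in the original question.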


#4

Didn’t having multiple keys introduce any disadvantages for your system?