How well suited is Aerospike for this niche use case?


#1

I’m looking for a data store that would be optimal for storing, incrementing, and retrieving only integers, and I’m wondering if Aerospike would be a good fit. The logical structure of my data would be akin to a Map<Integer,Map<Integer,Integer>>. The data are groups of Integer counters. Each counter has its own Integer id that is unique within its group, and each group has its own Integer id that is globally unique. So I need to use Integer ids to find a particular group of counters, and then increment a subset of the counters in that group.

Data usage:

  • A group will always be queried for as a whole, never as a subset
    • ~200 counters per group
    • ~millions of groups
  • Only subsets of groups will be incremented, never an entire group at once
    • ~15 counters at a time

Operationally, that’s all I’ll be doing, and I’m wondering if anyone thinks Aerospike would be a good fit for this case.

The only caveat is that I’m short on RAM, so I can’t keep the data completely in-memory.

Right off the bat, I’m wondering if a single-bin namespace with Integers as the primary keys and counter ids as secondary indexes would be the best organization of the data. Groups could be retrieved with the primary key, and counters could be referenced and incremented with their corresponding secondary key. Would this be the most appropriate organization of my data in Aerospike?


#2

This is a perfect setup for Aerospike. Each record has a primary key and contains bins, which are basically key/value pairs of their own.

The primary key would be the group id. Each counter in the group would be a separate bin with the counter id as the bin name and the count as the value. The drivers already have increment operations that work at the bin level and you can use the multi-operation feature to pipeline several increment commands for the same record.

This will let you increment by using the primary key and the bin name and then retrieve the entire group with just the primary key. SSD based storage should work fine for counter performance.

  • Don’t use single-bin namespaces and secondary indexes; that just makes everything more complicated for no reason.

  • There is a limit on the number of unique bin names across a namespace, so you might have to account for this if you have lots of unique counter names.
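
The layout described above (group id as primary key, one bin per counter, several increments sent in one multi-operation call) could be sketched roughly as follows with the Aerospike Python client. This is only an illustration under assumptions: the "c" prefix for bin names, the helper names, and the namespace and set arguments are all hypothetical, and the client object is expected to be an already-connected client.

```python
from typing import Dict, List, Tuple


def counter_bin(counter_id: int) -> str:
    """Hypothetical naming scheme: bin names are strings, so prefix the integer id."""
    return f"c{counter_id}"


def build_increments(deltas: Dict[int, int]) -> List[Tuple[str, int]]:
    """Turn {counter_id: amount} into (bin_name, amount) pairs, one per counter."""
    return [(counter_bin(cid), amount) for cid, amount in sorted(deltas.items())]


def increment_group(client, namespace: str, set_name: str,
                    group_id: int, deltas: Dict[int, int]) -> None:
    """Send all increments for one group as a single multi-operation call."""
    # Imported here so the helpers above work without the client installed.
    from aerospike_helpers.operations import operations as ops

    key = (namespace, set_name, group_id)  # the group id is the primary key
    client.operate(key, [ops.increment(name, amount)
                         for name, amount in build_increments(deltas)])


def read_group(client, namespace: str, set_name: str, group_id: int) -> Dict[str, int]:
    """Fetch the whole group (every bin of the record) by primary key."""
    _key, _meta, bins = client.get((namespace, set_name, group_id))
    return bins
```

With this shape, incrementing ~15 counters in a group is one call such as increment_group(client, "test", "counters", 42, {3: 1, 7: 1}), and reading the group back is one read_group call.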


#4

Does the number of bins in a namespace have an effect on the performance of queries in that namespace? For example, selecting the first bin in a namespace with 200 bins vs. selecting the first bin in a namespace with 500 bins?

Also, it seems like a waste of space if I have a lot of bins sitting around with 0 as their counter value. What if I didn’t want a particular counter to exist unless it had already been incremented? Would using a bin for each counter still be an appropriate solution, or would a single Map type bin containing all of the id-counter kv pairs be a better solution?

Also, if I use Aerospike it may be in an AWS environment where I don’t have access to an SSD and can only use EBS. How significantly would performance be affected in this case?


#5

You should read through the data model documentation for more details: http://www.aerospike.com/docs/architecture/data-model.html

Number of bins won’t affect performance on writes (since you would just be sending increment commands). On reads, if you’re selecting the record by primary key, then you get all the bins. You can also select only the bins you want returned as well.

Records are only created when you first write them, and they only contain the bins you write. If you increment 3 bins on a record, it will only contain 3 bins. There’s no schema that records follow; each one only contains the data you put in it. Millions of records are not really going to take up much space either.

Aerospike will work on AWS fine. There are instances with local SSDs available for the best performance; otherwise you should use EBS volumes backed by SSD. Enhanced networking or provisioned I/O will give you better performance if you can’t use local SSDs.
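
To make the "only the bins you write" point concrete, here is a small plain-Python model of that behaviour. It does not talk to Aerospike at all; the dictionary standing in for the namespace and the bin names like "c3" are assumptions made purely for illustration.

```python
from typing import Dict, Iterable

# Stand-in for a namespace: group id -> {bin name: count}.
store: Dict[int, Dict[str, int]] = {}


def increment(group_id: int, bin_names: Iterable[str], amount: int = 1) -> None:
    """Create the record and its bins lazily, the first time they are incremented."""
    record = store.setdefault(group_id, {})
    for name in bin_names:
        record[name] = record.get(name, 0) + amount


increment(1, ["c3", "c7", "c9"])  # first write creates the record with 3 bins
increment(1, ["c3"])              # later writes touch only the named bins

print(store[1])
```

Counters that were never incremented simply do not appear in the record, so there are no zero-valued bins taking up space.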

How many writes per second are you doing? You just need to run a test and see if it’ll work for you. Note: If you have a lot of increments per key then set the transaction-pending-limit config variable to a high number. http://www.aerospike.com/docs/reference/configuration#transaction-pending-limit


#6

1000 group-incrementations/s. And the increments per key should always be lower than 20 per transaction. If at some point it were to go over 20 – the default value for transaction-pending-limit – would I have to increase the variable?


#7

Not sure what you’re quoting, but 1k/sec or 20k/sec are both trivial to handle.

For the config, might not matter but why not just increase it so you don’t have to worry about it? It’s 1 line in a config file.
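
As a rough sketch, the one-line change could look like the fragment below. The value 100 is arbitrary, and putting the setting in the service context is an assumption based on older server releases; the configuration reference linked earlier in the thread is the place to confirm where it belongs for a given version.

```
# aerospike.conf (fragment, illustrative only)
service {
    # Default is 20; 100 is just an example of "a high number".
    transaction-pending-limit 100
}
```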


#9

1000 group-incrementations/s = ~15k/s – given ~15 incrementations per group.
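
The arithmetic behind that figure, written out as a short check. One assumption is added here: that all ~15 increments for a group are sent together in a single multi-operation call, in which case the server sees about 1000 write transactions per second, each carrying about 15 bin increments.

```python
group_updates_per_second = 1000  # from the thread
increments_per_update = 15       # ~15 counters touched per group update

# Individual bin increments applied per second.
bin_increments_per_second = group_updates_per_second * increments_per_update

# Assuming one multi-operation call per group update, the number of
# write transactions equals the number of group updates.
transactions_per_second = group_updates_per_second

print(bin_increments_per_second, transactions_per_second)
```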

I’ll update the config file.

Thanks for the help