How well suited is Aerospike for this niche use case

the_beest · June 26, 2016, 3:26am

I’m looking for a data store that would be optimal for storing, incrementing, and retrieving only integers, and I’m wondering if Aerospike would be a fair solution. The logical structure of my data would be akin to a Map<Integer,Map<Integer,Integer>>. The data are groups of Integer counters. Each counter has its own Integer id that is unique within a group, and each group has its own Integer id that is globally unique. So I need to use Integer ids to find a particular group of counters, and then increment a subset of the counters in that group.

Data usage:

A group will always be queried as a whole, never as a subset:
- ~200 counters per group
- ~millions of groups
Only a subset of a group’s counters will be incremented at once, never the entire group:
- ~15 counters at a time

Operationally, that’s all I’ll be doing, and I’m wondering if anyone thinks Aerospike would be a good fit for this case.

The only caveat is that I’m short for RAM, so I can’t keep data completely in-memory.

Right off the bat, I’m wondering if a single-bin namespace with Integers as the primary keys and counter ids as secondary indexes would be the optimal organization of the data. Groups could be retrieved with the primary key, and counters could be referenced and incremented with their corresponding secondary key. Would this be the most appropriate organization of my data in Aerospike?

manigandham · June 29, 2016, 4:54pm

This is a perfect setup for Aerospike. Each record has a primary key and contains bins, which are basically key/value pairs of their own.

The primary key would be the group id. Each counter in the group would be a separate bin with the counter id as the bin name and the count as the value. The drivers already have increment operations that work at the bin level and you can use the multi-operation feature to pipeline several increment commands for the same record.

This will let you increment by using the primary key and the bin name and then retrieve the entire group with just the primary key. SSD based storage should work fine for counter performance.

Don’t use single-bin namespaces and secondary indexes; that just makes everything more complicated for no reason.
There is a limit on the number of unique bin names across a namespace, so you might have to account for this if you have lots of unique counter names.
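
The layout described above (one record per group, one bin per counter, several increments sent together) can be sketched in plain Python. This is only an in-memory model, not Aerospike client code: `GroupCounterStore` and its methods are hypothetical names made up for illustration. With a real client, `increment_many` would roughly correspond to a single multi-operation call on the record, and `get_group` to a primary-key read; check the client documentation for the exact calls.

```python
from collections import defaultdict


class GroupCounterStore:
    """Hypothetical in-memory stand-in: one record per group, one bin per counter."""

    def __init__(self):
        # primary key (group id) -> {bin name -> integer count}
        self._records = defaultdict(dict)

    def increment_many(self, group_id, deltas):
        """Apply several increments to one record, like one multi-operation call."""
        record = self._records[group_id]
        for counter_id, amount in deltas.items():
            bin_name = str(counter_id)  # assumption: counter ids are stringified into bin names
            record[bin_name] = record.get(bin_name, 0) + amount

    def get_group(self, group_id):
        """Return a copy of every bin of the record, i.e. the whole group."""
        return dict(self._records.get(group_id, {}))


store = GroupCounterStore()
store.increment_many(42, {1: 1, 7: 3, 15: 1})  # first batch of increments for group 42
store.increment_many(42, {7: 2})               # later increment of a single counter
group = store.get_group(42)
print(group)
```

Counters that were never incremented simply do not appear in the record, which matches the schemaless behaviour discussed later in the thread.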

the_beest · June 29, 2016, 6:59pm

Does the number of bins in a namespace have an effect on the performance of queries in that namespace? For example, selecting the first bin in a namespace with 200 bins vs. selecting the first bin in a namespace with 500 bins?

Also, it seems like a waste of space if I have a lot of bins sitting around with 0 as their counter value. What if I didn’t want a particular counter to exist unless it had already been incremented? Would using a bin for each counter still be an appropriate solution, or would a single Map type bin containing all of the id-counter key/value pairs be a better solution?

Also, if I use Aerospike, it may be in an AWS environment where I don’t have access to an SSD and can only use EBS. How significantly would performance be affected in this case?

manigandham · June 29, 2016, 7:25pm

You should read through the data model documentation for more details: http://www.aerospike.com/docs/architecture/data-model.html

The number of bins won’t affect performance on writes (since you would just be sending increment commands). On reads, if you’re selecting the record by primary key, then you get all the bins. You can also select only the bins you want returned.

Records are only created when you first write them, and they only contain the bins you write. If you increment 3 bins on a record, it will only contain 3 bins. There’s no schema that records follow; each one only contains the data you put in it. Millions of records are not really going to take up much space either.
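
Because a record only holds the bins that were actually written, a reader has to treat a missing counter as zero. A minimal sketch of that read-side convention, assuming the bins of one record arrive as a plain dictionary keyed by stringified counter id (`read_counters` is a hypothetical helper, not part of any client library):

```python
def read_counters(record_bins, counter_ids):
    """Return counts for the requested ids, treating absent bins as 0."""
    return {cid: record_bins.get(str(cid), 0) for cid in counter_ids}


# A sparse record: only three of the group's counters were ever incremented.
record_bins = {"3": 4, "8": 1, "120": 9}

# Counter 9 was never written, so it has no bin and reads as 0.
counts = read_counters(record_bins, [3, 8, 9, 120])
print(counts)
```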

Aerospike will work fine on AWS. There are instances with local SSDs available for the best performance; otherwise you should use EBS volumes backed by SSD. Enhanced networking or provisioned I/O will give you better performance if you can’t use local SSDs.

How many writes per second are you doing? You just need to run a test and see if it’ll work for you. Note: If you have a lot of increments per key then set the transaction-pending-limit config variable to a high number. See the Configuration Reference in the Aerospike documentation.

the_beest · June 29, 2016, 10:42pm

1000 group-incrementations/s. And the increments per key should always be lower than 20 per transaction. If at some point it were to go over 20 – the default value for transaction-pending-limit – would I have to increase the variable?

manigandham · June 29, 2016, 11:45pm

Not sure what you’re quoting, but 1k/sec or 20k/sec are both trivial to handle.

For the config, might not matter but why not just increase it so you don’t have to worry about it? It’s 1 line in a config file.

the_beest · June 30, 2016, 3:20pm

1000 group-incrementations/s = ~15k/s – given ~15 incrementations per group.
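
The estimate above is just a product, and it helps to keep two numbers apart. Assuming each group update is sent as one multi-operation call, as suggested earlier in the thread, the server would see roughly one record transaction per group update, each carrying about 15 bin increments. The function and key names below are illustrative only.

```python
def increment_load(group_updates_per_sec, counters_per_update):
    """Split the write load into record transactions and individual bin increments."""
    return {
        # one multi-operation call per group update (assumption from the thread)
        "record_transactions_per_sec": group_updates_per_sec,
        # individual counter increments carried by those calls
        "bin_increments_per_sec": group_updates_per_sec * counters_per_update,
    }


load = increment_load(1000, 15)
print(load)
```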

I’ll update the config file.

Thanks for the help
