Write throughput on single key


#1

Howdy,

For anyone who is using Aerospike to implement counters: what kind of write throughput can I expect on a single key? I have an upcoming use case with a single key whose value has a bin I will increment a few thousand times per second.

I know that there is an increment verb I can use to increase the value without lots of extra round trips. I found this discussion, which covers:

  1. What happens when contention gets too high on the key.
  2. How to break the load across multiple keys, then aggregate results.
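
For reference, the increment path with the Java client looks roughly like this (the namespace, set, key, and bin names here are made up for illustration, and this assumes a running cluster with the client jar on the classpath):

```java
AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);

// Hypothetical counter record
Key key = new Key("test", "counters", "page:home");

// add() increments server-side, so there is no read-modify-write round trip
client.add(null, key, new Bin("count", 1));
```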

I’m still curious what kind of load you have been able to achieve on a single key. I’d also like to hear whether anyone is using the following techniques to deal with this case:

  • data-in-index
  • in-memory namespace
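
For context, data-in-index is a namespace-level setting; a minimal aerospike.conf sketch might look like the following (names and sizes are placeholders, and my understanding is that data-in-index requires single-bin integer data with an in-memory storage engine):

```
namespace counters {
    replication-factor 2
    memory-size 2G
    single-bin true
    data-in-index true
    storage-engine memory
}
```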

#2

Throughput is not merely a function of the data model; it really depends on your hardware, RAM, cluster size, type of SSDs (if persisting data), client-to-server network bandwidth, etc.

For counters, you will get the best throughput with data-in-index (only integer data is allowed) plus data-in-memory (if persisting data on a device). If you can logically separate the clients on some qualification and use a composite key, thereby separating the counters into multiple records and aggregating when needed, you can improve throughput. If you do that, another trick that may help is checking which partition each composite key is assigned to after the RIPEMD-160 hash. Play with the composite keys to put each of these counter records on a separate partition. You can use the explain command in AQL to see the partition number computed for each set:compositeKey combination.
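
To make the shard-and-aggregate idea concrete, here is a self-contained Java sketch that simulates it in-process. A ConcurrentHashMap stands in for the counter records ("c:0" … "c:99" are hypothetical composite keys); a real implementation would replace these map operations with the client's increment and batch-get calls:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

class ShardedCounter {
    static final int SHARDS = 100;
    final ConcurrentHashMap<String, LongAdder> store = new ConcurrentHashMap<>();

    // Writer path: pick a shard pseudo-randomly so no single key is hot.
    // In Aerospike this would be client.add() on key "c:" + shard.
    void increment() {
        String key = "c:" + ThreadLocalRandom.current().nextInt(SHARDS);
        store.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    // Reader path: the "batch get + sum" aggregation over all shards.
    long sum() {
        return store.values().stream().mapToLong(LongAdder::sum).sum();
    }

    public static void main(String[] args) {
        ShardedCounter c = new ShardedCounter();
        for (int i = 0; i < 10_000; i++) {
            c.increment();
        }
        System.out.println("sum = " + c.sum()); // prints 10000
    }
}
```

The point of the sketch is that the write load fans out over 100 keys while the read side still recovers the exact total.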

In parallel, use the Java benchmark tool to characterize your system. You should get fairly representative throughput numbers by testing with it. Start with the simplest setup (single node, client on the node) and gradually add the remaining elements to identify which element has the biggest impact.
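
From memory, an invocation of the Java benchmark tool looks something like the following (double-check the flags against the tool's --usage output before relying on them):

```
cd aerospike-client-java/benchmarks
./run_benchmarks -h 127.0.0.1 -p 3000 -n test -k 100000 -o I -w RU,50 -z 8
```

Here -o I uses integer values, -w RU,50 is a 50/50 read-update mix, and -z sets the number of client threads.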


#3

An easy way to do the aggregation would be a batch get of the composite keys. Say you want this key to be “key”. To split the load up evenly, you could split this into key[1-100], effectively spreading the load across 100 keys. Then, when you go to retrieve them, simply construct an array of keys and perform a batch get.

ex… (using the Java client)

Key[] keys = new Key[100];

for (int i = 0; i < 100; i++) {
    keys[i] = new Key("test", "myset", "mybin");
}

Record[] records = client.get(policy, keys);

Then you can loop through the records:

int sum = 0;
for (int i = 0; i < 100; i++) {
    if (records[i] != null) {
        sum += records[i].getInt("mybin");
    }
}

The UDF way may actually be faster though… I’m not sure if you can use batch get with an aggregation, or if you’d have to use a statement. Of course - you’d want to experiment and see what yields best results. http://www.aerospike.com/docs/client/java/usage/aggregate

It may look something like this (@pgupta will correct me if I am wrong… hopefully):

local function one(rec) return rec[mybin] end

local function add(a, b) return a + b end

function sum(stream) return stream : map(one) : reduce(add) end

and the invocation from the program would look something like this:

Statement stmt = new Statement();
stmt.setNamespace("foo");
stmt.setSetName("bar");
stmt.setBinNames("mybin");
stmt.setFilters(Filter.range("mybin", Long.MIN_VALUE, Long.MAX_VALUE));

ResultSet rs = client.queryAggregate(null, stmt, "myudf", "sum", Value.get("mybin"));
if (rs != null && rs.next()) {
    System.out.println("Sum = " + rs.getObject());
}


#4

If the number of composite keys is small, a batch read with final aggregation in the client is better.
A stream UDF is better if you have a very large number of records to aggregate.

Just looking at the stream UDF: the implementation needs an aggregation function between map and reduce. I don’t think what you have will work if there are multiple records on the same node, which there quite likely will be. Also, in …rec[mybin]…, mybin needs quotes. Again, test and validate.
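
Folding both of those corrections in, the stream UDF sketch would become something like the following (still worth validating against the docs; the aggregate seed of 0 assumes integer bins):

```
local function one(rec)
    return rec['mybin']
end

local function add(a, b)
    return a + b
end

function sum(stream)
    -- aggregate() combines values within each node; reduce() merges
    -- the per-node partial sums into the final result
    return stream : map(one) : aggregate(0, add) : reduce(add)
end
```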

The array of keys has to be constructed from the composite keys.

“mybin” should instead be each composite key for the records that contain the individual counters. For example, if the primary keys of the counter records are “c:1”, “c:2”, “c:3”, etc., then keys[0] = new Key("test", "myset", "c:1"), and so on; generate “c:1”, “c:2”, … programmatically using string concatenation (language specific).

If you create custom composite keys to intentionally spread them over different partitions, then build the key array as you create the composite keys and use it later.


#5

Going off topic, slightly, but can you do a stream UDF and pass an array of keys?


#6

No. You cannot run a stream UDF on a batch of keys. Likewise, version 3.12 introduced predicate filtering, and you cannot do that on a batch of keys either. Think of predicate filtering as Aerospike writing a stream UDF with filters on the fly for you. However, predicate filtering does support regular-expression-based filtering, so there may be a clever opportunity to filter on the stem of the composite key if you also store the key in a bin.
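
That stem idea might look something like this with the Java client’s PredExp API (the "ckey" bin name is hypothetical, and this assumes you wrote each record’s own key into that bin):

```java
// Hypothetical: each counter record stores its own key in a "ckey" bin.
// Predicate filtering (server 3.12+) keeps only records whose stored
// key starts with the "c:" stem.
Statement stmt = new Statement();
stmt.setNamespace("test");
stmt.setSetName("myset");
stmt.setPredExp(
    PredExp.stringBin("ckey"),
    PredExp.stringValue("^c:.*"),
    PredExp.stringRegex(RegexFlag.NONE)
);
RecordSet rs = client.query(null, stmt);
```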