UDF performance



 We're evaluating Aerospike as one of the in-memory NoSQL solutions. The primary goal is to have very low latency and very high ingestion rate. 

   . Aerospike 3.5.x server installed on a single machine
   . CentOS, 20 physical cores with hyper-threading enabled, so 40 virtual cores
   . 125 GB of memory
   . This is a dedicated "bare metal" machine for Aerospike.

We ingested 2 million records on this instance (each record has about 7 bins, mostly integers and a couple of strings).
We're using the count UDF described on your website to get a row count. Executing a query like "aggregate count() on <ns>.<set>" takes 1.4 secs as measured using aql. While this query is executing, top shows that only 3 or 4 of the 40 cores are in use, at around 45% utilization each. The rest are idle. Most of the memory on this box is also free.

The question is: what is preventing us from getting a millisecond response time on this minuscule data set when CPU and memory are obviously not the limiting factors? Do we need to tweak any other configuration parameter?

These results were captured after executing the "afterburner.sh" script.


Not sure if the following applies, but I remember seeing a Lua instance pool size set to about 5 instances. I'm not sure whether it was a client-side or server-side setting. Anyway, if you only want counters, I would not use UDFs. I had a similar use case where I needed a count of something and ended up implementing it in application logic using only Aerospike's key-value operations (incrementing a counter record on inserts, decrementing on deletes). That gives me sub-millisecond responses no matter how many records there are, since reading the counter is O(1). An aggregation count has to visit every record, so it is O(n) and gets slower with every new entry. I think you shouldn’t consider any stream UDF ‘realtime’, as they may take a few secs or even longer.
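
To make the counter approach concrete, here is a minimal sketch in Python. It is not my production code and it does not talk to a real server: InMemoryClient is a hypothetical stand-in written for this example, and the names COUNTER_KEY, insert_record, delete_record and record_count are made up for illustration. With a real client you would swap the stand-in for the client's own put, remove and atomic increment calls.

```python
class InMemoryClient:
    """Hypothetical stand-in for a key-value store client (not the Aerospike API)."""

    def __init__(self):
        self._records = {}

    def exists(self, key):
        return key in self._records

    def put(self, key, bins):
        self._records[key] = dict(bins)

    def get(self, key):
        return self._records[key]

    def remove(self, key):
        del self._records[key]

    def increment(self, key, bin_name, offset):
        # Creates the record and bin on first use, then adds the offset.
        record = self._records.setdefault(key, {})
        record[bin_name] = record.get(bin_name, 0) + offset


# Hypothetical (namespace, set, user key) tuple for the counter record.
COUNTER_KEY = ("test", "counters", "demo_set_count")


def insert_record(client, key, bins):
    """Write a record and bump the counter only if the key is new."""
    is_new = not client.exists(key)
    client.put(key, bins)
    if is_new:
        client.increment(COUNTER_KEY, "count", 1)


def delete_record(client, key):
    """Remove a record and decrement the counter if the key existed."""
    if client.exists(key):
        client.remove(key)
        client.increment(COUNTER_KEY, "count", -1)


def record_count(client):
    """Read the count with a single key lookup instead of scanning the set."""
    if not client.exists(COUNTER_KEY):
        return 0
    return client.get(COUNTER_KEY)["count"]
```

Note that the exists-then-put check in this sketch is not atomic. With concurrent writers against a real server you would need a create-only write policy or similar to keep the counter exact. The point is only that record_count is one key read, whatever the size of the set.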


Unrelated to UDFs, make sure your transaction-queues and service-threads are configured to the number of cores as described in the configuration reference.

As Manuel described, UDFs are functionality added on top of the key-value operations, using Lua scripts. The server includes LuaJIT, which should provide better performance for repeated calls, but still not in the same league as the core key-value operations. If you run the same command through AQL several times in a row, does the execution time improve?


Running the same command multiple times in AQL doesn’t help. We’ll configure transaction-queues and service-threads as suggested and keep you posted.

Thanks for your suggestion and comments.



The transaction-queues and service-threads parameters were already configured to the number of cores. It still uses only 4 cores for some reason.