Fastest way to count records returned by a query

query
aggregation

#1

Hi all,

We have just started looking at using Aerospike for some of our data storage and had a question that I can’t seem to find in the docs.

If I have a

SELECT * FROM user_profile.west WHERE last_activity BETWEEN 340 AND 345

How do I get the number of rows returned without actually getting the rows? Do I need to do a full aggregation query?

Thanks


#2

That’s an aggregation, so you’d have to do it with a stream UDF.


#3

Hey - This would be half impossible as the query is very dynamic and I won’t have the filters in the filter step.


#4

How would it be impossible? Let’s take an RDBMS for comparison - if the number of records in a query is changing rapidly, then the count is also changing with it. Each time you query it would be different. No difference there. A COUNT() aggregation function is executed server-side as well.

In the case of the secondary index we don’t yet have native aggregation functions, so you need to implement it via a stream UDF. For your case you don’t need a filter at all. You give it the BETWEEN predicate, and the records matched by it in the secondary index stream through the UDF. All you need to do is have a simple mapper that returns 1, and a reducer to further sum it up.

I have a set sp with a bin i, and a secondary index over it.

CREATE INDEX test_sp_i_idx ON test.sp(i) NUMERIC

I inserted 100 records with consecutive values for i.

The stream UDF is in a module named aggr.lua

local function counter(record)
  return 1
end

local function sum(v1, v2)
  if type(v1) == 'number' and type(v2) == 'number' then
    return v1 + v2
  else
    return 0
  end
end

function count(stream, password)
  return stream : map(counter) : reduce(sum)
end

I’ll use a simple Python script to call it

from __future__ import print_function
import aerospike
from aerospike import predicates as p

client = aerospike.client({ 'hosts': [('192.168.119.3', 3000)]}).connect()
#client.udf_put('aggr.lua')

s = client.query('test','sp')
s.where(p.between('i', 1, 50))
s.apply('aggr', 'count', [])
r = s.results()
print(r)

The result is

[50]

#5

There’s also an example of GROUP BY [HAVING] in the Python client: https://github.com/aerospike/aerospike-client-python/blob/master/examples/client/stream_example.lua


#6

Wow - I misunderstood and apparently its possible and quite easy to write as well!

Thank you.!