Fastest way to count records returned by a query

How would it be impossible? Take an RDBMS for comparison: if the number of records matching a query is changing rapidly, the count changes with it, and each time you run the query you get a different result. Aerospike is no different in that respect. And in an RDBMS, a COUNT() aggregation function is executed server-side as well.

In the case of a secondary index query we don't yet have native aggregation functions, so you need to implement the count with a stream UDF. For your case you don't need a filter at all: you give the query the BETWEEN predicate, and the records it matches through the secondary index are streamed into the UDF. All you need is a simple mapper that returns 1 for every record and a reducer that sums those values up.

I have a set sp with a bin i, and a secondary index on that bin.

CREATE INDEX test_sp_i_idx ON test.sp(i) NUMERIC

I inserted 100 records with consecutive values for i.
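For completeness, here is a minimal sketch of how that setup could be done from the Python client instead of aql. The host address is the one used in the query script further down; the primary-key names are just an assumption for illustration.

import aerospike

# connect to the cluster
client = aerospike.client({'hosts': [('192.168.119.3', 3000)]}).connect()

# the secondary index can also be created from the client instead of aql
# client.index_integer_create('test', 'sp', 'i', 'test_sp_i_idx')

# write 100 records with consecutive values for the bin i;
# the primary keys are arbitrary
for i in range(1, 101):
    client.put(('test', 'sp', 'key' + str(i)), {'i': i})

client.close()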

The stream UDF is in a module named aggr.lua

-- mapper: emit 1 for every record that comes through the index query
local function counter(record)
  return 1
end

-- reducer: add up the 1s from the mapper (and partial sums from other nodes)
local function sum(v1, v2)
  if type(v1) ~= 'number' then v1 = 0 end
  if type(v2) ~= 'number' then v2 = 0 end
  return v1 + v2
end

-- entry point called by the client's apply()
function count(stream)
  return stream : map(counter) : reduce(sum)
end
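Before the aggregation can run, the aggr.lua module has to be registered with the cluster. From the Python client that is a single call (the same one that is left commented out in the script below), assuming the file sits in the client's working directory:

client.udf_put('aggr.lua')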

I’ll use a simple Python script to call it

from __future__ import print_function
import aerospike
from aerospike import predicates as p

# connect to the cluster
client = aerospike.client({'hosts': [('192.168.119.3', 3000)]}).connect()
# registers the UDF module with the cluster; only needed once
#client.udf_put('aggr.lua')

# secondary-index query on test.sp, restricted by the BETWEEN predicate
s = client.query('test', 'sp')
s.where(p.between('i', 1, 50))
# stream the matching records through the count() function of the aggr module
s.apply('aggr', 'count', [])
r = s.results()
print(r)

The result is

[50]
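Since the aggregation comes back as a list with a single element, the count itself is simply the first item:

count = r[0]  # 50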