The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.
#Synopsis:
Aerospike does not provide a function to count the number of records in a bin.
You can write a simple lua module, and call the module in a client API to count the records in a bin.
For this exercise, we use the following data set:
aql> select * from DataStore
+------------+-------+---------+
| frequency | foo | recency |
+------------+-------+---------+
| “eleven” | | |
| | “nnn” | |
| “thirteen” | | |
| “four” | | “ddd” |
| “ten” | | “jjj” |
| “three” | | “ccc” |
| “five” | | “eee” |
| “two” | | “bbb” |
| | | “mmm” |
| | | “lll” |
| “one” | | “aaa” |
| “seven” | | “ggg” |
| “nine” | | “iii” |
| “twelve” | | |
| “eight” | | “hhh” |
| “six” | | “fff” |
+------------+-------+---------+
16 rows in set (0.460 secs)
We want to count the number of values in the ‘frequency’ bin.
Create a secondary index on the ‘frequency’ bin:
aql> show indexes
+-------------+-------------+-----------+----------+-------+-----------+------------+--------+
| ns | bins | set | num_bins | state | indexname | sync_state | type |
+-------------+-------------+-----------+----------+-------+-----------+------------+--------+
| “DataStore” | “frequency” | “Profile” | 1 | “RW” | “freq_1” | “synced” | “TEXT” |
+-------------+-------------+-----------+----------+-------+-----------+------------+--------+
1 row in set (0.000 secs)
You could query the secondary index directly, but that solution is only practical for a very small set of data:
aql> select frequency from DataStore.Profile
+------------+
| frequency |
+------------+
| “eleven” |
| “thirteen” |
| “four” |
| “ten” |
| “three” |
| “five” |
| “two” |
| “seven” |
| “one” |
| “nine” |
| “twelve” |
| “eight” |
| “six” |
+------------+
13 rows in set (0.464 secs)
The solution is to write a simple lua module that performs the aggregation on the bin. Save the following lua code in a file named countNumOfBinsWithValues.lua
. The lua code performs the aggregation:
local function accumulate(counter,rec)
-- Initialze the counter; First time around
if counter == nil then
counter = 0
end
-- Check if 'frequency' bin is not nil (=has some value) -- if so, increment the counter
if rec.frequency ~= nil then
counter = counter + 1
end
return counter
end
local function merge(a,b)
a = a + b
info(tostring(a))
return a
end
function count(stream)
-- Process incoming tuples and pass it to aggregate function, then to reduce function
-- NOTE: aggregate function 'accumulate' accepts two parameters:
-- 1) counter
-- 2) function name 'accumulate' -- which will be called for each record as it flows in
-- Return reduced value of the counter generated by reduce function merge
return stream : aggregate(counter,accumulate) : reduce(merge)
end
Use the following syntax to import the module:
$ aql -c “register package ‘./countNumOfBinsWithValues.lua’”
The client tools do not support calling lua modules, so you must call the module from one of the API clients. For this example, we use a Python application.
Save the following code in a file called pycli.py
:
import the module
from future import print_function
import aerospike
from aerospike import predicates as p
config = {‘hosts’: [ (‘127.0.0.1’, 3000) ]}
try:
client = aerospike.client(config).connect()
except:
print(“failed to connect to the cluster with”. config[‘hosts’])
sys.exit(1)
#Query namespace DataStore, set Profile:
query = client.query( ‘DataStore’, ‘Profile’)
#Query bin frequency:
query.select( ‘frequency’ )
query.apply(‘countNumOfBinsWithValues’, ‘count’)
# callback function prints the records as they are read
def print_result(value):
print(value)
#Execute the query and call print_result for each result
query.foreach(print_result)
#Close the connection to the Aerospike cluster
client.close()
The pycli.py application returns the proper 13 records from the ‘frequency’ bin: $ python pycli.py 13