I'm new to Aerospike, coming from MongoDB, and I've found that a streaming UDF is hundreds of times slower than a query with a WHERE clause.
I'd like to run a custom query on a set of ~2,700 Book records (~30 bins each), selecting the records with OwnerID_Hex="57120d7151c643ab42a8c19c" and IsUserDeleted=0. The set takes up less than 3 MB of memory.
With AQL query syntax it returns 13 rows in 0.002 secs, which is reasonable, although this query does not apply the IsUserDeleted condition:
aql> SELECT Name, OwnerID_Hex FROM PV.Book WHERE OwnerID_Hex="57120d7151c643ab42a8c19c"
But with a streaming UDF it returns in 0.863 secs, which is hundreds of times slower than the query:
aql> AGGREGATE pv.getBooksByOwner() ON PV.Book
My streaming UDF:
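function getBooksByOwner(stream)
  -- Keep only this owner's books that are not flagged as deleted.
  -- A record without an IsUserDeleted bin yields nil, and nil ~= 1 is true,
  -- so records where the bin was never assigned also pass the filter.
  local function fnFilter(rec)
    return rec["OwnerID_Hex"] == "57120d7151c643ab42a8c19c"
      and rec["IsUserDeleted"] ~= 1
  end

  -- Project each matching record to a small map holding only two bins.
  local function fnMap(rec)
    local m = map()
    m["Name"] = rec["Name"]
    m["IsUserDeleted"] = rec["IsUserDeleted"]
    return m
  end

  return stream:filter(fnFilter):map(fnMap)
end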
Is this normal performance for a streaming UDF? Or do I have to run the aggregation with a WHERE clause to narrow down the number of records before they are sent into the UDF?
aql> AGGREGATE pv.getBooksByOwner() ON PV.Book WHERE OwnerID_Hex="57120d7151c643ab42a8c19c"
(0.003 secs, reasonable)
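To make the question above concrete, here is a toy model in plain Lua. It is not Aerospike internals: the synthetic records, the `build_index` table standing in for the idxBookOwner secondary index, and the call counter are all hypothetical. It counts how often a predicate equivalent to fnFilter runs when every record in the set is streamed (as with AGGREGATE without WHERE, which runs over a scan) versus when an index lookup on OwnerID_Hex supplies only the matching records. Only predicate calls are counted; the real per-record cost of reading a record and handing it to the Lua stream is not modeled.

```lua
-- Toy model (hypothetical data, not Aerospike internals): count filter calls
-- for a full scan versus an index lookup that pre-selects one owner's records.
local TARGET = "57120d7151c643ab42a8c19c"

-- Build a synthetic set: `total` records, the first `matching` owned by TARGET.
local function build_set(total, matching)
  local records = {}
  for i = 1, total do
    local owner = (i <= matching) and TARGET or string.format("owner%04d", i)
    records[i] = { OwnerID_Hex = owner, IsUserDeleted = 0, Name = "Book " .. i }
  end
  return records
end

-- Hypothetical stand-in for a secondary index: owner ID -> list of records.
local function build_index(records)
  local index = {}
  for _, rec in ipairs(records) do
    local bucket = index[rec.OwnerID_Hex]
    if bucket == nil then
      bucket = {}
      index[rec.OwnerID_Hex] = bucket
    end
    bucket[#bucket + 1] = rec
  end
  return index
end

-- Same condition as fnFilter in the UDF, plus a counter of invocations.
local calls = 0
local function fnFilter(rec)
  calls = calls + 1
  return rec.OwnerID_Hex == TARGET and rec.IsUserDeleted ~= 1
end

-- Stream the candidate records through the filter and collect matching names.
local function run(candidates)
  calls = 0
  local out = {}
  for _, rec in ipairs(candidates) do
    if fnFilter(rec) then
      out[#out + 1] = rec.Name
    end
  end
  return out, calls
end

local records = build_set(2721, 13)
local index = build_index(records)

local scan_rows, scan_calls = run(records)          -- like AGGREGATE ... ON PV.Book
local idx_rows, idx_calls = run(index[TARGET] or {}) -- like AGGREGATE ... WHERE OwnerID_Hex=...
print(#scan_rows, scan_calls)
print(#idx_rows, idx_calls)
```

Both paths return the same 13 rows, but the scan path calls the filter 2,721 times and the indexed path only 13 times, a ratio of roughly 200. That is the same order of magnitude as the 0.863 secs vs 0.003 secs timings reported above.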
Environment:
Running locally in Vagrant on a Mac Pro (4 GB RAM)
aerospike.conf:
mod-lua {
    user-path /opt/aerospike/usr/udf/lua
    cache-enabled true
}
namespace PV {
    memory-size 1G
    storage-engine memory
}
aql> show sets
{
  "disable-eviction": "false",
  "ns": "PV",
  "set-enable-xdr": "use-default",
  "objects": "2721",
  "stop-writes-count": "0",
  "set": "Book",
  "memory_data_bytes": "2875841",
  "truncate_lut": "0",
  "tombstones": "0"
},
aql> show indexes
{
  "ns": "PV",
  "bin": "OwnerID_Hex",
  "indextype": "NONE",
  "set": "Book",
  "state": "RW",
  "indexname": "idxBookOwner",
  "path": "OwnerID_Hex",
  "type": "STRING"
},