I have a task that requires working with two sets from a stream UDF.
More precisely, I want to filter data in set 1 based on data in set 2 (a server-side join).
The aerospike Lua object has exists, create, and update functions, but it doesn’t support get()/set() operations.
So far, one option I have found is to return the data from set 1 to the client, then make a batch get against set 2, and after that filter the data on the client.
Client-side filtering would be too heavy, because the records are quite big and all of them would have to be transferred.
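To make the first option concrete, here is a minimal sketch of the flow. It uses plain Python dictionaries as stand-ins for the two sets, and batch_get is a hypothetical helper standing in for a real client batch read; the keys and bin names (customer_id, amount, active) are made up for illustration.

```python
# In-memory stand-ins for the two sets; keys and bin names are hypothetical.
set1 = {
    "order:1": {"customer_id": "c1", "amount": 120},
    "order:2": {"customer_id": "c2", "amount": 75},
    "order:3": {"customer_id": "c3", "amount": 300},
}
set2 = {
    "c1": {"active": True},
    "c2": {"active": False},
}


def batch_get(store, keys):
    """Hypothetical stand-in for a client batch read; missing keys are skipped."""
    return {k: store[k] for k in keys if k in store}


def client_side_join(set1_records, set2_store, keep):
    # Step 1: all set 1 records are assumed to be on the client already.
    wanted = {rec["customer_id"] for rec in set1_records.values()}
    # Step 2: fetch the related set 2 records in one batch.
    related = batch_get(set2_store, wanted)
    # Step 3: keep only set 1 records whose set 2 counterpart passes the test.
    return {
        key: rec
        for key, rec in set1_records.items()
        if rec["customer_id"] in related and keep(related[rec["customer_id"]])
    }


result = client_side_join(set1, set2, lambda other: other["active"])
print(result)
```

Note that step 1 is where the cost sits: every set 1 record crosses the network before any of them can be discarded.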
The second option is to merge both sets into one, which is bad because I wouldn’t be able to keep secondary indexes for quick aggregates.
Currently, Aerospike does not support getting other records from a UDF.
In a distributed system, the second record sought is likely to be on a remote node, which would incur an extra network hop.
Is all the data from set1 needed in order to filter the data in set2? Or is some statistical information generated first and then passed on to the filtering of set2?
I'm not sure what is meant by “merge both sets into one”. Is it a solution where the data in set1 is represented as additional bins of records in set2? Different indexes can be created on different bins in the same set.
I understand that server-side joins are not something NoSQL databases are intended for.
However, it would be a very nice feature, as Aerospike provides an SQL-like interface and promotes some OLAP use cases.
My final solution was to denormalize set1 by joining (merging) data from set2. I’ve built a batch job outside of Aerospike to execute the join and put the consolidated data into a single Aerospike set.
After that, the stream UDF has all the data in the same set and can filter it properly. A downside is that I had to put the joined data into a bin of map type, and unfortunately maps cannot be indexed. So this solution is complex and not as fast.
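For anyone who wants to try the same approach, here is a rough sketch of the idea, not the actual batch job. Plain Python dictionaries stand in for the sets, and the function names (denormalize, stream_filter) and the bin names (customer_id, joined, active) are hypothetical.

```python
# In-memory stand-ins for the two sets; keys and bin names are hypothetical.
set1 = {
    "order:1": {"customer_id": "c1", "amount": 120},
    "order:2": {"customer_id": "c2", "amount": 75},
    "order:3": {"customer_id": "c3", "amount": 300},
}
set2 = {
    "c1": {"active": True},
    "c2": {"active": False},
}


def denormalize(set1_records, set2_store, join_bin, target_bin="joined"):
    """Batch-job step: embed the matching set 2 record in each set 1 record."""
    merged = {}
    for key, rec in set1_records.items():
        other = set2_store.get(rec[join_bin])
        out = dict(rec)  # copy so the source record is left untouched
        # The joined data goes into a single map-typed bin; empty if no match.
        out[target_bin] = dict(other) if other is not None else {}
        merged[key] = out
    return merged


def stream_filter(records, predicate):
    """Stand-in for the stream UDF filter, which now reads only one set."""
    return [key for key, rec in records.items() if predicate(rec)]


merged_set = denormalize(set1, set2, join_bin="customer_id")
active_orders = stream_filter(
    merged_set, lambda rec: rec["joined"].get("active", False)
)
print(active_orders)
```

The filter has to look inside the map bin for every record, which matches the downside described above: without an index on the joined data, nothing can be skipped up front.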