Hi,
I have written UDFs to run queries that need multiple predicate filters.
My question is: do I need to create secondary indexes on the bins that I want to filter on?
I know that if I don’t create a secondary index on these bins, the UDF search still succeeds.
But will the fetch be faster if I create secondary indexes on these bins?
With Best Regards,
Himanshu
Hi. A query with no predicate turns into a scan. For a stream UDF to run fastest, you want to shed as much unneeded data as possible up front, which means first applying the predicate that returns the smallest subset of records in the set. Those records are then passed to the UDF, where they can be filtered further.
In order for such a query predicate to work you’ll need a secondary index built on the bin which it operates over.
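To make the split between the indexed predicate and the UDF concrete, here is a rough sketch in plain Python. It is a toy model, not the Aerospike API: the records, the bin names (`city`, `age`, `status`) and the helpers `build_index` and `stream_udf` are all made up for illustration. The point it tries to show is that only the bin used in the predicate needs an index, while the bins read inside the UDF do not.

```python
from collections import defaultdict

# Hypothetical records: each dict stands in for a record, keys are bin names.
records = [
    {"id": 1, "city": "Pune", "age": 31, "status": "active"},
    {"id": 2, "city": "Pune", "age": 17, "status": "active"},
    {"id": 3, "city": "Delhi", "age": 45, "status": "active"},
    {"id": 4, "city": "Pune", "age": 52, "status": "inactive"},
]


def build_index(recs, bin_name):
    """Toy stand-in for a secondary index: bin value -> list of records."""
    index = defaultdict(list)
    for rec in recs:
        index[rec[bin_name]].append(rec)
    return index


def stream_udf(stream):
    """Toy stand-in for a stream UDF: filter on un-indexed bins, then map."""
    filtered = (r for r in stream if r["age"] >= 18 and r["status"] == "active")
    return [r["id"] for r in filtered]


# Only the bin used in the predicate ('city') is indexed in this sketch.
city_index = build_index(records, "city")

# Step 1: the indexed predicate narrows the set before the UDF runs.
candidates = city_index["Pune"]

# Step 2: the UDF filters further on 'age' and 'status', which have no index.
result = stream_udf(candidates)

print(len(records), len(candidates), result)
```

In this model the UDF only sees the three records matched by the `city` lookup instead of all four. With no predicate at all, every record would be streamed through the UDF, which is the scan case described above.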
Thanks for the quick reply.
Do you mean that if my UDF reads values from 5 bins, I will have to create secondary indexes on all 5 of those bins?
Let me add specifics:
I find this text in Aerospike docs:
Indexed MapReduce
One of the main differences from other systems is that the aggregation is done against an index - essentially a WHERE clause. By filtering against an index performance can be very high.
My questions are -
- do I have to create secondary indexes on the bins used in the filter() function?
- do I have to create secondary indexes on the bins used in the map() function?
-himanshu