UDF and secondary indexes


#1

Hi,

I have written UDFs to implement queries that require multiple predicate filters.

My question is: do I need to create secondary indexes on the bins that I want to filter on? I know that if I don’t create a secondary index on these bins, the UDF search still succeeds. But will the fetch be faster if I create secondary indexes on these bins?

With Best Regards, Himanshu


#2

Hi. A query with no predicate turns into a scan. For a stream UDF to run fastest, you want to shed as much unneeded data as possible up front, which means first applying the predicate that returns the smallest subset of records in the set. Those records are then passed to the UDF, where they can be filtered further.

For such a query predicate to work, you’ll need a secondary index built on the bin it operates on.
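
To make the order of operations concrete, here is a toy model in plain Python. It is not Aerospike client or UDF code (real stream UDFs are written in Lua), and the records, bin names, and helper functions are all made up for illustration. It only sketches the idea above: one indexed bin narrows the candidate set, and the filter and map steps then read other bins of the surviving records without needing an index on them.

```python
from collections import defaultdict

# Hypothetical records; the bin names (city, age, plan) are illustrative only.
records = [
    {"id": 1, "city": "Pune", "age": 31, "plan": "gold"},
    {"id": 2, "city": "Pune", "age": 19, "plan": "free"},
    {"id": 3, "city": "Delhi", "age": 45, "plan": "gold"},
    {"id": 4, "city": "Pune", "age": 52, "plan": "gold"},
]

def build_index(rows, bin_name):
    """Toy stand-in for a secondary index: bin value -> matching records."""
    index = defaultdict(list)
    for row in rows:
        index[row[bin_name]].append(row)
    return index

# Only the bin used by the query predicate is indexed.
city_index = build_index(records, "city")

def query(index, value, record_filter, record_map):
    """Narrow by the indexed bin first, then filter and map the survivors."""
    candidates = index.get(value, [])  # the "WHERE city = value" step
    return [record_map(r) for r in candidates if record_filter(r)]

# The filter reads 'age' and 'plan'; neither bin is indexed in this model.
result = query(
    city_index,
    "Pune",
    lambda r: r["age"] > 30 and r["plan"] == "gold",
    lambda r: r["id"],
)
print(result)  # [1, 4]
```

In this sketch the filter never sees the Delhi record at all, which is the point of choosing the most selective predicate for the indexed step.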


#3

Thanks for the quick reply.

Do you mean that if my UDF reads the values of 5 bins, I will have to create secondary indexes on all 5 of those bins?

Let me add specifics:

I find this text in Aerospike docs: "Indexed MapReduce: One of the main differences from other systems is that the aggregation is done against an index - essentially a WHERE clause. By filtering against an index performance can be very high."

My questions are -

  1. Do I have to create secondary indexes on the bins used in the filter() function?
  2. Do I have to create secondary indexes on the bins used in the map() function?

-himanshu