How to manage high selectivity secondary indexes effectively within Aerospike

Aerospike_Knowledge · March 3, 2016, 8:15pm

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

FAQ How to manage high selectivity secondary indexes effectively within Aerospike

Background

Secondary indexes use a scatter/gather design to retrieve data meaning that queries will run on every node in the cluster. When the secondary index is highly selective (or returns very few records), this results in unecessary work, as the query runs on all nodes but the requested record will sit on one or a few nodes only.

Suggested Approach

It is suggested to create a reverse mapping between the secondary key and the primary key(s) within the same namespace such that rather than querying with the scatter/gather approach, the reverse mapping is used to do a simple key/value or a small batch transaction. This is illustrated as follows:

Main data set containing records

Records looks as follows A, B, C (primary keys) with comma separated bins and secondary index on the first bin:

A–> b,c,d,f B–> f,g,k,l C–> b,k,f,l

Lookup set containing a mapping between the unique secondary index bin and the primary keys

b–>A,C f–>B

The index query is then broken into two steps.

Step 1:

The secondary index query is sent to lookup set, which executes as a key/value store query and returns A and C, the primary keys for the main data set.

Step 2:

The primary keys are sent to the main data set which executes a subsequent batch transaction and returns the required records from the nodes where they reside rather than all nodes.

Points to consider

The application has to be coded to issue two index calls, this is not handled implicitly.
There is no link between the lookup set and the main data set within the database therefore the application is responsible for maintaining consistency

Topic		Replies	Views
How Aerospike secondary index query works?	0	713	November 30, 2018
Is it possible to use primary key along with secondary key in Aerospike for fetching a record? Query & Indexing primarykey , secondary , record , index	7	1579	March 30, 2021
Secondary index with low cardinality or huge fanout	2	329	March 26, 2024
Primary key iterator and seek functions sorely missed secondary , index	5	1726	August 29, 2017
Geospatial Feature Discussion	2	1499	April 27, 2018