Aerospike secondary index availability

helper_bro · February 18, 2023, 4:38pm

Hi Team,

I want to understand secondary indexes a little better. I have already read this, this, this, and this, but I still have some questions.

Let's say the replication factor is 3 in a 5-node cluster. In a steady state, 3 nodes will own each partition (one primary and two secondaries, I guess?). In this case, when a client runs a query on a secondary index, the query is forwarded to all the nodes (unlike in the case of the primary index).

a. Will only the nodes that are primary owners of the data build the secondary index, or will each node build a secondary index for all the data it holds?

b. If it is the former, when the request is forwarded to the other nodes, is each node told which partitions to return data for? Or will a node only return data for the partitions it is the primary owner of? Can a secondary owner of a partition serve secondary index queries? If a partition is unassigned from a node, will the GC thread take care of the secondary index entries created for the previously owned data?

c. If it is the latter, would the indexes for a partition be unavailable for a brief period whenever a partition rebalance happens?

d. If a partition rebalance is in progress, how would this change? The docs linked above mention that some data may not be returned or may be returned twice. Is it possible to get an exception in these scenarios rather than whatever is available? We have a use case where, if the data is present, we want the secondary index query to return it, and otherwise we want the request to fail. Is it possible to query like this?

e. In what other scenarios will a secondary index say a record is not there when the record is actually present? We had an outage because of this, where secondary index queries were not returning records. All we know for sure is that one node was unhealthy during that period. We were on AS 4.x, and we are not sure whether the bug mentioned here impacted us.

meher · February 28, 2023, 2:07am

In recent client and server versions, both primary and secondary index queries are sent to nodes on a per-partition basis.

a. All replicas (master or not) build the secondary index so that they can take over if the master goes down.

b. For basic queries (meaning not background or aggregation queries), yes, the client requests specific partitions from the nodes that claim master ownership. But a node that is not the master yet has all the data for a partition will honor the query and respond with the data it has. In terms of GC, yes, when a partition is dropped, the GC goes through and cleans up the secondary index entries for that partition. This may change and be optimized to drop the secondary index for that partition directly (as the secondary index is also split by partition).

c. All replicas have the secondary index and keep it up to date, so the index does not become unavailable during a rebalance.

d. This is no longer the case as of Aerospike version 6. Since the client requests data on a per-partition basis, if a partition is missed because its ownership changed while the query was in flight, the client chases it down by retrying that specific partition against the node that now owns it. Yes, for older versions there is a fail on cluster change policy, but again, it is better to use a more recent version, which will not miss any partition. (See this old KB article: Aerospike Customer)

e. If you are using the Enterprise Edition, the best option would normally be to open a Support case, but that version is out of the maintenance/support window, so it may be best to look at upgrading.
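Points a to c above can be pictured with a small toy model. This is only a sketch under stated assumptions, not how the server is implemented: `ToyNode`, `partition_of`, and the stand-in hash are hypothetical, and the only things taken from the thread are that every replica maintains the index and that the index is split by partition.

```python
from collections import defaultdict

N_PARTITIONS = 4096  # number of partitions per Aerospike namespace


def partition_of(key: str) -> int:
    # Stand-in hash for illustration only; the real partition id is
    # derived from the record's digest, not from this sum.
    return sum(key.encode()) % N_PARTITIONS


class ToyNode:
    """Hypothetical node keeping one secondary-index slice per partition."""

    def __init__(self, name):
        self.name = name
        self.partitions = set()
        # partition id -> indexed bin value -> set of record keys
        self.sindex = defaultdict(lambda: defaultdict(set))

    def apply_write(self, key, value):
        # Every replica (master or not) indexes the write it receives.
        pid = partition_of(key)
        self.partitions.add(pid)
        self.sindex[pid][value].add(key)

    def query(self, pid, value):
        # A node answers for any partition it fully holds.
        if pid not in self.partitions:
            raise LookupError(f"{self.name} does not hold partition {pid}")
        return set(self.sindex[pid].get(value, set()))

    def drop_partition(self, pid):
        # Because the index is split by partition, the slice can be
        # discarded together with the partition.
        self.partitions.discard(pid)
        self.sindex.pop(pid, None)


master, replica = ToyNode("A"), ToyNode("B")
for node in (master, replica):
    node.apply_write("user1", "blue")

pid = partition_of("user1")
print(replica.query(pid, "blue"))  # the replica can serve the query too
```

In this model a replica can take over a partition's queries immediately, because its index slice was kept current by the same writes that reached the master.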

helper_bro · February 28, 2023, 12:53pm

Thanks @meher for the detailed response. I understand that things are fixed in AS 6.x. Can we also assume that secondary index responses are consistent by default in AS 6.x? Can you please help us understand what to expect from AS versions prior to 6.x, specifically AS 4.2?

meher · February 28, 2023, 5:56pm

In 6.x, queries will not miss any partitions due to cluster changes. In earlier versions, the failOnClusterChange policy can be used to get an error back instead of a query potentially missing partitions during a cluster change (migrations when nodes are added to or removed from a cluster).
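The two behaviours described above can be contrasted in a small simulation. It is a sketch of the contracts only, not the client's or server's actual implementation: `query_with_partition_retry`, `query_fail_on_cluster_change`, `owner_of`, `fetch`, `get_cluster_key`, and `run_query` are hypothetical names, and the toy cluster is just a pair of dictionaries.

```python
class ClusterChangedError(Exception):
    """Raised when the cluster changed while the query was running."""


def query_with_partition_retry(partitions, owner_of, fetch, max_attempts=3):
    """6.x-style idea: track partitions and retry only the missed ones."""
    results = {}
    pending = set(partitions)
    for _ in range(max_attempts):
        if not pending:
            break
        for pid in sorted(pending):
            node = owner_of(pid)  # re-read the partition map on each attempt
            try:
                results[pid] = fetch(node, pid)
            except LookupError:
                continue  # node no longer holds it; retry on the next pass
            pending.discard(pid)
    if pending:
        raise RuntimeError(f"partitions not served: {sorted(pending)}")
    return results


def query_fail_on_cluster_change(get_cluster_key, run_query):
    """Older-style idea: return everything, or fail if the cluster changed."""
    before = get_cluster_key()
    records = run_query()
    if get_cluster_key() != before:
        raise ClusterChangedError("cluster changed during the query")
    return records


# Toy cluster: node B has just handed partition 1 over to node C,
# but the client's partition map is still stale.
held = {"A": {0: ["r0"]}, "C": {1: ["r1"]}}
partition_map = {0: "A", 1: "B"}


def owner_of(pid):
    return partition_map[pid]


def fetch(node, pid):
    if pid not in held.get(node, {}):
        partition_map[pid] = "C"  # simulate the client refreshing its map
        raise LookupError(pid)
    return held[node][pid]


print(query_with_partition_retry([0, 1], owner_of, fetch))
```

The first function never returns a partial result silently: it either covers every requested partition or raises. The second gives the "return the data or fail the request" behaviour asked about in question d, at the cost of failing whenever the cluster changes mid-query.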

system · February 28, 2024, 5:56pm

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
