Secondary index with low cardinality or huge fanout

We are modelling user wishlist on Aerospike (just for an example to explain my usecase). I will have records like userID β†’ productid in aerospike.

whenever a product comes in stock, we need to send notification to all the users who have wishlisted a product. To cater to this use case we are planning to have a secondary index on productid. but there can be certain products that are very hot and have millions of users wishing for it.

Are there any considerations we have to do? or just creating a secondary index will work? the key metrics here is time taken in fetching all the userIds from AS.

This is more of a data modelling problem, but i did not find any resource which talks about this.

A secondary index with low selectivity is generally not recommended. With low selectivity, a secondary index query can still be faster than the alternative of a full scan, however, the cost of additional memory and index updates need to be considered.

Also, a query returning millions of results will be slower even if selectivity is relatively high, for example, if the total number of users are in the hundreds of millions.

Some factors to consider include:

  • unique products on wish lists at a time
  • are hot products the norm or most products have a small number of interested people
  • the fraction of the total number of users choosing a hot product
  • max size limit on a user’s wish list

If most products are not hot and most queries do not involve hot products, a secondary index makes sense. For scenarios that are not clear cut, it is best to try it out in a real or representative environment. Hope this helps.

1 Like