Sorting a very large set repeatedly


#1

Hi,

I’m developing an application that might potentially store billions of records in a single set on our Aerospike cluster.

A very common use case will be fetching the top n most popular records from the DB. (We can expect such calls almost every second.)

From what I understand, the only way to achieve this is to use a UDF that sorts by popularity and limits the result.

So, my question is what will be the impact on performance? Might this cause Aerospike to hang when sorting so many records repeatedly every few seconds? If yes, is there a better alternative available?

Thanks.


#2

How many different categories of top n records do you need? I’m sure you’ve already seen http://www.aerospike.com/blog/top-10-aerospike-aggregations/, but I wonder whether you could just run that aggregation periodically and store the results back into a map record. Then, when a user wants the current top results, they can retrieve the map record instead. This way the consumers of the top results don’t all have to run the aggregation; only one program does.
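
To make the idea concrete, here is a rough Python sketch of the pattern. It does not use the Aerospike client at all: the `TopNCache` class is a hypothetical in-memory stand-in for the single map record, the `popularity` field name and the sample data are assumptions, and `heapq.nlargest` is swapped in for the aggregation so that the top n are picked without sorting the whole set.

```python
import heapq
import time


def top_n(records, n):
    """Return the n most popular records, highest first, without a full sort."""
    return heapq.nlargest(n, records, key=lambda r: r["popularity"])


class TopNCache:
    """Hypothetical stand-in for the one map record holding the results."""

    def __init__(self):
        self._record = {"updated_at": None, "top": []}

    def refresh(self, records, n):
        # Run by a single background job on a timer, not by every consumer.
        self._record = {"updated_at": time.time(), "top": top_n(records, n)}

    def read(self):
        # Consumers only do this cheap single-record read.
        return self._record


# Made-up sample data; in practice this would come from scanning the set.
records = [{"id": i, "popularity": (i * 37) % 101} for i in range(1000)]

cache = TopNCache()
cache.refresh(records, 3)
top_popularities = [r["popularity"] for r in cache.read()["top"]]
```

The trade-off is that readers see results that are as stale as the refresh interval, in exchange for the expensive pass over the set running once per interval rather than once per request.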