Massive leaderboards use case

We have a use case where we need to solve the following:

  1. Find the rank of a player in real time (on the order of 100 ms is fine)
  2. Store more than 10 million scores per leaderboard
  3. Be cost efficient

Currently we are researching Aerospike, ScyllaDB and Couchbase.

All the benchmarks available for Aerospike are pretty old, so we are a bit unsure about the latest improvements.

We have gone through the default sorted maps in Aerospike, but unfortunately it looks like they are limited by write-block-size, which makes them unusable for this case.

It would be great if someone could shed some light on this scenario.

Thanks, Ayyappa

So let me understand the problem.
You have 10 million players.  Each has a score. 
You want to find the rank of player X based on his score.
i.e., Player X: you rank 7,860,000 out of a total of 10 million players.
Is that right?

Right! Thanks for putting it in simple terms.

This can be modeled, but only under certain assumptions.

1 - Your score has enough granularity that all 10 million users can have a unique score [if not, it can still be modeled, but that changes the unique-rank requirement].

2 - Multiple clients (applications) are updating player scores, so a player's score at any given read is only approximate. [You mentioned 100 ms; that is a fair window and is doable. As you noted, we cannot have the entire data of 10 million user:score pairs as a single record and ensure atomicity of updates vs. reads.]

If those assumptions are correct, then we can think of modeling this.

If your score really does not have the level of granularity needed to assign each player a unique rank, then you need to think of an alternate ranking system - perhaps by some sort of ranges?

1 - Your score has enough granularity that all 10 million users can have a unique score [if not, it can still be modeled, but that changes the unique-rank requirement].

The score values might overlap, but we can rank them uniquely based on a secondary factor (like submission timestamp).

2 - Multiple clients (applications) are updating player scores, so a player's score at any given read is only approximate. [You mentioned 100 ms; that is a fair window and is doable. As you noted, we cannot have the entire data of 10 million user:score pairs as a single record and ensure atomicity of updates vs. reads.]

Each player submits scores (from different applications, in parallel) to their own record, and reading the score with a delay is fine.

Currently, what we are doubtful about is:

  1. In the worst case, is it reasonable to read all the records (10 million) at once to calculate the final ranking?
  2. Are the writes queued so that they will be written to the DB for "sure"?

Thanks, Ayyappa

So there is a second level of ordering complexity - submission timestamp. This is an interesting data modeling problem. I think the following model will work:

Maintain a record for retrieving rank which has the following key:value map bin - the key is a list of [score, timestamp], the value is the userid. (The Java client will work; for other clients, check whether a list as a map key is supported.)

rank_rec_0: { ...., [score, timestamp]:userid, .... }

For argument's sake, let's assume all your users fit into one such record. Then the user's rank will be get_by_value(userid) with return value type INDEX.
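As a rough, untested sketch with the Aerospike Java client, the single-record version could look something like this (the bin name "ranks", the class and the helper method names are my own placeholders, not something agreed in this thread):

```java
import java.util.Arrays;
import java.util.List;

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.Value;
import com.aerospike.client.cdt.MapOperation;
import com.aerospike.client.cdt.MapOrder;
import com.aerospike.client.cdt.MapPolicy;
import com.aerospike.client.cdt.MapReturnType;
import com.aerospike.client.cdt.MapWriteFlags;

public class RankRecord {
    static final String BIN = "ranks";  // assumed bin name
    // KEY_ORDERED keeps entries sorted by the [score, timestamp] list key,
    // i.e. by score first and timestamp as the tie-breaker.
    static final MapPolicy ORDERED = new MapPolicy(MapOrder.KEY_ORDERED, MapWriteFlags.DEFAULT);

    // Upsert one entry: map key = [score, timestamp], map value = userid.
    // (Updating an existing player's score would also need a removal of the old key; omitted here.)
    static void putScore(AerospikeClient client, Key rankRec, long score, long ts, String userId) {
        client.operate(null, rankRec,
            MapOperation.put(ORDERED, BIN,
                Value.get(Arrays.asList(score, ts)),
                Value.get(userId)));
    }

    // Rank within this record = index of the map entry whose value is userid.
    static long rankInRecord(AerospikeClient client, Key rankRec, String userId) {
        Record rec = client.operate(null, rankRec,
            MapOperation.getByValue(BIN, Value.get(userId), MapReturnType.INDEX));
        // getByValue with INDEX returns a list of indexes; userid is unique, so take the first.
        List<?> indexes = rec.getList(BIN);
        return ((Number) indexes.get(0)).longValue();
    }
}
```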

In your case, all users will not fit into one record.

If you have a good assessment of the rank vs. user-count distribution, and this is a bounded problem in terms of number of users, you can break the data into multiple records.

rank_rec_0: {....}
rank_rec_1: {....}
rank_rec_2: {....}
rank_rec_3: {....}

Based on its score, if a userid must be entered in rank_rec_2, then its rank will be: map_size of rank_rec_0 + map_size of rank_rec_1 + index in rank_rec_2. [That may be the reverse rank, but the reverse can be computed likewise.] So you will read multiple records, but that should all be possible within the 100 ms window. Typical reads in Aerospike are on the order of 1 ms.
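Continuing the sketch above, the multi-record rank computation could look like the following; the "rank_rec_N" key naming follows the example above, and the assumption that each record covers a known score range (so you can tell which record a score belongs to) is mine:

```java
// Could live in the same RankRecord class sketched above.
// Global rank when the leaderboard is split across rank_rec_0..rank_rec_N,
// each covering a known score range. Namespace/set names are placeholders.
static long globalRank(AerospikeClient client, String ns, String set,
                       int targetRec, String userId) {
    long rank = 0;
    // Sum the map sizes of every record that precedes the one holding this score.
    for (int i = 0; i < targetRec; i++) {
        Record rec = client.operate(null, new Key(ns, set, "rank_rec_" + i),
            MapOperation.size(BIN));
        rank += rec.getLong(BIN);
    }
    // Add the player's index inside the record its score falls into.
    return rank + rankInRecord(client, new Key(ns, set, "rank_rec_" + targetRec), userId);
}
```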

If such a bounded, simplistic segregation of rank_rec by score is not possible, then you have to come up with a more creative, self-sharding map implementation in Aerospike.

You can watch this recording: Summit '19: Key Data Modeling Techniques | Aerospike. At around the 32:00 mark, the Adaptive Map is discussed for a different problem; some ideas from there can be repurposed for this one.

Or if you plan to be an Aerospike Enterprise Customer in the future, Aerospike’s Solution Architecture team may be able to help.
