Massive leaderboards use case

We have a use case where we need to solve the following:

  1. Find the rank of a player in real time (on the order of 100 ms is fine)
  2. Store more than 10 million scores per leaderboard
  3. Be cost efficient

Currently we are researching Aerospike, ScyllaDB and Couchbase.

All the benchmarks available for Aerospike are pretty old, so we are a bit unsure about the latest improvements.

We have gone through the default sorted maps in Aerospike, but unfortunately it looks like they are limited by write-block-size, which makes them unusable for this case.

It would be great if someone could shed some light on this scenario.

Thanks, Ayyappa

So let me understand the problem.
You have 10 million players.  Each has a score. 
You want to find the rank of player X based on his score.
i.e., Player X: you rank 7,860,000 out of a total of 10 million players.
Is that right?

Right! Thanks for putting it in simple terms.

This can be modeled, but only under certain assumptions.

1 - Your score has enough granularity that all 10 million users can have a unique score [if not, it can still be modeled, but that changes the unique-rank requirement].

2 - Multiple clients (applications) are updating player scores, so a player's score at any given read is only approximate. [You mentioned 100 ms; that is a fair window and is doable. As you noted, we cannot have the entire data of 10 million user:score pairs as a single record and ensure atomicity of updates vs. reads.]

If those assumptions are correct, then we can think of modeling this.

If your score really does not have the level of granularity needed to assign each player a unique rank, then you need to think of an alternate ranking system - perhaps by some sort of ranges?

1 - Your score has enough granularity that all 10 million users can have a unique score [if not, it can still be modeled, but that changes the unique-rank requirement].

The score values might overlap, but we can rank them uniquely based on a secondary factor (like submission timestamp).

2 - Multiple clients (applications) are updating player scores, so a player's score at any given read is only approximate. [You mentioned 100 ms; that is a fair window and is doable. As you noted, we cannot have the entire data of 10 million user:score pairs as a single record and ensure atomicity of updates vs. reads.]

Each player submits scores (from different applications, in parallel) to their own record, and reading the score with a delay is fine.

Currently, what we are doubtful about is:

  1. In the worst case, is it reasonable to read all the records (10 million) at once to calculate the final ranking?
  2. Are the writes queued so that they will be written to the DB for "sure"?

Thanks, Ayyappa

So there is a second level of ordering complexity - submission timestamp. This is an interesting data modeling problem. I think the following model will work:

Maintain a record for retrieving rank which has the following key:value map bin - the key is a list of [score, timestamp], the value is the userid. (The Java client will work; for other clients, check whether a list as a map key is supported.)

rank_rec_0: { ...., [score, timestamp]:userid, .... }

For argument's sake, let's assume all your users fit into one such record. Then the user's rank will be get_by_value(userid) with return value type INDEX.
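As a rough, untested sketch with the Aerospike Java client, the single-record version could look something like this (the bin name "ranks", the class and the helper method names are my own placeholders, not something agreed in this thread):

```java
import java.util.Arrays;
import java.util.List;

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.Value;
import com.aerospike.client.cdt.MapOperation;
import com.aerospike.client.cdt.MapOrder;
import com.aerospike.client.cdt.MapPolicy;
import com.aerospike.client.cdt.MapReturnType;
import com.aerospike.client.cdt.MapWriteFlags;

public class RankRecord {
    static final String BIN = "ranks";  // assumed bin name
    // KEY_ORDERED keeps entries sorted by the [score, timestamp] list key,
    // i.e. by score first and timestamp as the tie-breaker.
    static final MapPolicy ORDERED = new MapPolicy(MapOrder.KEY_ORDERED, MapWriteFlags.DEFAULT);

    // Upsert one entry: map key = [score, timestamp], map value = userid.
    // (Updating an existing player's score would also need a removal of the old key; omitted here.)
    static void putScore(AerospikeClient client, Key rankRec, long score, long ts, String userId) {
        client.operate(null, rankRec,
            MapOperation.put(ORDERED, BIN,
                Value.get(Arrays.asList(score, ts)),
                Value.get(userId)));
    }

    // Rank within this record = index of the map entry whose value is userid.
    static long rankInRecord(AerospikeClient client, Key rankRec, String userId) {
        Record rec = client.operate(null, rankRec,
            MapOperation.getByValue(BIN, Value.get(userId), MapReturnType.INDEX));
        // getByValue with INDEX returns a list of indexes; userid is unique, so take the first.
        List<?> indexes = rec.getList(BIN);
        return ((Number) indexes.get(0)).longValue();
    }
}
```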

In your case, all users will not fit into one record.

If you have a good assessment of the rank vs. user-count distribution, and this is a bounded problem in terms of number of users, you can break the data into multiple records.

rank_rec_0: {....}
rank_rec_1: {....}
rank_rec_2: {....}
rank_rec_3: {....}

Based on its score, if a userid must be entered in rank_rec_2, then its rank will be: map_size of rank_rec_0 + map_size of rank_rec_1 + index in rank_rec_2. [That may be the reverse rank, but the reverse can be computed likewise.] So you will read multiple records, but that should all be possible within the 100 ms window. Typical reads in Aerospike are on the order of 1 ms.
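Continuing the sketch above, the multi-record rank computation could look like the following; the "rank_rec_N" key naming follows the example above, and the assumption that each record covers a known score range (so you can tell which record a score belongs to) is mine:

```java
// Could live in the same RankRecord class sketched above.
// Global rank when the leaderboard is split across rank_rec_0..rank_rec_N,
// each covering a known score range. Namespace/set names are placeholders.
static long globalRank(AerospikeClient client, String ns, String set,
                       int targetRec, String userId) {
    long rank = 0;
    // Sum the map sizes of every record that precedes the one holding this score.
    for (int i = 0; i < targetRec; i++) {
        Record rec = client.operate(null, new Key(ns, set, "rank_rec_" + i),
            MapOperation.size(BIN));
        rank += rec.getLong(BIN);
    }
    // Add the player's index inside the record its score falls into.
    return rank + rankInRecord(client, new Key(ns, set, "rank_rec_" + targetRec), userId);
}
```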

If such a bounded, simplistic segregation of rank_rec by score is not possible, then you have to come up with a more creative, self-sharding map implementation in Aerospike.

You can watch this recording: Summit '19: Key Data Modeling Techniques | Aerospike. At around the 32:00 mark, the Adaptive Map is discussed for a different problem; some ideas from there can be repurposed for this one.

Or if you plan to be an Aerospike Enterprise Customer in the future, Aerospike’s Solution Architecture team may be able to help.
