Performance Woes: is there documentation on data modeling in correlation to querying methods?


#1

Okay, so a lot of the benchmarks I’ve found online for Aerospike performance look great. One thing I’m noticing is that all of the ones I’ve seen so far are simple key/value benchmarks.

Are there any known write-ups/documentation on data modelling in correlation to querying methods to achieve the best results?

Running Aerospike in a Vagrant VM (compared to MongoDB in a Vagrant VM), one thing I’ve noticed is that querying for all items in a set is a lot slower than getting a cursor and iterating through a MongoDB collection.

Is this expected? Should users of Aerospike try to avoid using sets to iterate through?

Examples:

 INSERT INTO users.profile (PK, firstName) VALUES ('a@a.com', 'A')
 INSERT INTO users.profile (PK, firstName) VALUES ('b@b.com', 'B')
 INSERT INTO users.profile (PK, firstName) VALUES ('c@c.com', 'C')

To get all of the users, I’d expect to just query users.profile, but this seems a lot slower than doing a collection query in MongoDB (so far tested only on a single Vagrant server).

Should I instead create “views” such as users.lists which contain “paginated” style results, grab them, then issue a batch GET on all the results returned?
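To make the “views” idea concrete, here is a minimal sketch of what I have in mind. It uses a plain Python dict as a stand-in for the key/value store, so nothing here is the Aerospike client API: put, batch_get, add_profile, get_page, the “meta” and “page:N” record names, and PAGE_SIZE are all hypothetical names made up for illustration. The point is only the access pattern: one key lookup for a page record, then one batch GET for the profiles it lists, instead of scanning the whole set.

```python
from typing import Dict, List, Tuple

# (namespace, set, primary key): a hypothetical key layout for illustration.
Key = Tuple[str, str, str]

PAGE_SIZE = 2  # deliberately tiny so the paging is visible with three users

# In-memory stand-in for the key/value store; NOT the Aerospike client.
store: Dict[Key, dict] = {}


def put(key: Key, bins: dict) -> None:
    """Write one record (stand-in for a single-key write)."""
    store[key] = dict(bins)


def batch_get(keys: List[Key]) -> List[dict]:
    """Fetch several records by key in one call (stand-in for a batch GET)."""
    return [store[k] for k in keys if k in store]


def add_profile(email: str, first_name: str) -> None:
    """Store a profile and append its key to the current users.lists page.

    Assumes each email is added once; a real store would also need this
    read-modify-write sequence to be atomic, which the sketch ignores.
    """
    put(("users", "profile", email), {"PK": email, "firstName": first_name})

    meta_key = ("users", "lists", "meta")
    count = store.get(meta_key, {"count": 0})["count"]

    # Page N holds the keys of profiles N*PAGE_SIZE .. (N+1)*PAGE_SIZE - 1.
    page_key = ("users", "lists", f"page:{count // PAGE_SIZE}")
    keys = store.get(page_key, {"keys": []})["keys"]
    put(page_key, {"keys": keys + [email]})
    put(meta_key, {"count": count + 1})


def get_page(page_no: int) -> List[dict]:
    """Return one page of profiles: one key lookup plus one batch GET."""
    page = store.get(("users", "lists", f"page:{page_no}"))
    if page is None:
        return []
    return batch_get([("users", "profile", pk) for pk in page["keys"]])
```

With the three example users above added through add_profile, get_page(0) would return the first two profiles and get_page(1) the third, each via primary-key reads only. The cost is an extra write per insert to keep the page records up to date.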

I’m seeing over 500ms on “SELECT * FROM users.profile” as opposed to near 0ms with “WHERE PK = ‘…’”

I know that in RTB platforms, a lot of it is key/value, but I’m trying to go beyond key/value (think blog/CMS/a normal site :P)

Should I stick with MongoDB or others?


#2

By design, Aerospike’s scan functionality (the equivalent of the “query for all items in a set” above) is intended for use cases such as database cleanup, so it progresses in a throttled fashion to minimize any performance degradation on primary key or secondary key queries.

It is possible to increase the parallelism of scan jobs by setting the job’s priority to “SCAN_JOB_HIGH_PRIORITY” instead of the default “NORMAL_PRIORITY”.

It is also possible to tweak the community build and increase the threading parallelism by changing the number of threads used for the scan job:

https://github.com/aerospike/aerospike-server/blob/master/as/src/base/thr_tscan.c

 job->n_threads          = 3;
 // can be changed to MAX_SCAN_THREADS

Having said that, we have seen folks starting to use scan in ways we did not originally anticipate. We will certainly be taking these use cases into consideration.