Bulk read and filtering capabilities


We are evaluating Aerospike for a mobile advertising use case and have some questions about the C client APIs.

  1. There is no real capability to get a partial response. We want to attach SLAs to different batch invocations (a number of keys from different sets within a namespace) and run with whatever we could get within that time. This doesn’t seem to be possible: every batch invocation has a single global timeout (which can be tweaked), so it is all or nothing. There also appears to be no streaming support for responses, although invocations do take a callback to propagate results.

  2. The Lua capabilities seem highly restrictive. Lua modules can be specified when queries are invoked (aerospike_key_apply, as_query_apply, etc.). However:

  • Queries can select bins but support only a single where clause; we want to be able to say select some_bins from ns.set where rate <> 1 and status = 1
  • The where-clause predicate is limited: equality and range checks are the only ones supported (no ‘IN’ clauses, which are useful for us), e.g. select some_bins from ns.set where ids in (a, b, c…)
  • Keys can’t be specified with batch requests plus a Lua filter, so clients have no way to route the request; it is instead fanned out to the whole cluster
  • Data types in predicates can only be strings or ints
  • Lua modules can be specified when single keys are read. However, making a separate invocation for each key and filtering a single-record response isn’t scalable

Our use case is something like this:

a. Pass a set of entity IDs to the Aerospike cluster and get back the fraction of them that satisfies some condition (e.g. bin_1 == something && bin_2 == some_other_thing)

b. Using this result set, fetch some more information from another set where bin_1 contains (something from a list that clients will pass) and bin_2 does not contain (something from a list that clients will pass), etc.

We can fetch all this data and do the filtering on the client; however, the sheer volume of data coming back simply saturates the network.


Your assessment is correct: batch reads currently perform neither filtering nor UDF execution. The alternative is to make single-record reads, where UDF filtering can be applied.