Apply SQL (Select query) on Aerospike JSON schema

Hello, I am looking at the best option to apply SQL query (translated to API format) on the JSON documents stored in an Aerospike bin. The SQL query is dynamic and input by users.

The fields with in the JSON document can be different per record and could grow to around 100 in number. Across all records, the total number of distinct fields (keys) would be in the order of 10Ks.

Example json document)

{ “country” : “USA”, “state” : “California”, “pincode” : “98765” }

A sample SQL query is as given below.

Select * from set where country = “USA” and state = “California” and pincode in (“94912”, “98723”).

Note that the query could use multiple fields from the json document in the where clause.

The set itself could contain millions of records and we are looking at an average response time of 5-10 ms from Aerospike for the select query execution, for around 15K RPS at peak.

A typical response would have around 400 records in the result set.

I understand that secondary indexes can be created on the bin (keys of the Map). We are using Java client and would like to translate the SQL query as it is (if possible) to the API requirements as needed. If we have any examples which solve the above cases in a performant fashion, please share the same. Appreciate if you could also share the storage format, SQL query translation details as well.

If direct query execution (from client machine) is not a viable option, would a UDF based option work for the above performance expectations…? Please let me know.

Thank you!

Please look at this tutorial on implementing equivalent SQL select operations with the API.

You should create secondary indexes on specific map keys (i.e., json fields such as possibly city, postal code, and profession) that are 1) highly selective, that is, have a large number of distinct values, and 2) frequently used in queries.

In translating a SQL query, you should pick the most selective part of the where clause that is supported by a secondary index (if one exists) for optimal performance. The remaining where clause needs to be translated as a filter expression to be specified in the query policy. Note the IN clause as in your example above can be translated using a list expression or multiple OR’d expressions. Hope this helps.

Thank you very much @neelp for the quick response.

This topic was automatically closed 84 days after the last reply. New replies are no longer allowed.