Apply SQL (Select query) on Aerospike JSON schema

Unni · March 1, 2023, 12:57am

Hello, I am looking for the best way to run SQL queries (translated to API calls) against JSON documents stored in an Aerospike bin. The SQL queries are dynamic and entered by users.

The fields within a JSON document can differ per record and could grow to around 100 per document. Across all records, the total number of distinct fields (keys) would be on the order of tens of thousands.

Example JSON document:

{ "country" : "USA", "state" : "California", "pincode" : "98765" }

A sample SQL query is given below.

SELECT * FROM set WHERE country = 'USA' AND state = 'California' AND pincode IN ('94912', '98723');

Note that the query could use multiple fields from the JSON document in the WHERE clause.

The set itself could contain millions of records, and we are targeting an average response time of 5-10 ms from Aerospike for the select query execution, at around 15K RPS at peak.

A typical response would have around 400 records in the result set.

I understand that secondary indexes can be created on the bin (on keys of the Map). We are using the Java client and would like to translate the SQL query as directly as possible into the corresponding API calls. If there are any examples that handle the above cases performantly, please share them. I would also appreciate details on the recommended storage format and on how to translate the SQL query.

If direct query execution (from the client machine) is not a viable option, would a UDF-based option meet the above performance expectations? Please let me know.

Thank you!

neelp · March 3, 2023, 6:11pm

Please look at this tutorial on implementing equivalent SQL select operations with the API.

You should create secondary indexes on specific map keys (that is, JSON fields such as city, postal code, or profession) that are 1) highly selective, meaning they have a large number of distinct values, and 2) frequently used in queries.

When translating a SQL query, pick the most selective part of the WHERE clause that is supported by a secondary index (if one exists) for optimal performance. The rest of the WHERE clause needs to be translated into a filter expression specified in the query policy. Note that an IN clause, as in your example above, can be translated using a list expression or multiple OR'd expressions. Hope this helps.
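The planning step described above can be sketched in plain Java. This is only an illustration, not Aerospike client code: `QueryPlanSketch`, `Predicate`, `Plan`, and the distinct-value estimates are all hypothetical names and numbers, and the in-memory `execute` method merely stands in for the server. In a real application, the chosen term would become the secondary-index filter of the query and the residual terms would become the filter expression on the query policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical planning sketch; none of these names are Aerospike API.
final class QueryPlanSketch {

    /** One WHERE term. "field = v" is modeled as "field IN (v)". */
    static final class Predicate {
        final String field;
        final Set<String> allowed;

        Predicate(String field, Set<String> allowed) {
            this.field = field;
            this.allowed = allowed;
        }

        boolean matches(Map<String, String> doc) {
            String value = doc.get(field);
            return value != null && allowed.contains(value);
        }
    }

    /** The term served by an index (null if none) plus the remaining filter terms. */
    static final class Plan {
        final Predicate indexed;
        final List<Predicate> residual;

        Plan(Predicate indexed, List<Predicate> residual) {
            this.indexed = indexed;
            this.residual = residual;
        }
    }

    /**
     * Picks the indexed term expected to match the smallest fraction of records.
     * distinctValues is an assumed per-field estimate; only fields present in
     * the map are treated as having a secondary index.
     */
    static Plan plan(List<Predicate> where, Map<String, Long> distinctValues) {
        Predicate best = null;
        double bestFraction = Double.MAX_VALUE;
        for (Predicate p : where) {
            Long distinct = distinctValues.get(p.field);
            if (distinct == null) {
                continue; // no index on this field
            }
            // Assumes values are uniformly distributed across records.
            double fraction = (double) p.allowed.size() / distinct;
            if (fraction < bestFraction) {
                best = p;
                bestFraction = fraction;
            }
        }
        List<Predicate> residual = new ArrayList<>(where);
        residual.remove(best);
        return new Plan(best, residual);
    }

    /** In-memory stand-in for the server: index term first, then residual terms. */
    static List<Map<String, String>> execute(Plan plan, List<Map<String, String>> records) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> doc : records) {
            if (plan.indexed != null && !plan.indexed.matches(doc)) {
                continue; // a real index lookup would never visit this record
            }
            boolean ok = true;
            for (Predicate p : plan.residual) {
                ok = ok && p.matches(doc);
            }
            if (ok) {
                out.add(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Predicate> where = List.of(
                new Predicate("country", Set.of("USA")),
                new Predicate("state", Set.of("California")),
                new Predicate("pincode", Set.of("94912", "98723")));
        // Assumed estimates: indexes exist on state and pincode only.
        Plan plan = plan(where, Map.of("state", 50L, "pincode", 40000L));
        System.out.println("index term: " + plan.indexed.field);
        System.out.println("residual terms: " + plan.residual.size());
    }
}
```

With the assumed estimates, the `pincode` term is chosen for the index (2 of roughly 40,000 values versus 1 of 50 for `state`), and `country` and `state` are left for the filter expression.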

Unni · March 5, 2023, 2:42am

Thank you very much @neelp for the quick response.
