AQL select command encounters a timeout when a scan job on a different set is running


#1

I hit this issue on Aerospike Community Edition 3.8.3.

An AQL select command encounters a timeout when another scan job, which is scanning a different set, is running.

The issue:

aql> select userId from test.dealer
Error: (9) Timeout: timeout=1000 iterations=1 failedNodes=0 failedConns=0

But if I select using a secondary index, as below, it works fine:

aql> select userId from test.dealer where isDel=0
+-----------+
| userId    |
+-----------+
| "aAAAADs" |
| "cwAAAEc" |
| "ZgAAADg" |
| "bAAAAD8" |
| "RAAAAAY" |
+-----------+

5 rows in set (0.001 secs)

I did some simple testing and found that the following conditions together trigger the issue:

  • another scan job (one scanning a different set) is running and has not completed;
  • a select-all on a different set is issued via AQL.

However, querying via a secondary index or the primary key in AQL works fine. If I stop the other scan job, it works fine too, and scanning the same set also works.

If I raise the timeout setting, the select sometimes succeeds, but the performance is not acceptable: this set holds fewer than 10 records, so one second should be more than enough to finish the query.
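For reference, the per-request timeout can be raised from inside the aql shell with the standard `set timeout` session setting (value in milliseconds; 5000 here is just an illustrative value). This only works around the queueing behind the other scan, it does not fix it:

```
aql> set timeout 5000
aql> select userId from test.dealer
```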

So I'm turning to the community to help resolve this issue. Thanks.


#2

Scans can be slow. If waiting for one is not an option, use a secondary index.


#3

But it seems multiple scans on different sets can block each other even when one of the sets has very few records. Is that normal behavior, or do I have a configuration issue? How can I improve it?


#4

Scans have to traverse the entire namespace and all its records; a query does not. You can try increasing scan threads, but scans will still queue up and be processed one at a time. If your intent is to fetch all the records in a set, I have found a query that matches all records to be much faster: in my specific use case, over 33,000 times faster, because I was fetching 23 records out of a namespace with about a billion records. The scan has to iterate through all 1 billion records; the query does not.
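If you do want to experiment with scan parallelism, the service-level `scan-threads` setting on servers of that era could be changed dynamically with asinfo. Treat the parameter name and the value 8 here as assumptions to verify against your server version's configuration reference:

```
asinfo -v 'set-config:context=service;scan-threads=8'
```

Even so, as noted above, scans still queue and run one at a time, so a secondary-index query remains the better fix for small sets.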


#5

One cheap way, if you don't already have a secondary index you could use to grab all records with a filter, is to add a single bin with the integer value 1 to all records. Then you can query that numeric bin for the value 1. If you already have an integer index, just query that bin with a range from Long.MIN_VALUE to Long.MAX_VALUE.
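A sketch of that approach in aql, assuming an added integer bin named `flag` and an index named `flag_idx` (both names are illustrative, not from the thread):

```
aql> CREATE INDEX flag_idx ON test.dealer (flag) NUMERIC
aql> SELECT userId FROM test.dealer WHERE flag = 1
aql> SELECT userId FROM test.dealer WHERE flag BETWEEN -9223372036854775808 AND 9223372036854775807
```

The equality form returns every record that carries the constant bin; the BETWEEN form covers the full signed 64-bit range (Long.MIN_VALUE to Long.MAX_VALUE) and suits a pre-existing integer bin whose values vary.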


#6

Thanks for the clear explanation and suggestions; they are valuable for my project.


#7

No problem. Happy Aerospiking :slight_smile: