Long running transaction Queueing Error when running performance tests (AER-4079)


#1

Running some of my own performance tests against Aerospike, which includes queryAggregate calls, I get back null. In the server logs, I see thousands of messages like this:

Jul 14 2015 15:26:35 GMT: WARNING (query): (thr_query.c::2050) Long running transaction Queueing Error... continue!!

Is there a limit to how many queries an Aerospike cluster can be expected to handle? Is it just a matter of increasing some properties a bit (maybe transaction-queues and transaction-threads-per-queue), or is there just no way Aerospike can handle thousands of queries per second?

Or is this maybe caused by something else entirely, such as an error in my script? (The script did work in some small local tests I did on a single-node VM though.)


#2

Hi Daniel,

Will you please share more of your logs so we can troubleshoot your issue more effectively?

Thanks,

Maud


#3

I don’t actually have those logs anymore (cluster got rebuilt), but it is the same message over and over again. A few thousand times a second. Interspersed with the usual histogram and heartbeat messages. There wasn’t a stack trace or anything.

The workload I’m running was executing my query hundreds of thousands of times per second. When I throttled it to around 500 it was fine. At 1000 I had errors, so it seems to be hitting a limit somewhere in there.

Essentially what I’m doing is scanning through the records to find those where the key matches a pattern, collecting the results into a map (in some cases results from multiple keys will be combined) and then returning the map. The data being returned is integers (counts).


#4

Hi Daniel,

A JIRA has been filed to follow up on this; it’s AER-4079 for your reference. Please stay tuned as we troubleshoot your issue.

Cheers,

Maud


#5

Daniel,

Are you running these tests with in-memory namespace configuration? or SSD configuration?

It looks like you are hitting the capacity limit, could be because of the way Aerospike is configured, or the available hardware resources or both.

It might be useful to get insights into the following:

  1. cluster set up (# of boxes, CPU cores, RAM)
  2. SSD specs (in case you are running SSD namespace configuration)
  3. How many records in the namespace
  4. How many records are returned per query

-samir


#6

Daniel,

The server has limits on number of queries that could be running in parallel at any point in time. This is to protect it from piling up requests while others have still not finished execution.

$ asinfo -v 'get-config:' -l |grep max-size
query-short-q-max-size=500
query-long-q-max-size=500

$ asinfo -v 'set-config:context=service;query-long-q-max-size=1000'
ok

$ asinfo -v 'get-config:' -l |grep max-size
query-short-q-max-size=500
query-long-q-max-size=1000

This is dyanmically configurable to higher value. And as you empirically observed the default value is 500.

BUT the real question is why are your queries taking so much time to finish. That will depend on how many records each of query looks at. What is your observation about the query latency i.e how much time it generally takes to finish. As Samir pointed out sharing logs would help. Can you please enable query microbenchmarks as well when you do next run.

asinfo -v 'set-config:context=service;query-microbenchmark=true'

– R


#7

Thanks for the replies. I am testing some other things right now, but when I get back to the queries I’ll enable the microbenchmarks and get some logs.

The cluster is in-memory only. 15 AWS instances of type r3.4xlarge. 1.83 TB memory total in the cluster, 16 CPUs.

I don’t know if this is an issue per se. I mainly need to confirm the error message I saw is due to executing more queries than the query-long-q-max-size setting, and to understand what the theoretical upper limits on number of queries would be (thousands? tens of thousands?).


#8

Daniel,

How many can run in parallel is equal to number of query threads you have configured in your system. Half of them is assigned for short running and rest of long running. (There is exception to it wherein system can run more queries than number of threads if any of the query stalls (like network IO taking more time etc etc.)

How many can be queued up at any point in time is bound by amount of memory you have in your system (Given configuration of max-query is not hit). Every query while running would consume a constant part to maintain the state of the query. And result part which depends on how big the result set it bound by two outgoing buffer size per query (default value of buffer size is 128k ). So you can think of upper bound of space utlization of query to be some where around 512M (not including as record it may have read, which won’t apply in case data is in memory) to 5MB (Depending on the configuration your are runnig with). So that would mean with the system with around 1G allowed for query execution (not including the one given for asd process to maintain other pieces of data) you can actually can have 2 million / 1000000 request queued at any point in time.

HTH

– R


#9

Raj, thanks for all the info. I have configured my cluster with a queue size of 10k:

$ asinfo -v 'get-config:' -l |grep max-size
query-short-q-max-size=10000
query-long-q-max-size=10000

I also enabled the query microbenchmarks. Here are my settings:

$ asinfo -v 'set-config:context=service;query-microbenchmark=true'
ok
$ asinfo -v 'get-config:' -l |grep microbenchmark
microbenchmarks=false
$ asinfo -v 'get-config:' -l |grep query
query-threads=6
query-worker-threads=15
query-priority=10
query-in-transaction-thread=0
query-req-in-query-thread=0
query-req-max-inflight=100
query-bufpool-size=256
query-batch-size=100
query-sleep=1
query-job-tracking=false
query-short-q-max-size=10000
query-long-q-max-size=10000
query-rec-count-bound=4294967295
query-threshold=10
query-untracked-time=1000000
query-hist-track-back=1800
query-hist-track-slice=10
query-hist-track-thresholds=1,8,64
query_rec_count-hist-track-back=1800
query_rec_count-hist-track-slice=10
query_rec_count-hist-track-thresholds=1,8,64

Right now I’m unable to get through my tests even throttling YCSB to 10 ops / sec without seeing “Long running transaction Queueing Error” in the logs.

Here’s a snippet from my logs while my tests are running: http://pastebin.com/nPYf24Vc

The weird thing is, even when my tests have stopped, I still see values listed. So I’m not sure what to make of these results. Here’s a later dump from the logs:

Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::137) histogram dump: udf (756400329 total) msec
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::154)  (00: 0755866892) (01: 0000206094) (02: 0000162962) (03: 0000107772)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::154)  (04: 0000044487) (05: 0000007670) (06: 0000002741) (07: 0000001697)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::154)  (08: 0000000011) (09: 0000000001) (12: 0000000001) (15: 0000000001)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::137) histogram dump: query (764105 total) msec
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::154)  (00: 0000255734) (01: 0000061494) (02: 0000007461) (03: 0000002570)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::154)  (04: 0000002903) (05: 0000005313) (06: 0000005469) (07: 0000011122)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::154)  (08: 0000170821) (09: 0000232141) (10: 0000000942) (11: 0000000569)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::154)  (12: 0000000546) (13: 0000000568) (14: 0000000665) (15: 0000000936)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::154)  (16: 0000001634) (17: 0000001859) (18: 0000001205) (19: 0000000017)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::163)  (20: 0000000027) (21: 0000000048) (22: 0000000061)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (764105 total) count
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::154)  (00: 0000255805) (04: 0000498708) (05: 0000001866) (06: 0000000594)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::154)  (07: 0000000483) (08: 0000000462) (09: 0000000981) (10: 0000000290)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::154)  (11: 0000004752) (15: 0000000015) (16: 0000000037) (17: 0000000030)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::163)  (18: 0000000082)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::137) histogram dump: query_txn_q_wait_us (62603 total) usec
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::154)  (02: 0000000023) (03: 0000018933) (04: 0000040462) (05: 0000002950)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::163)  (06: 0000000221) (07: 0000000012) (08: 0000000002)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::137) histogram dump: query_prepare_batch_q_wait_us (52593 total) usec
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::163)  (41: 0000052593)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::137) histogram dump: query_prepare_batch_us (52705 total) usec
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::163)  (41: 0000052705)
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::137) histogram dump: query_batch_io_q_wait_us (0 total) usec
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::137) histogram dump: query_batch_io_us (0 total) usec
Jul 30 2015 20:51:23 GMT: INFO (info): (hist.c::137) histogram dump: query_net_io_us (0 total) usec

I read over http://www.aerospike.com/docs/operations/monitor/v3port/reading_raw_latency_histograms.html but I’m still not sure what’s going on. It seems like there are 764,105 queries, but the queue only has 10k so that doesn’t seem possible - obviously I’m misunderstanding something.

Any idea?


#10

Daniel,

Histogram give you count of number of queries which have run till now

query histogram tells time it took query_rec_cnt histogram tells number of records it processed.

you have actually run total of 764105 (they are not currently running queries) but all that have run. Now if you look at the time it is taking to finish query in above data 255734 query finished less than ms and up to 2^22 msc that is roughly an hour or so. And number of records that you are reading ranges from no record to 2^22(look at histogram for query_rec_cnt) that is roughly 250k records per query.

  1. What client are you using … server tends to throttle or slow down the processing the client is not processing the request at fast pace.
  2. What version of server are you using
  3. You are running aggregation right ??? Confirming again

– R


#11

Raj, thanks for clarifying the histogram info.

  1. I am using the Java client, version 3.1.3.
  2. Server build is 3.5.14.
  3. Yes, this query is aggregating data (summing counts across records).

EDIT: I ran my tests again, throttled to 90 ops / second. Here’s a histogram from after my test:

Aug 04 2015 15:41:41 GMT: INFO (info): (hist.c::137) histogram dump: query (1023482 total) msec
Aug 04 2015 15:41:41 GMT: INFO (info): (hist.c::154)  (00: 0000255734) (01: 0000259667) (02: 0000048401) (03: 0000009168)
Aug 04 2015 15:41:41 GMT: INFO (info): (hist.c::154)  (04: 0000005467) (05: 0000006429) (06: 0000005483) (07: 0000011122)
Aug 04 2015 15:41:41 GMT: INFO (info): (hist.c::154)  (08: 0000170821) (09: 0000232141) (10: 0000000942) (11: 0000000569)
Aug 04 2015 15:41:41 GMT: INFO (info): (hist.c::154)  (12: 0000000546) (13: 0000000568) (14: 0000000665) (15: 0000000936)
Aug 04 2015 15:41:41 GMT: INFO (info): (hist.c::154)  (16: 0000001635) (17: 0000001862) (18: 0000001209) (19: 0000000033)
Aug 04 2015 15:41:41 GMT: INFO (info): (hist.c::154)  (20: 0000000050) (21: 0000000075) (22: 0000000092) (23: 0000000206)
Aug 04 2015 15:41:41 GMT: INFO (info): (hist.c::154)  (24: 0000000416) (25: 0000000823) (26: 0000001646) (27: 0000003247)
Aug 04 2015 15:41:41 GMT: INFO (info): (hist.c::163)  (28: 0000003529)

The 28th interval is days, isn’t it? Don’t queries timeout on the server at some point, or will it just keep running the query until it’s done? Is there any way to see how many queries are currently being processed?

Also, my tests are creating multiple client instances. Could this be exacerbating the problem? If I use the same client, would it avoid overloading the long running transaction queue?


#12

Side question: if my Lua query function looks something like this:

return stream : filter(inrange) : aggregate(map(), mapper) : reduce(reducer)

If the filter function filters everything (e.g. nothing matches the filter), I would expect to return an empty map. Is that correct?


#13

Daniel,

Thanks for set up details.

Yes that would days …

aql > show queries

aql > kill_query <query trid> 

With just 250k record per query … this doesn’t seem right. Trying it out will let you know how it goes …

Yes

– R


#14

Thanks Raj. I was able to see three running queries. Unfortunately, after I killed them I saw three more running queries; when I killed those, three more.

Why do I only see three queries at a time? At first I thought maybe my cluster was limited to three running queries, but if you look at my settings above you can see query-threads=6 and query-worker-threads=15. Any idea why only three queries are showing up at a time? Is there a way to see all the queries in the system, both those actively running and those queued?

Also, is there a way to kill all queries in the system? If I have hundreds of queries running, I don’t want to kill them one by one.

I also tried seeing how many queries were in the cluster using

aql> stat system

but I don’t know what to make of the results. Since each stat appears 15 times, I assume they are per-node - is there a way to get the combined stats across the cluster? Many of the stats prefixed with “query_” aren’t listed in the documentation, so I can only guess what they mean. Here are the query-related ones from a single node.

| "query_reqs"                       | 752167     |
| "query_success"                    | 53856      |
| "query_fail"                       | 43731      |
| "query_abort"                      | 698311     |
| "query_avg_rec_count"              | 158        |
| "query_short_queue_size"           | 0          |
| "query_long_queue_size"            | 0          |
| "query_short_queue_full"           | 43731      |
| "query_long_queue_full"            | 697183     |
| "query_short_running"              | 697183     |
| "query_long_running"               | 11253      |
| "query_tracked"                    | 9368       |
| "query_agg"                        | 752167     |
| "query_agg_success"                | 53856      |
| "query_agg_err"                    | 0          |
| "query_agg_abort"                  | 698311     |
| "query_agg_avg_rec_count"          | 158        |
| "query_lookups"                    | 0          |
| "query_lookup_success"             | 0          |
| "query_lookup_err"                 | 0          |
| "query_lookup_abort"               | 0          |
| "query_lookup_avg_rec_count"       | 0          |

If the value of “query_long_running” is 11253, does that mean there are 11,253 long running queries currently in the system, or is that the number that have run since the cluster started?

The value of “query_long_queue_size” is 0 on most nodes, but on one it is 6090. The value of “query_short_queue_size” is 0 on all nodes. (Assuming each time the value is listed it is for a different node.) I guess there are 6,090 queries queued on one node? Any idea why just one node would have all these queries?


#15

Daniel,

Query has two pieces which work in tandem.

Generator queries secondary index to get the list of record. And Worker threads which perform IO. Query threads are the threads which act as generator. System divides these threads in half. One half 3 threads for the short running queries and other half 3 threads for long running queries. Long running queries are the queries which return more than query-threshold number of records. Given you have data in memory the record read is performed inline.

As detailed above query-threads == 6 means you can have 3 long running transactions at any point in time. Bump this number to 2*number of cores in your box. For r3.4xlarge it I think is 32.

Since the cluster started

That could be problem. Given there are that many queries piled up other as well, they probably have a cumulative effect.

On the question of what could be the reason, I’m not sure that this is not normal behavior. Is there something different you see in that node? Any warning in the log / difference in stats / query stats? Are you running without any timeout? If you are, then bumping the number of threads should take you out of this situation.

– R


#16

Just to add to it the query scheduler today is actually FIFO based (With two queues as indicated above 1 each for short running and long running) unless client is not consuming packet at the rate server is producing, in which case the query yields and got back to the bottom of the queue … So it is surprising that at one node it is just piled up.

Very unlikely that data is so skewed to make that one node do log more work. What does you query_rec_count histogram look on that node compared to others ?

– R


#17

I think there was an issue with that one node. After restarting the Aerospike service it has been OK.

Thanks for your help. I think I’ve gotten what I need out of testing.