FAQ - How can the performance of queries be monitored?


#1

Detail

Within Aerospike there are macro-benchmarks which are written to the aerospike.log in histogram form by default. Within these macro-benchmarks there is a histogram called Query which gives performance of queries running within the cluster at a high level. In certain circumstances, it may be desirable to look at the performance of queries in more detail. What are the options for doing this?

Answer

There is a specific set of query microbenchmarks that can be switched on at will. The command to do this is:

asinfo -v 'set-config:context=service;query-microbenchmark=true'

From version 3.9 onwards, microbenchmarks can be enabled at the namespace level, as in the command below. The available histograms are listed at http://www.aerospike.com/docs/operations/monitor/latency#histograms

asinfo -v 'set-config:context=namespace;id=<namespaceName>;enable-benchmarks-read=true'
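A stray space inside the set-config string is enough to make asinfo print "error". As a minimal sketch, the two command forms above can be assembled by a small shell function that refuses to emit a string containing whitespace. The function name build_set_config is hypothetical and not part of the Aerospike tools; only the set-config strings it produces are taken from the commands above.

```shell
# Hypothetical helper (not part of the Aerospike tools): builds the
# set-config string that is passed to `asinfo -v`.
# Usage: build_set_config <parameter> <true|false> [namespace]
build_set_config() {
    param=$1
    value=$2
    ns=${3:-}
    if [ -n "$ns" ]; then
        # Namespace-level form (version 3.9 onwards)
        cmd="set-config:context=namespace;id=${ns};${param}=${value}"
    else
        # Service-level form
        cmd="set-config:context=service;${param}=${value}"
    fi
    # Refuse to emit a command that contains any whitespace
    case $cmd in
        *[[:space:]]*)
            echo "whitespace in command: '$cmd'" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$cmd"
}

# Example use (needs asinfo on the PATH and a running node):
#   asinfo -v "$(build_set_config query-microbenchmark true)"
#   asinfo -v "$(build_set_config enable-benchmarks-read true <namespaceName>)"
```

Passing false instead of true produces the matching string for switching a benchmark off again.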

Once these are switched on, the following histograms are written to aerospike.log.

Histogram Definition
query_txn_q_wait_us Histogram to track the time spent in the transaction queue if query_in_transaction is true.
query_query_q_wait_us Histogram to track the time spent waiting in the queue for a query thread. If the query queue is backing up and the CPU is not fully utilized, it may be worth increasing the number of query threads.
query_prepare_batch_us Histogram to track the time spent preparing query batches. Latency here could imply that the secondary index is slow. It could also mean that the query batch (see the Notes section) is too big.
query_batch_io_q_wait_us Histogram to track the time spent waiting in the queue for worker threads.
query_batch_io_us Histogram to track the time spent doing I/O per batch. This includes the priority-based sleep after n units of work. When latency is seen here, if more than 2 query worker threads are busy and the system is not I/O bound, try increasing the priority. The query threads may be yielding too much.
query_net_io_us Histogram to track the time spent sending results to the client. Latency here implies a network issue or a slow client.

Notes

  • When a query returns n records, it does not process them sequentially but in batches. Each batch is a unit of work with a default size of 100 records (controlled by the query-batch-size parameter). Histograms that refer to a batch measure the time taken to prepare the batch, or the time the batch spends waiting in queues or on I/O.

http://www.aerospike.com/docs/reference/configuration#query-batch-size

  • query-microbenchmark is not a static parameter: it can only be set dynamically with asinfo and cannot be included in aerospike.conf.
  • Any histogram whose name ends in _us measures in microseconds, not milliseconds as is more common in benchmark histograms.
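As a rough illustration of the batching described in the first note, the number of batches a query produces can be estimated with a ceiling division. This is only a sketch: batch_count is a hypothetical helper, and it assumes that the final batch is simply a partial one, which the notes above do not state explicitly.

```shell
# Hypothetical helper: estimates how many batches a query needs.
# Usage: batch_count <records> [batch_size]
batch_count() {
    records=$1
    batch_size=${2:-100}   # 100 is the default query-batch-size stated above
    # Integer ceiling division: round up so a partial batch still counts
    echo $(( (records + batch_size - 1) / batch_size ))
}

batch_count 250        # prints 3 (100 + 100 + 50 records)
batch_count 1000 100   # prints 10
```

Under that assumption, a query returning 250 records contributes three samples to each per-batch histogram such as query_prepare_batch_us.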

Keywords

QUERY-MICROBENCHMARKS ANALYSE QUERY

Timestamp

6/6/16


#2

I’ve been attempting to figure out why a lot of my queries are running really slowly (I’m running under aerospark so there are a lot of layers) so I tried this. However, I’m getting an error:

asinfo -v 'set-config:context= service;query-microbenchmark=true'

simply prints: “error”

I’m running the latest version of aerospike (community edition) and just installed the tools a few days ago so I am assuming that they are the latest as well.

Any recommendations as to what might be the problem here?


#3

LOL - there’s an extra space in the above command line.

asinfo -v 'set-config:context= service;query-microbenchmark=true'

fails yet

asinfo -v 'set-config:context=service;query-microbenchmark=true'

works.