FAQ - What are the potential root causes for tsvc timeout



A client_tsvc_timeout occurs when a transaction sits on the transaction queue (tsvc-q) for longer than the timeout specified for the transaction. At that point the transaction's details have not yet been parsed, which is why it is not categorized as a more specific timeout (such as client_read_timeout or client_write_timeout).

How long is the timeout?

The timeout is typically set by the application through the policy used (usually called the total timeout). If it is not set on the application side, the transaction-max-ms server-side configuration, which defaults to 1 second, is used.
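
For example, with the Java client, the total timeout can be set on the policy used for the transaction. A minimal sketch (the host, namespace, set and bin names are modeled on the examples further down, and the timeout value is purely illustrative):

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.policy.WritePolicy;

AerospikeClient client = new AerospikeClient("10.0.100.137", 3000);

// Cap the total time the client allows for this transaction, in milliseconds.
WritePolicy writePolicy = new WritePolicy();
writePolicy.totalTimeout = 50;

Key key = new Key("nsSSD", "testSet", "1");
client.put(writePolicy, key, new Bin("testBin", "someValue"));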

What causes such a timeout?

There are many potential causes, including hot keys, increased traffic, misconfiguration, or other bottlenecks in the transaction processing path.

If the tsvc_queue metric is piling up, it could be caused by starvation of transaction threads (the total of which is transaction-threads-per-queue x transaction-queues). In some situations, increasing the number of threads (by raising either one, or both, of those configuration parameters) may help.
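
As a sketch, those two parameters live in the service context of aerospike.conf (the values below are illustrative, not recommendations, and depending on the parameter and server version a node restart may be required for the change to take effect):

service {
    transaction-queues 8
    transaction-threads-per-queue 8
}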

For hot keys, transactions that are parked and queued up at the rw hash (rw_in_progress – waiting for an in-progress transaction on the same key to complete) are put back in the transaction queue in order to be re-attempted. This obviously causes extra latency and increases the chances of timing out upon being picked up by a transaction thread.

Example on read/write

To illustrate the effect of a hot key, we can use a two-node cluster for testing. We will set service-threads, transaction-queues and transaction-threads-per-queue to 1.

We also deliberately throttle the fabric channels in order to slow down transactions so that the latency gets to over 32 ms. Depending on your system, the transactions may not time out, as the default timeout of 1 second is fairly high.

We will therefore also adjust transaction-max-ms to a lower value. This will cause transactions to time out upon being picked up by a transaction thread (causing the client_tsvc_timeout metric to tick):

asinfo -v "set-config:context=service;transaction-max-ms=15"

To simulate a hot key, we can repeatedly write a single record using the asbenchmark tool:

asbenchmark -h 10.0.100.137,10.0.100.254 -p 3000 -n nsSSD -k 1 -s testSet -S 100 -o S:1000  -w RU,0 -z 18 -latency 7,1    -KT String

And we can read this record using the following command:

asbenchmark -h 10.0.100.137,10.0.100.254 -p 3000 -n nsMem -k 1 -s testSet -S 100 -o S:1000  -w RU,100 -z 9 -latency 7,1    -KT String -consistencyLevel all

The output of the client write (timestamps in the PDT timezone), showing the point at which transaction-max-ms is lowered:

...
...
2018-06-14 14:50:09.599 write(tps=192 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=192 timeouts=0 errors=0)
      <=1ms >1ms >2ms >4ms >8ms >16ms >32ms
write    1%  99%  99%  97%  95%   94%   87%
read     0%   0%   0%   0%   0%    0%    0%
2018-06-14 14:50:10.600 write(tps=178 timeouts=527 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=178 timeouts=527 errors=0)
      <=1ms >1ms >2ms >4ms >8ms >16ms >32ms
write    3%  97%  96%  92%  83%   51%   10%
read     0%   0%   0%   0%   0%    0%    0%
...
...

The corresponding output of asloglatency for the above:

asloglatency -l aerospike.log -f head -n 10 -e 1 -h "{nsSSD}-write"
Histogram Name : {nsSSD}-write
Log       : aerospike.log
From      : 2018-06-14 21:49:03

Jun 14 2018 21:49:27
               % > (ms)
slice-to (sec)      1      2      4      8     16     32     64    128    256    512    ops/sec
-------------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ----------
21:49:37    10  98.03  93.58  90.33  88.12  86.87  76.72  19.79   0.75   0.00   0.00      375.5
21:49:47    10  98.28  93.70  91.52  89.98  88.23  78.94  24.50   5.32   0.00   0.00      325.3
21:49:57    10  98.15  93.45  91.04  89.00  87.37  77.81  22.44   1.00   0.00   0.00      361.8
21:50:07    10  97.97  93.55  91.30  89.63  87.10  74.14  22.33   2.75   0.00   0.00      359.6
21:50:17    10  97.83  92.11  85.73  75.02  30.37  14.83   8.48   1.74   0.00   0.00      304.2
21:50:27    10  96.50  87.76  80.12  69.07  13.10   0.24   0.00   0.00   0.00   0.00      420.0
21:50:37    10  96.10  86.67  79.74  71.37   8.23   0.00   0.00   0.00   0.00   0.00      613.6
21:50:47    10  94.76  84.27  77.71  68.31   7.27   0.00   0.00   0.00   0.00   0.00      350.9
21:50:57    10   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00        0.0

When microbenchmarks are enabled, the write-restart and read-restart slices will show increased latencies for hot keys (and similarly, for the relevant transactions, batch-sub-restart, udf-sub-restart and udf-restart).
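
These histograms are not enabled by default. As an example, the write microbenchmarks can be turned on dynamically (assuming the nsSSD namespace used in this test and a server version that supports the enable-benchmarks-write parameter):

asinfo -v "set-config:context=namespace;id=nsSSD;enable-benchmarks-write=true"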

Note that the write-repl-write slice does not account for all of the latency:

slice-to (sec)      1      2      4      8     16     32     64    128    256    512    ops/sec
-------------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ----------
21:49:27    10  70.59  43.53  24.12  14.71   2.94   0.00   0.00   0.00   0.00   0.00       17.0
21:49:37    10  54.81  30.65  16.64   5.75   0.93   0.19   0.00   0.00   0.00   0.00      375.5
21:49:47    10  55.09  32.52  20.10   8.18   1.26   0.25   0.09   0.03   0.00   0.00      325.3
21:49:57    10  54.39  29.77  17.91   6.41   0.91   0.19   0.03   0.00   0.00   0.00      361.8
21:50:07    10  52.81  29.59  17.99   6.67   0.89   0.11   0.03   0.00   0.00   0.00      359.6
21:50:17    10  57.76  32.08  19.63   8.81   1.41   0.26   0.00   0.00   0.00   0.00      304.2
21:50:27    10  45.93  23.29  13.17   4.40   0.50   0.02   0.00   0.00   0.00   0.00      420.0
21:50:37    10  32.22  13.64   8.60   1.86   0.07   0.00   0.00   0.00   0.00   0.00      613.6
21:50:47    10  28.30  11.17   6.73   2.34   0.09   0.00   0.00   0.00   0.00   0.00      350.9
21:50:57    10   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00        0.0

And here is the output of the write-restart slice:

slice-to (sec)      1      2      4      8     16     32     64    128    256    512    ops/sec
-------------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ----------
21:49:27    10  76.86  51.06  24.85  15.89   3.59   0.00   0.00   0.00   0.00   0.00      286.9
21:49:37    10  66.90  35.75  18.51   6.40   1.05   0.24   0.00   0.00   0.00   0.00     6123.8
21:49:47    10  67.38  37.20  22.21   9.40   1.64   0.27   0.09   0.03   0.00   0.00     5326.1
21:49:57    10  67.29  34.39  19.83   7.31   1.13   0.20   0.03   0.00   0.00   0.00     5889.1
21:50:07    10  66.50  35.25  20.60   7.98   1.10   0.16   0.03   0.00   0.00   0.00     5862.2
21:50:17    10  65.68  31.94  17.16   6.28   0.80   0.12   0.00   0.00   0.00   0.00     4150.5
21:50:27    10  57.04  21.97  10.13   2.27   0.00   0.00   0.00   0.00   0.00   0.00     5765.6
21:50:37    10  45.72  12.11   6.41   0.94   0.00   0.00   0.00   0.00   0.00   0.00     8908.6
21:50:47    10  43.42  10.12   5.02   1.05   0.00   0.00   0.00   0.00   0.00   0.00     5121.5
21:50:57    10   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00        0.0

Note the throughput (ops/sec) for the write-restart slice. It is over 4000, which is much higher than the incoming write traffic. This is due to transactions being put back in the transaction queue over and over again. As transactions get re-attempted, the overall write transaction latency increases significantly.

The fail_key_busy metric is incremented when the transaction-pending-limit has been exceeded.
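
On the client side, transactions rejected because of the pending limit fail with a key-busy error. A minimal handling sketch in Java, reusing the client, writePolicy and key objects from the earlier snippet:

import com.aerospike.client.AerospikeException;
import com.aerospike.client.Bin;
import com.aerospike.client.ResultCode;

try {
    client.put(writePolicy, key, new Bin("testBin", "someValue"));
} catch (AerospikeException ae) {
    if (ae.getResultCode() == ResultCode.KEY_BUSY) {
        // Too many transactions are already queued on this key
        // (transaction-pending-limit exceeded); back off and retry,
        // or reduce the per-key write rate.
    }
}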

The following command can be used to show the relevant metrics that are non-zero:

asinfo -v "statistics" -l | egrep 'reap|client_conn|rw|tsvc'; asinfo -v "namespace/nsMem" -l | awk -F= '/client|busy/ { if ($2 != 0) {print $0} }'
tsvc_queue=15
rw_in_progress=1
client_connections=42
reaped_fds=0
early_tsvc_client_error=0
early_tsvc_batch_sub_error=0
early_tsvc_udf_sub_error=0
fabric_rw_send_rate=419749
fabric_rw_recv_rate=363460
dlog_overwritten_error=0
client_tsvc_timeout=1869
client_read_success=609925
client_write_success=15615
client_write_timeout=3

Example on UDF

To see the effect with UDFs, we will write one record 1000 times using a UDF, based on our Java example:

// Invoke the writeBin UDF against the same key 1000 times (hot key).
for (int x = 1; x <= 1000; x++)
    client.execute(params.writePolicy, key, "record_example", "writeBin", Value.get(bin.name), bin.value);

Running the above yields metrics such as the following:

asinfo -v "statistics" -l | egrep 'reap|client_conn|rw|tsvc'; asinfo -v "namespace/nsMem" -l | awk -F= '/client|busy|batch|udf_/ { if ($2 != 0) {print $0} }'
tsvc_queue=0
rw_in_progress=0
client_connections=3
reaped_fds=38
early_tsvc_client_error=0
early_tsvc_batch_sub_error=0
early_tsvc_udf_sub_error=0
fabric_rw_send_rate=1470
fabric_rw_recv_rate=646
dlog_overwritten_error=0
client_udf_complete=1000
client_lang_write_success=1000
enable-benchmarks-batch-sub=false

If we run 100 of these processes in parallel (i.e. 100 x 1000 UDF writes against one record), we end up with a hot key situation:

asinfo -v "statistics" -l | egrep 'reap|client_conn|rw|tsvc'; asinfo -v "namespace/nsMem" -l | awk -F= '/client|busy|batch|udf_/ { if ($2 != 0) {print $0} }'
tsvc_queue=0
rw_in_progress=0
client_connections=3
reaped_fds=38
early_tsvc_client_error=0
early_tsvc_batch_sub_error=0
early_tsvc_udf_sub_error=0
fabric_rw_send_rate=0
fabric_rw_recv_rate=0
dlog_overwritten_error=0
client_udf_complete=54931
client_udf_error=59
client_lang_write_success=54931
fail_key_busy=59

Note that fail_key_busy accounts for all of the client_udf_error count.

We repeat the test with the transaction-pending-limit set to 40, resulting in the tsvc_queue going up to 40:

grep tsvc-q aerospike.log | grep -v "tsvc-q 0"
Jun 01 2018 21:49:37 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 40 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
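
For reference, transaction-pending-limit is a namespace-level parameter and can typically be changed dynamically, for example (using the nsMem namespace from this test):

asinfo -v "set-config:context=namespace;id=nsMem;transaction-pending-limit=40"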

After setting transaction-pending-limit to 200, the tsvc_queue increases up to 200:

Jun 04 2018 18:02:20 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 56 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:28 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:30 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:31 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:32 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:33 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:34 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:35 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:36 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:37 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:39 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 41 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0

Note: In this example, the ticker-interval is set to 1 second instead of the default 10 seconds.
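
The ticker interval can typically be changed dynamically as well, for example:

asinfo -v "set-config:context=service;ticker-interval=1"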

This shows that we have up to 200 transactions parked at the rw hash, constantly being put back into the transaction queue every time the one transaction that did go through completes.

Example on batch

Using the same setup as above, perform batch reads with a very low timeout:

asbenchmark -h 10.0.100.137,10.0.100.254 -p 3000 -n nsMem -k 1000000 -s testSet -S 100 -o S:200000  -w RU,100 -z 9 -latency 7,1    -KT String -B 100 -BT 10 -totalTimeout 12

This will cause the batch_sub_tsvc_timeout metric to tick:

 asinfo -v "statistics" -l | egrep 'reap|client_conn|rw|tsvc'; asinfo -v "namespace/nsMem" -l | awk -F= '/client|busy|batch/ { if ($2 != 0) {print $0} }'
...
batch_sub_tsvc_timeout=951342
...
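
For reference, an equivalently low timeout on the Java client side would be set through the batch policy. A sketch, reusing the client object from the earlier snippet and the nsMem namespace from this test (the keys are illustrative):

import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.BatchPolicy;

// Allow only 12 ms in total for the whole batch transaction.
BatchPolicy batchPolicy = new BatchPolicy();
batchPolicy.totalTimeout = 12;

Key[] keys = new Key[] { new Key("nsMem", "testSet", "1"), new Key("nsMem", "testSet", "2") };
Record[] records = client.get(batchPolicy, keys);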

Restoring service-threads, transaction-queues and transaction-threads-per-queue to their defaults eliminates the client_tsvc_timeout metric ticking in this case.

Notes

  • With data-in-memory true, reads are usually served directly from the service threads and do not go through the transaction queue. client_tsvc_timeout will not happen in such cases (unless, for example, duplicate resolution is required, or against older clusters that have the now deprecated allow-inline-transactions set to false).

Keywords

TSVC_TIMEOUT TRANSACTION TIMEOUT HOTKEY PENDING LIMITS

Timestamp

06/04/2018