FAQ - What are the theories for tsvc timeout

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

FAQ - What are the potential root causes for tsvc timeout

A client_tsvc_timeout occurs when a transaction sits on the transaction queue (tsvc-q) for longer than the timeout specified for the transaction. At that point the transaction's details have not yet been parsed, which is why it is not categorized as a more specific timeout (such as client_read_timeout or client_write_timeout).

Important Note: As of Aerospike 4.7, transaction-queues and transaction-threads-per-queue have been removed and transactions are instead processed by the service-threads. For such versions, a client_tsvc_timeout occurs either because too much time was spent during the demarshal phase, or because the transaction was restarted (another transaction for the same record was already in the rw-hash, reported by rw_in_progress, which can happen for write transactions, for reads during duplicate resolution, or in strong-consistency enabled namespaces).

How long is the timeout?

The timeout is set by the application through the policy used (either socketTimeout or totalTimeout). If it is not set on the application side, the transaction-max-ms server-side configuration, which defaults to 1 second, is used. For single record transactions, recent client libraries send socketTimeout to the server instead of totalTimeout when socketTimeout < totalTimeout. More details are available in our knowledge base article on Understanding Timeout and Retries.
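
For example, with the Aerospike Java client the timeouts are set per transaction through the policy. The following is a minimal sketch; the seed host, namespace, set, bin name and timeout values are placeholders:

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.Bin;
    import com.aerospike.client.Key;
    import com.aerospike.client.policy.WritePolicy;

    public class TimeoutExample {
        public static void main(String[] args) {
            // Placeholder seed node; use one of the cluster nodes.
            AerospikeClient client = new AerospikeClient("10.0.100.137", 3000);

            WritePolicy policy = new WritePolicy();
            policy.socketTimeout = 50;   // per-attempt socket timeout (ms)
            policy.totalTimeout = 200;   // overall transaction deadline (ms)
            policy.maxRetries = 2;       // retries allowed within the total timeout

            // If neither timeout is set by the application, the server
            // falls back to transaction-max-ms as described above.
            client.put(policy, new Key("nsSSD", "testSet", "key1"), new Bin("bin1", "value1"));
            client.close();
        }
    }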

What causes such timeout?

There are many potential reasons, including hotkeys, increased traffic, misconfiguration or other bottlenecks in the transaction processing path.

If the tsvc_queue metric is piling up, it could be caused by a starvation of transaction threads (the total number of which is transaction-threads-per-queue x transaction-queues). In some situations, increasing the number of threads (through either one, or both, of those configuration parameters) may help.

Transactions that find one or more previous transactions for the same record still parked and queued up at the rw-hash (rw_in_progress) will be restarted and reprocessed by a service thread. This obviously causes extra latency and increases the chances of timing out upon being reprocessed. Such a situation can occur for hotkeys (typically for writes, but potentially also for reads, for example during duplicate resolution or in strong-consistency enabled namespaces).
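
The queue build-up and parked transactions can be observed through the tsvc_queue and rw_in_progress statistics (shown with asinfo further below). As a quick illustration, they can also be polled from the Java client's Info API. This is a minimal sketch; the seed host is a placeholder, and on 4.7 and later, where the transaction queues were removed, tsvc_queue may no longer be reported:

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.Info;
    import com.aerospike.client.cluster.Node;

    public class TsvcMonitor {
        public static void main(String[] args) {
            AerospikeClient client = new AerospikeClient("10.0.100.137", 3000);

            for (Node node : client.getNodes()) {
                // The "statistics" info command returns semicolon-separated name=value pairs.
                String stats = Info.request(node, "statistics");
                for (String pair : stats.split(";")) {
                    if (pair.startsWith("tsvc_queue=") || pair.startsWith("rw_in_progress=")) {
                        System.out.println(node.getName() + " " + pair);
                    }
                }
            }
            client.close();
        }
    }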

Example on read/write

To illustrate the effects of a hot key, we can use a two node cluster to test. We will set service-threads, transaction-queues and transaction-threads-per-queue to 1.

We also deliberately throttle the fabric channels in order to slow down transactions so that the latency exceeds 32 ms. Depending on your system, transactions may not time out, as the default timeout of 1 second is fairly high.

We will therefore also lower transaction-max-ms. This will cause transactions to time out upon being picked up by a transaction thread (causing the client_tsvc_timeout metric to tick):

asinfo -v "set-config:context=service;transaction-max-ms=15"

To simulate a hot key, we can repeatedly write a single record using the asbenchmark tool:

asbenchmark -h 10.0.100.137,10.0.100.254 -p 3000 -n nsSSD -k 1 -s testSet -S 100 -o S:1000  -w RU,0 -z 18 -latency 7,1    -KT String

And we can read this record using the following command:

asbenchmark -h 10.0.100.137,10.0.100.254 -p 3000 -n nsMem -k 1 -s testSet -S 100 -o S:1000  -w RU,100 -z 9 -latency 7,1    -KT String -consistencyLevel all

The output of the client writes (timestamps in the PDT timezone), showing the point at which transaction-max-ms is lowered and client_tsvc_timeout starts ticking:

...
...
2018-06-14 14:50:09.599 write(tps=192 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=192 timeouts=0 errors=0)
      <=1ms >1ms >2ms >4ms >8ms >16ms >32ms
write    1%  99%  99%  97%  95%   94%   87%
read     0%   0%   0%   0%   0%    0%    0%
2018-06-14 14:50:10.600 write(tps=178 timeouts=527 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=178 timeouts=527 errors=0)
      <=1ms >1ms >2ms >4ms >8ms >16ms >32ms
write    3%  97%  96%  92%  83%   51%   10%
read     0%   0%   0%   0%   0%    0%    0%
...
...

The corresponding asloglatency output for the above is as follows:

asloglatency -l aerospike.log -f head -n 10 -e 1 -h "{nsSSD}-write"
Histogram Name : {nsSSD}-write
Log       : aerospike.log
From      : 2018-06-14 21:49:03

Jun 14 2018 21:49:27
               % > (ms)
slice-to (sec)      1      2      4      8     16     32     64    128    256    512    ops/sec
-------------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ----------
21:49:37    10  98.03  93.58  90.33  88.12  86.87  76.72  19.79   0.75   0.00   0.00      375.5
21:49:47    10  98.28  93.70  91.52  89.98  88.23  78.94  24.50   5.32   0.00   0.00      325.3
21:49:57    10  98.15  93.45  91.04  89.00  87.37  77.81  22.44   1.00   0.00   0.00      361.8
21:50:07    10  97.97  93.55  91.30  89.63  87.10  74.14  22.33   2.75   0.00   0.00      359.6
21:50:17    10  97.83  92.11  85.73  75.02  30.37  14.83   8.48   1.74   0.00   0.00      304.2
21:50:27    10  96.50  87.76  80.12  69.07  13.10   0.24   0.00   0.00   0.00   0.00      420.0
21:50:37    10  96.10  86.67  79.74  71.37   8.23   0.00   0.00   0.00   0.00   0.00      613.6
21:50:47    10  94.76  84.27  77.71  68.31   7.27   0.00   0.00   0.00   0.00   0.00      350.9
21:50:57    10   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00        0.0

When microbenchmarks are enabled, the write-restart and read-restart slices will show increased latencies for hot keys (and similarly, for the relevant transaction types, batch-sub-restart, udf-sub-restart and udf-restart).
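
These histograms are enabled dynamically per namespace (for writes, through the enable-benchmarks-write configuration). The following is a hedged sketch using the Java client's Info API; the same set-config command can be issued with asinfo, and nsSSD is simply the namespace used in this example:

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.Info;
    import com.aerospike.client.cluster.Node;

    public class EnableWriteBenchmarks {
        public static void main(String[] args) {
            AerospikeClient client = new AerospikeClient("10.0.100.137", 3000);

            // Dynamically enable the write microbenchmark histograms
            // ({ns}-write-restart, {ns}-write-repl-write, ...) on every node.
            for (Node node : client.getNodes()) {
                String response = Info.request(node,
                    "set-config:context=namespace;id=nsSSD;enable-benchmarks-write=true");
                System.out.println(node.getName() + ": " + response);
            }
            client.close();
        }
    }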

Note that the write-repl-write slice does not account for all of the latency:

slice-to (sec)      1      2      4      8     16     32     64    128    256    512    ops/sec
-------------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ----------
21:49:27    10  70.59  43.53  24.12  14.71   2.94   0.00   0.00   0.00   0.00   0.00       17.0
21:49:37    10  54.81  30.65  16.64   5.75   0.93   0.19   0.00   0.00   0.00   0.00      375.5
21:49:47    10  55.09  32.52  20.10   8.18   1.26   0.25   0.09   0.03   0.00   0.00      325.3
21:49:57    10  54.39  29.77  17.91   6.41   0.91   0.19   0.03   0.00   0.00   0.00      361.8
21:50:07    10  52.81  29.59  17.99   6.67   0.89   0.11   0.03   0.00   0.00   0.00      359.6
21:50:17    10  57.76  32.08  19.63   8.81   1.41   0.26   0.00   0.00   0.00   0.00      304.2
21:50:27    10  45.93  23.29  13.17   4.40   0.50   0.02   0.00   0.00   0.00   0.00      420.0
21:50:37    10  32.22  13.64   8.60   1.86   0.07   0.00   0.00   0.00   0.00   0.00      613.6
21:50:47    10  28.30  11.17   6.73   2.34   0.09   0.00   0.00   0.00   0.00   0.00      350.9
21:50:57    10   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00        0.0

And here is the output of the write-restart slice:

slice-to (sec)      1      2      4      8     16     32     64    128    256    512    ops/sec
-------------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ----------
21:49:27    10  76.86  51.06  24.85  15.89   3.59   0.00   0.00   0.00   0.00   0.00      286.9
21:49:37    10  66.90  35.75  18.51   6.40   1.05   0.24   0.00   0.00   0.00   0.00     6123.8
21:49:47    10  67.38  37.20  22.21   9.40   1.64   0.27   0.09   0.03   0.00   0.00     5326.1
21:49:57    10  67.29  34.39  19.83   7.31   1.13   0.20   0.03   0.00   0.00   0.00     5889.1
21:50:07    10  66.50  35.25  20.60   7.98   1.10   0.16   0.03   0.00   0.00   0.00     5862.2
21:50:17    10  65.68  31.94  17.16   6.28   0.80   0.12   0.00   0.00   0.00   0.00     4150.5
21:50:27    10  57.04  21.97  10.13   2.27   0.00   0.00   0.00   0.00   0.00   0.00     5765.6
21:50:37    10  45.72  12.11   6.41   0.94   0.00   0.00   0.00   0.00   0.00   0.00     8908.6
21:50:47    10  43.42  10.12   5.02   1.05   0.00   0.00   0.00   0.00   0.00   0.00     5121.5
21:50:57    10   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00        0.0

Note the throughput (ops/sec) for the write-restart slice: it is over 4000, which is much higher than the incoming write traffic. This is due to transactions getting put back into the transaction queue over and over again. As transactions get re-attempted, the write transaction latency increases significantly.

The fail_key_busy metric is incremented when the transaction-pending-limit has been exceeded.
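
On the application side, a transaction rejected because the pending limit was exceeded is returned as a "hot key" error (ResultCode.KEY_BUSY in the Java client). Here is a minimal sketch of catching it and backing off before retrying; the retry count and sleep values are arbitrary placeholders:

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.AerospikeException;
    import com.aerospike.client.Bin;
    import com.aerospike.client.Key;
    import com.aerospike.client.ResultCode;
    import com.aerospike.client.policy.WritePolicy;

    public class HotKeyBackoff {
        public static void main(String[] args) throws InterruptedException {
            AerospikeClient client = new AerospikeClient("10.0.100.137", 3000);
            Key key = new Key("nsMem", "testSet", "1");
            WritePolicy policy = new WritePolicy();

            for (int attempt = 1; attempt <= 5; attempt++) {
                try {
                    client.put(policy, key, new Bin("bin1", "value"));
                    break;
                } catch (AerospikeException ae) {
                    if (ae.getResultCode() == ResultCode.KEY_BUSY) {
                        // transaction-pending-limit exceeded on the server
                        // (fail_key_busy ticks); back off briefly before retrying.
                        Thread.sleep(10L * attempt);
                    } else {
                        throw ae;
                    }
                }
            }
            client.close();
        }
    }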

The following command can be used to show the relevant metrics that are non-zero:

asinfo -v "statistics" -l | egrep 'reap|client_conn|rw|tsvc'; asinfo -v "namespace/nsMem" -l | awk -F= '/client|busy/ { if ($2 != 0) {print $0} }'
tsvc_queue=15
rw_in_progress=1
client_connections=42
reaped_fds=0
early_tsvc_client_error=0
early_tsvc_batch_sub_error=0
early_tsvc_udf_sub_error=0
fabric_rw_send_rate=419749
fabric_rw_recv_rate=363460
dlog_overwritten_error=0
client_tsvc_timeout=1869
client_read_success=609925
client_write_success=15615
client_write_timeout=3

Example on UDF

To see the effect with UDFs, we will write to one record 1000 times using a UDF, based on our Java example:

                // Execute the UDF 1000 times against the same key.
                for (int x = 1; x <= 1000; x++)
                        client.execute(params.writePolicy, key, "record_example", "writeBin", Value.get(bin.name), bin.value);
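
For reference, a self-contained version of this loop might look roughly as follows. This is a sketch: the seed host, namespace, set, bin name and value are placeholders, and the record_example UDF module (with its writeBin function) is assumed to already be registered on the cluster:

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.Key;
    import com.aerospike.client.Value;
    import com.aerospike.client.policy.WritePolicy;

    public class UdfHotKey {
        public static void main(String[] args) {
            AerospikeClient client = new AerospikeClient("10.0.100.137", 3000);
            WritePolicy writePolicy = new WritePolicy();
            Key key = new Key("nsMem", "testSet", "1");

            // Execute the writeBin UDF repeatedly against the same record,
            // concentrating all the load on a single key.
            for (int x = 1; x <= 1000; x++) {
                client.execute(writePolicy, key, "record_example", "writeBin",
                        Value.get("udfbin1"), Value.get("value" + x));
            }
            client.close();
        }
    }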

Running the above yields metrics such as:

asinfo -v "statistics" -l | egrep 'reap|client_conn|rw|tsvc'; asinfo -v "namespace/nsMem" -l | awk -F= '/client|busy|batch|udf_/ { if ($2 != 0) {print $0} }'
tsvc_queue=0
rw_in_progress=0
client_connections=3
reaped_fds=38
early_tsvc_client_error=0
early_tsvc_batch_sub_error=0
early_tsvc_udf_sub_error=0
fabric_rw_send_rate=1470
fabric_rw_recv_rate=646
dlog_overwritten_error=0
client_udf_complete=1000
client_lang_write_success=1000
enable-benchmarks-batch-sub=false

If we run 100 processes in parallel (i.e. 100 x 1000 UDF writes against one record) we end up with a hot key situation:

asinfo -v "statistics" -l | egrep 'reap|client_conn|rw|tsvc'; asinfo -v "namespace/nsMem" -l | awk -F= '/client|busy|batch|udf_/ { if ($2 != 0) {print $0} }'
tsvc_queue=0
rw_in_progress=0
client_connections=3
reaped_fds=38
early_tsvc_client_error=0
early_tsvc_batch_sub_error=0
early_tsvc_udf_sub_error=0
fabric_rw_send_rate=0
fabric_rw_recv_rate=0
dlog_overwritten_error=0
client_udf_complete=54931
client_udf_error=59
client_lang_write_success=54931
fail_key_busy=59

Note that fail_key_busy accounts for all of the client_udf_error occurrences.

We repeat the test with the transaction-pending-limit set to 40, resulting in the tsvc_queue going up to 40:

grep tsvc-q aerospike.log | grep -v "tsvc-q 0"
Jun 01 2018 21:49:37 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 40 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
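
transaction-pending-limit is a dynamic, service-context configuration, so it can be raised at runtime. Here is a sketch using the Java client's Info API; the same set-config command can be issued with asinfo:

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.Info;
    import com.aerospike.client.cluster.Node;

    public class SetPendingLimit {
        public static void main(String[] args) {
            AerospikeClient client = new AerospikeClient("10.0.100.137", 3000);

            // Raise the number of transactions allowed to wait behind the
            // rw-hash entry for the same key (default is 20) on every node.
            for (Node node : client.getNodes()) {
                String response = Info.request(node,
                    "set-config:context=service;transaction-pending-limit=200");
                System.out.println(node.getName() + ": " + response);
            }
            client.close();
        }
    }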

After setting transaction-pending-limit to 200, the tsvc_queue increases up to 200:

Jun 04 2018 18:02:20 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 56 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:28 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:30 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:31 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:32 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:33 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:34 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:35 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:36 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:37 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 200 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0
Jun 04 2018 18:02:39 GMT: INFO (info): (ticker.c:291)    in-progress: tsvc-q 41 info-q 0 nsup-delete-q 0 rw-hash 1 proxy-hash 0 tree-gc-q 0

Note: In this example, the ticker-interval is set to 1 second instead of the default 10 seconds.

This shows that we have up to 200 transactions parked at the rw-hash that are constantly being put back into the transaction queue every time the one transaction that did go through completes.

Example on batch

Using the same setup as above, perform batch reads with a very low timeout:

asbenchmark -h 10.0.100.137,10.0.100.254 -p 3000 -n nsMem -k 1000000 -s testSet -S 100 -o S:200000  -w RU,100 -z 9 -latency 7,1    -KT String -B 100 -BT 10 -totalTimeout 12

This will cause the batch_sub_tsvc_timeout metric to tick:

 asinfo -v "statistics" -l | egrep 'reap|client_conn|rw|tsvc'; asinfo -v "namespace/nsMem" -l | awk -F= '/client|busy|batch/ { if ($2 != 0) {print $0} }'
...
batch_sub_tsvc_timeout=951342
...
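
For applications using the Java client, an equivalent batch read with a similarly aggressive deadline would look roughly like the sketch below; the seed host, key values and 12 ms timeout mirror the asbenchmark run above and are placeholders:

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.AerospikeException;
    import com.aerospike.client.Key;
    import com.aerospike.client.Record;
    import com.aerospike.client.policy.BatchPolicy;

    public class BatchLowTimeout {
        public static void main(String[] args) {
            AerospikeClient client = new AerospikeClient("10.0.100.137", 3000);

            // Batch of 100 keys, as in the -B 100 asbenchmark run above.
            Key[] keys = new Key[100];
            for (int i = 0; i < keys.length; i++) {
                keys[i] = new Key("nsMem", "testSet", String.valueOf(i));
            }

            BatchPolicy policy = new BatchPolicy();
            policy.totalTimeout = 12;  // very low deadline; batch sub-transactions queued
                                       // too long tick batch_sub_tsvc_timeout on the server
            try {
                Record[] records = client.get(policy, keys);
                System.out.println("Returned " + records.length + " results");
            } catch (AerospikeException ae) {
                // With such a low deadline the whole batch may time out client-side as well.
                System.out.println("Batch failed: " + ae.getMessage());
            }
            client.close();
        }
    }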

Restoring service-threads, transaction-queues and transaction-threads-per-queue to their defaults eliminates the client_tsvc_timeout metric ticking in this case.

Notes

  • With data-in-memory set to true, reads are usually served directly from the service-threads and do not go through the transaction queue. client_tsvc_timeout will not occur in such cases (unless, for example, duplicate resolution is required, or on older clusters that have the now-deprecated allow-inline-transactions configuration set to false).

Keywords

TSVC_TIMEOUT TRANSACTION TIMEOUT HOTKEY PENDING LIMITS

Timestamp

July 2020