The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.
FAQ - What metrics can be used to determine a correct value for xdr-read-threads?
Detail
When shipping data to a remote data centre via XDR, the architecture is straightforward. The digest log stores the digests of records incoming to the node. Digest log reader threads read the digests from the digest log and put them on to in-memory read request queues. Threads known as xdr-read-threads pick the digests from these in-memory request queues, process them through the de-duplication cache, schedule the read for the associated record via the service threads (or transaction queues/threads for versions prior to 4.7), potentially apply compression and, finally, pass the records to an embedded Aerospike C client which ships them to the destination cluster(s).
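The flow described above can be sketched as:

```
digest log ──▶ digest log reader threads ──▶ in-memory read request queues
           ──▶ xdr-read-threads (de-dup cache, record read, compression)
           ──▶ embedded Aerospike C client ──▶ destination cluster(s)
```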
The most common reasons for a build-up in xdr_timelag are described in the FAQ - What Can Cause XDR Throttling article, but in some cases a build-up in xdr_timelag may be observed due to slow performance of the tasks assigned to the xdr-read-threads. In that instance, increasing the number of xdr-read-threads may be an appropriate solution. What Aerospike metrics exist to determine whether an increase in xdr-read-threads could alleviate the xdr_timelag?
Answer
The following metrics are used to track the behaviour of xdr-read-threads. These are fully documented in the Aerospike Metrics Reference.
- xdr_read_active_avg_pct: This describes the amount of time the xdr read threads spend working as opposed to waiting for digests to appear on the queues they service. High percentages for this metric, along with higher CPU usage, may indicate a need to increase the number of xdr-read-threads. When CPU utilisation is lower, the expectation is that the default number of xdr-read-threads should be sufficient to handle the XDR load.
- xdr_read_reqq_used_pct: This gives a value, as a percentage, for how full the read request queues are. This metric should be used with care: a slow disk will cause it to be high, so on its own it is not a good indicator of a need to increase xdr-read-threads.
- There is a maximum of 10,000 transactions that can be in flight in the internal XDR transaction queue. This is a hard limit, so if there are at or near 10,000 transactions in flight, increasing the number of xdr-read-threads will not solve a source side lag issue. This is measured in raw numbers by xdr_read_txnq_used or as a percentage by xdr_read_txnq_used_pct.
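As a rough illustration of how these metrics can be interpreted together, the sketch below parses the semicolon-delimited `key=value` statistics string that pre-5.0 servers return (for example via `asinfo -v "statistics/xdr"`) and flags when the xdr-read-threads look like the bottleneck. The threshold values and the helper names are illustrative assumptions, not Aerospike-defined cut-offs.

```python
# Hypothetical sketch: decide whether more xdr-read-threads might help,
# based on the metrics discussed above. Thresholds are illustrative.

def parse_xdr_stats(raw):
    """Split a 'k1=v1;k2=v2;...' statistics string into a dict of strings."""
    return dict(pair.split("=", 1) for pair in raw.split(";") if "=" in pair)

def read_threads_pressure(stats):
    """Return True when the read threads appear saturated: they spend most
    of their time working (xdr_read_active_avg_pct high) while the internal
    transaction queue is not already full (a full txnq is a hard limit, so
    adding threads would not help)."""
    active_pct = float(stats.get("xdr_read_active_avg_pct", 0))
    txnq_pct = float(stats.get("xdr_read_txnq_used_pct", 0))
    return active_pct > 90.0 and txnq_pct < 90.0

# Example with a fabricated statistics string:
raw = ("xdr_timelag=12;xdr_read_active_avg_pct=95.2;"
       "xdr_read_reqq_used_pct=40.1;xdr_read_txnq_used=120;"
       "xdr_read_txnq_used_pct=1.2")
stats = parse_xdr_stats(raw)
print(read_threads_pressure(stats))  # True: threads busy, txnq not full
```

Remember that, as noted above, a high xdr_read_reqq_used_pct alone is not a reliable signal, so it is deliberately not part of this check.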
Notes
- Dynamically decreasing xdr-read-threads has been known to cause node crashes in some rare situations; it is therefore advisable to decrease this configuration parameter statically (restart required).
- The following log line shows xdr_timelag:
[DC_NAME]: dc-state CLUSTER_UP timelag-sec 2 lst 1468006386894 mlst 1468006389647 (2016-07-08 19:33:09.647 GMT) fnlst 0 (-) wslst 0 (-) shlat-ms 0 rsas-ms 0.004 rsas-pct 0.0 con 384 errcl 0 errsrv 0 sz 6
- The most common reason for the xdr_timelag to build up is actually throttling. Refer to the article titled “What Can Cause XDR Throttling” for details.
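Setting xdr-read-threads statically, as recommended above, can be sketched in aerospike.conf along these lines (illustrative pre-5.0 stanza; the digest log path, thread count, datacenter name and address are placeholders for your deployment):

```
xdr {
    enable-xdr true
    xdr-digestlog-path /opt/aerospike/xdr/digestlog 100G
    xdr-read-threads 8

    datacenter DC1 {
        dc-node-address-port 10.0.0.1 3000
    }
}
```

Increasing the value dynamically is presumably done via the usual set-config info command with the xdr context, but verify the exact syntax against the configuration reference for your server version.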
Applies To
Server prior to v. 5.0
Keywords
XDR-READ-THREADS XDR-TIMELAG LAG XDR
Timestamp
November 2019