FAQ Can there be XDR lag without a client or server error?

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.


Detail

When xdr_timelag is observed on a source XDR cluster, it can have a variety of causes, usually some error in shipping records to the remote DC. Is there a scenario where XDR lag can build even in the absence of any errors?

Answer

Yes. It is possible to see xdr_timelag build even when the system is not relogging records or throttling the records being shipped. The most likely cause is the impact of a hotkey on XDR operation. To understand how this can happen, it is necessary to understand how XDR works internally.

When records are inserted, updated or deleted in an XDR enabled system, the digest for each record is inserted into the digest log, a ring buffer. The digest log is read sequentially by a thread which takes digests and places them on read thread queues; there is one read thread queue for each xdr-read-thread. The first role of the xdr-read-threads is to take the digests and pass them to the Aerospike transaction or service threads (depending on version), which then read the corresponding records to be shipped.
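As a purely illustrative sketch (the real fan-out is internal to the server; what matters, as described below, is that it is deterministic), the placement of digests onto the per-thread queues can be pictured as a stable hash of the digest modulo the number of xdr-read-threads:

NUM_READ_THREADS = 4   # default number of xdr-read-threads

def queue_for_digest(digest: bytes, num_queues: int = NUM_READ_THREADS) -> int:
    # Any stable function of the digest has the property that matters here:
    # the same digest always maps to the same read thread queue.
    return int.from_bytes(digest[:4], "big") % num_queues

hot_digest = bytes.fromhex("a1b2c3d4") + bytes(16)   # hypothetical 20-byte digest
assert queue_for_digest(hot_digest) == queue_for_digest(hot_digest)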

The assignment of digests to the read thread queues is deterministic, which means that the same digest will always go to the same queue. Each xdr-read-thread has a cache to help it manage hotkeys, which are keys that are written with a very high frequency. The way in which the cache does this is simple. When a digest is read from the queue it is placed in the cache and held there for a pre-defined time period, configured using xdr-hotkey-time-ms. If the same digest arrives again within xdr-hotkey-time-ms, the cached entry is marked as dirty and the new occurrence is not shipped. At the end of the xdr-hotkey-time-ms window any dirty digests are shipped and the dirty flag is cleared; any digests without the dirty flag are removed from the cache. The net effect is that a given digest will not be shipped more frequently than once per xdr-hotkey-time-ms.
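The following is a minimal sketch of that clean/dirty de-duplication behaviour; the class name, data layout and timing are illustrative assumptions rather than Aerospike source code:

class HotkeyCache:
    # Illustrative model of the per-xdr-read-thread hotkey cache.

    def __init__(self, hotkey_time_ms=100):        # xdr-hotkey-time-ms default
        self.window = hotkey_time_ms / 1000.0
        self.entries = {}                          # digest -> [inserted_at, dirty]

    def should_ship(self, digest, now):
        # Return True if this occurrence of the digest should be shipped now.
        self._expire(now)
        if digest in self.entries:
            self.entries[digest][1] = True         # repeat inside the window: dirty
            return False                           # ...and this occurrence is dropped
        self.entries[digest] = [now, False]        # first sighting: ship it
        return True

    def _expire(self, now):
        for digest, (inserted_at, dirty) in list(self.entries.items()):
            if now - inserted_at < self.window:
                continue
            if dirty:
                # the latest version is shipped once here, the flag is cleared
                # and the entry begins a new window
                self.entries[digest] = [now, False]
            else:
                del self.entries[digest]           # clean entries simply age out

Whatever the internal details, the observable result matches the statement above: a given digest ships at most once per xdr-hotkey-time-ms.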

When there is a hotkey, as that key will always go to the same read thread queue, and as each queue has a limit of 1000 digests, it is possible for the read thread queue of a given xdr-read-thread to fill up. An example of this is shown below:

Collectinfo-analyzer> show stat like xdr_read_reqq_used -flip
~~~~~~~~~Service Statistics (2020-03-11 13:57:07 UTC)~~~~~~~~~
            NODE   xdr_read_reqq_used   xdr_read_reqq_used_pct
  10.0.0.10:3000                   85                     2.12
  10.0.0.11:3000                 1000                    25.00
  10.0.0.12:3000                 1790                    44.75
   10.0.0.1:3000                   30                     0.75
   10.0.0.2:3000                 1995                    49.88
   10.0.0.3:3000                 1061                    26.52
   10.0.0.4:3000                 1000                    25.00
   10.0.0.6:3000                   19                     0.47
   10.0.0.7:3000                 1000                    25.00
   10.0.0.8:3000                 1000                    25.00
   10.0.0.9:3000                    0                     0.00
Number of rows: 11

The output above shows that there is a queue on nodes .11, .4, .7 and .8 that is completely full. This is indicated by xdr_read_reqq_used being exactly 1000. This metric represents the number of digests sitting in the read thread queues. By default there are 4 xdr-read-threads and therefore 4 queues, so one queue sitting at 1000 digests implies 25% of capacity used (which is what xdr_read_reqq_used_pct shows).

The queues on nodes .12, .2 and .3 are also backing up, as their totals are above 1000, meaning more than one queue on those nodes is filling.
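The percentages follow directly from the per-node capacity of 4 default read thread queues of 1000 digests each, i.e. 4000 digests per node. A quick check against the figures above:

queues, queue_limit = 4, 1000                      # defaults
capacity = queues * queue_limit                    # 4000 digests per node

for node, used in [("10.0.0.2", 1995), ("10.0.0.12", 1790), ("10.0.0.11", 1000)]:
    print(node, round(100.0 * used / capacity, 2))
# 10.0.0.2 49.88, 10.0.0.12 44.75, 10.0.0.11 25.0 -- matching xdr_read_reqq_used_pct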

The question is: why are some queues sitting at 1000 or more digests? It is because the digest log is read sequentially and digests go to queues deterministically. If the next digest waiting to be processed in the log must go to a queue that has hit the limit of 1000, the reader blocks and waits until that queue makes some room. The net effect is that the other queues drain as they are serviced while the queue being hit by the hotkey stays full (or several queues show high usage if there are multiple hotkeys).
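A simplified sketch of that blocking behaviour (thread structure, names and the hash used for the fan-out are assumptions for illustration):

import queue

NUM_READ_THREADS = 4
QUEUE_LIMIT = 1000
read_queues = [queue.Queue(maxsize=QUEUE_LIMIT) for _ in range(NUM_READ_THREADS)]

def process_digest_log(digest_log):
    for digest in digest_log:                             # strictly sequential
        target = int.from_bytes(digest[:4], "big") % NUM_READ_THREADS
        # Blocking put: if the hotkey's queue is full the reader stalls here,
        # so digests destined for the other (possibly empty) queues wait
        # behind it and xdr_timelag starts to build.
        read_queues[target].put(digest, block=True)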

In terms of how to mitigate this phenomenon, the answer is counter-intuitive. Logic would suggest that xdr-hotkey-time-ms should be decreased from the default value of 100ms to increase the rate at which hotkeys are shipped, but this is not correct. The caches used to manage XDR hotkeys can hold 25,000 digests and so they are extremely unlikely to fill up. In any event, a digest will only appear in this cache once, whether it is marked as clean or dirty.

xdr_timelag represents the time (in seconds) between the moment the most recently shipped record was written on the source cluster and the moment an attempt was made to ship it to the destination cluster. It can therefore be read as a measure of how long a given digest waited in the digest log before being processed. When a read thread queue is blocked and digests from the digest log are not being processed, xdr_timelag will increase even in the absence of shipping errors.

The solution is actually to increase xdr-hotkey-time-ms. This causes repeated digests (very hot or not) to be shipped less frequently, reducing the overall load on the system and allowing the blocked read thread queue to drain. Once the read thread queue drains it can be refilled and the digest log can be processed once again.

To explain further: each time an xdr-read-thread takes a digest from its read thread queue and reads the corresponding record from the server, the thread is occupied solely with that task to the exclusion of others. While it is occupied it cannot remove further digests from the read thread queue. As described above, when the read thread queue is full, digests cannot be read from the digest log and lag builds. Increasing xdr-hotkey-time-ms reduces the pressure on the xdr-read-thread.

If a given digest is updated once every 100ms then the xdr-read-thread has to fetch the record for that digest every 100ms (from the post-write-queue or, more likely when there is lag in the system, from the storage subsystem). This is not the case, however, if the digest is found in the de-duplication cache: in that case it is discarded as soon as it is picked off the read thread queue, which is extremely quick, so the thread is occupied for a very small amount of time.

Increasing xdr-hotkey-time-ms to, for example, 500ms means that a digest will stay in the de-duplication cache 5 times longer, so for every 5 updates (using the example above) 4 will simply be discarded straight away. The net effect is that, with a change in xdr-hotkey-time-ms from 100ms to 500ms, the number of digests discarded increases and the xdr-read-thread is able to drain the queue far more efficiently.
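A worked version of that arithmetic, assuming a record updated once every 100ms:

update_interval_ms = 100                       # assumed update rate of the hot record

for hotkey_time_ms in (100, 500):
    occurrences_per_window = hotkey_time_ms // update_interval_ms
    discarded = occurrences_per_window - 1     # at most one ship per window
    print(f"{hotkey_time_ms}ms window: {discarded} of {occurrences_per_window} "
          f"occurrences discarded straight off the read thread queue")
# 100ms window: 0 of 1 ...    500ms window: 4 of 5 ...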

By reducing the time spent fetching repeated digests and instead simply discarding those digests as they come off the read thread queue, the thread is free to do more work on unique digests and can drain the read thread queue faster. It is worth considering that a record being updated every 100ms would not necessarily be considered a hotkey.
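In practice, the change can be made statically in the xdr stanza of aerospike.conf or, assuming the parameter is dynamically settable in the server version being run (this should be verified against the configuration reference), at runtime on every node of the source cluster. For example:

# Static: inside the xdr { ... } stanza of aerospike.conf
#     xdr-hotkey-time-ms 500
#
# Dynamic (example only; confirm applicability for the version in use):
asinfo -v 'set-config:context=xdr;xdr-hotkey-time-ms=500'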

Notes

  • The caveat to increasing xdr-hotkey-time-ms is that records that are updated repeatedly will be shipped less frequently.
  • Discussion on XDR hotkeys.
  • Throttling of shipping is a process XDR executes automatically to compensate for certain types of issues. It corresponds to the metric dc_ship_idle_avg; any non-zero value indicates throttling.
  • The following log line shows whether XDR is throttling. The entry to check is rsas-ms, which reports the value of dc_ship_idle_avg:
[DC_NAME]: dc-state CLUSTER_UP timelag-sec 2 lst 1468006386894 mlst 1468006389647 (2016-07-08 19:33:09.647 GMT) fnlst 0 (-) wslst 0 (-) shlat-ms 0 rsas-ms 0.004 rsas-pct 0.0 con 384 errcl 0 errsrv 0 sz 6
  • The log line above also shows xdr_timelag which is listed as timelag-sec.
  • The facility to trace individual transactions being processed by XDR is discussed in the XDR Tracing article and can help isolate potential issues with XDR shipping.
  • Common XDR throttling reasons.

Applies To

Server prior to v. 5.0

Keywords

XDR HOTKEY LAG NO ERROR DIGESTS

Timestamp

March 2020
