FAQ - What can cause XDR throttling?

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

FAQ - What are the causes of XDR throttling?

Detail

This knowledge base article discusses XDR throughput throttling. XDR reports throughput values in the aerospike.log file.

Version 5.x and above:

In the case of a transient error, XDR auto-throttles on a per-partition basis. A transient error returned by the destination causes XDR to add the digest and LUT (last update time) entry to a Retry Queue. Every XDR Transaction Queue (XTQ) has a corresponding Retry Queue. At each lap, the DC Manager Thread first picks up entries from the Retry Queue, and then from the XTQ, for a total of 50 digest entries per partition.

When the Retry Queue hits a certain threshold (10 entries), the DC Manager Thread goes into Retry Only Mode. In that mode, the DC Manager Thread stops picking entries from the XTQ and retries entries from the Retry Queue, up to the retry threshold. While the transient issue is ongoing, the DC Manager Thread reduces the number of entries picked for retry after every lap, down to a minimum of 5 entries from the Retry Queue. This algorithm ensures throttling occurs in the case of transient shipping errors.
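
Below is a minimal sketch of this selection logic in Python, assuming hypothetical names, data structures, and batch-size decay (it is illustrative only, not Aerospike source code); the thresholds match the values described above:

  # Illustrative sketch of per-partition Retry Only Mode -- not Aerospike source.
  from collections import deque

  BATCH_SIZE = 50            # digest entries picked per partition per lap
  RETRY_ONLY_THRESHOLD = 10  # Retry Queue depth that triggers Retry Only Mode
  MIN_RETRY_BATCH = 5        # floor on the retry batch size during a transient issue

  def dc_manager_lap(retry_queue, xtq, retry_batch):
      """One lap of the DC Manager Thread for a single partition."""
      if len(retry_queue) >= RETRY_ONLY_THRESHOLD:
          # Retry Only Mode: skip the XTQ and retry a batch that shrinks
          # on every lap, down to MIN_RETRY_BATCH entries.
          n = min(retry_batch, len(retry_queue))
          batch = [retry_queue.popleft() for _ in range(n)]
          return batch, max(MIN_RETRY_BATCH, retry_batch - 1)
      # Normal mode: drain retries first, then top up from the XTQ.
      batch = []
      while retry_queue and len(batch) < BATCH_SIZE:
          batch.append(retry_queue.popleft())
      while xtq and len(batch) < BATCH_SIZE:
          batch.append(xtq.popleft())
      return batch, RETRY_ONLY_THRESHOLD   # reset the retry batch size

  # Example: a partition with 12 failed entries enters Retry Only Mode.
  rq, xq = deque(range(12)), deque(range(100))
  batch, nxt = dc_manager_lap(rq, xq, RETRY_ONLY_THRESHOLD)
  print(len(batch), nxt)   # 10 9 -> XTQ skipped, retry batch shrinking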

Version 3.8.x to 4.9.x:

At times, XDR will intentionally decrease throughput, and this is visible in the logs. The following log line shows the details relevant to XDR throttling:

[DC_NAME]: dc-state CLUSTER_UP timelag-sec 2 lst 1468006386894 mlst 1468006389647 (2016-07-08 19:33:09.647 GMT) fnlst 0 (-) wslst 0 (-) shlat-ms 0 rsas-ms 0.020 rsas-pct 10.0

In the line above, rsas-ms shows the average sleep time (in milliseconds) for each write to the DC; when it starts to increase, XDR is sleeping more. Putting read threads to sleep is the mechanism by which XDR throttles throughput. This value is exposed in the dc_remote_ship_avg_sleep statistic. Throttling is also visible in rsas-pct, which describes the percentage of writes to the DC that were throttled; this is exposed in the dc_remote_ship_avg_sleep_pct statistic.
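
As a rough, hypothetical illustration of this mechanism (the numbers below are examples, not Aerospike internals), injecting a sleep before each write lowers the effective per-thread write rate:

  # Hypothetical illustration: a per-write sleep reduces the effective
  # write rate of each read thread. Numbers are examples only.
  service_ms = 1.0   # assumed time to ship one record with no throttling
  rsas_ms = 0.020    # average injected sleep per write (from the log line)

  unthrottled = 1000.0 / service_ms              # writes/sec per thread
  throttled = 1000.0 / (service_ms + rsas_ms)    # writes/sec per thread
  print(f"{unthrottled:.0f} -> {throttled:.0f} writes/sec per read thread")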

What are the key reasons why XDR will throttle throughput to a remote DC?

Answer

Excessive latency

XDR will begin to throttle when the shipping latency gets too high. The threshold used to determine that latency is too high is 25% of xdr-write-timeout. The default value for xdr-write-timeout is 10000 milliseconds, so when latency exceeds 2500 ms XDR will start to throttle throughput. This stops XDR from flooding a datacenter that cannot cope with high throughput.
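
As a quick worked example of that calculation, using the defaults above:

  # Throttling threshold is 25% of xdr-write-timeout (defaults from above).
  xdr_write_timeout_ms = 10000
  throttle_threshold_ms = 0.25 * xdr_write_timeout_ms
  print(throttle_threshold_ms)  # 2500.0 -> XDR throttles above 2500 ms latency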

Maximum configured throughput hit

To avoid flooding the network, XDR can be configured with a maximum allowed throughput (the number of records written to the destination per second). In older versions of XDR this is controlled by the xdr-max-ship-throughput parameter; in XDR 5.0 and above the setting has been renamed max-throughput. XDR actually turns this into a maximum number of objects that can be in flight, based on the link latency for a given DC. For example, if a link between 2 DCs has a round-trip latency of 10 ms, putting 1 record at a time on the link (1 record in flight) would allow 100 records to be written every second (a throughput of 100). In the default configuration (no xdr-max-ship-throughput set), the derived value for the maximum number of objects that can be in flight at one time is 50000. If the number of records in flight exceeds this value, XDR will start to throttle.
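
Below is a small sketch of that conversion, with illustrative names (not Aerospike internals):

  # Convert a configured throughput cap into an in-flight cap for a link.
  def max_in_flight(max_throughput_rps, rtt_ms):
      """Records that must be on the wire to sustain the target rate."""
      return max_throughput_rps * (rtt_ms / 1000.0)

  print(max_in_flight(100, 10))   # 1.0 record in flight -> 100 records/sec
  print(max_in_flight(5000, 10))  # 50.0 records in flight at the same RTT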

The client has run out of connections

XDR uses the Aerospike C client to ship records, and uses 64 connections per node at the destination cluster (objects are pipelined on those connections, meaning more than one record at a time can be in flight on a single connection). During startup of XDR (and of the underlying C client), the client can run out of connections for a short period of time while they are still being established. This can also happen immediately following a link failure, when connections get re-established.

When the remote DC reports an error

When a DC reports an error, XDR throttles down to avoid potential 100% CPU usage scenarios. If errors continue for 30 seconds, the DC is then considered down and a window shipper thread is started. Note that in the case of a remote DC error (xdr_ship_destination_error), a 30-second window is given to recover before the cluster is put into cluster-down mode (from which it recovers through a window shipper once the DC is reachable again).

Network Error or Timeout

In the event of a network error when attempting to ship to the destination, or a timeout on an XDR write, the throughput drops to 1 record shipped per second per XDR read thread (xdr-read-threads, 4 by default) for each remote DC. These errors are considered transient; in this situation, after a 30-second interval during which XDR ships at 1 transaction per second per read thread, the throughput (actually the number of records on the wire) is doubled every 2 seconds until either the 50k maximum or the configured maximum throughput (xdr-max-ship-throughput) is reached.
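
Below is a hypothetical simulation of that recovery ramp (illustrative only, not Aerospike source code):

  # After the 30-second 1-TPS-per-thread interval, double the number of
  # records on the wire every 2 seconds until the cap is reached.
  def recovery_ramp(read_threads=4, cap=50000):
      rate = 1 * read_threads        # 1 record/sec per xdr read thread
      t = 0
      schedule = [(t, rate)]
      while rate < cap:
          t += 2
          rate = min(rate * 2, cap)
          schedule.append((t, rate))
      return schedule

  for t, rate in recovery_ramp():
      print(f"t+{t:2d}s: {rate}")    # 4, 8, 16, ... up to 50000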

Note: As of version 4.4, XDR gradually slows down when encountering network errors or timeouts:

  • reducing the throughput by 50% for 1 second on the first error/timeout.
  • if errors/timeouts continue, reducing the throughput by another 50% for 1 second.
  • this continues down to 1 transaction per thread per second.
  • upon not encountering any error or timeout, XDR will double the throughput every 1 second (as detailed previously); see the sketch following this list.
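
Below is a minimal sketch of that backoff and recovery loop, with hypothetical names and starting values (not Aerospike source code):

  # Halve the rate for each second with an error/timeout (down to 1 txn
  # per thread per second); double it every second once errors stop.
  def adjust_rate(rate, had_error, read_threads=4, cap=50000):
      floor = 1 * read_threads
      if had_error:
          return max(rate // 2, floor)   # reduce throughput by 50%
      return min(rate * 2, cap)          # recover by doubling

  rate = 8000
  for had_error in (True, True, True, False, False):
      rate = adjust_rate(rate, had_error)
      print(rate)                        # 4000 2000 1000 2000 4000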

Kernel misconfiguration

XDR will also throttle if any of the nodes on the destination does not have the Aerospike default values for kernel.shmmax = 68719476736 and kernel.shmall = 4294967296. On the nodes where these are not set correctly, you will start seeing warnings like this:

Jan 16 2018 00:00:23 GMT: WARNING (arenax): (arenax_ee.c:170) could not allocate 1073741824-byte arena stage 18: block size error
Jan 16 2018 00:00:23 GMT: WARNING (index): (index.c:710) arenax alloc failed
Jan 16 2018 00:00:23 GMT: WARNING (rw): (write.c:533) {ns_seg} write_master: fail as_record_get_create() <Digest>:0x786478ecb1e535271805ce5e742c769dc2b8230f

The source nodes will then start relogging, which in turn will cause throttling.
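
For reference, the expected values above would typically be set in /etc/sysctl.conf (and applied with sysctl -p):

  # /etc/sysctl.conf - shared memory settings expected by Aerospike
  kernel.shmmax = 68719476736
  kernel.shmall = 4294967296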

Errors returned by the destination cluster

Here are errors for which XDR will throttle:

  • Error during connection establishment (including issues with authentication and TLS).
  • Timeout errors returned by the server.
  • AEROSPIKE_ERR_SERVER: Generic server error.
  • AEROSPIKE_ERR_RECORD_BUSY.
  • AEROSPIKE_ERR_DEVICE_OVERLOAD.
  • AEROSPIKE_ERR_FAIL_FORBIDDEN: Temporary forbidden errors, such as stop-writes due to clock skew.

Here are errors for which XDR will not throttle:

  • AEROSPIKE_ERR_RECORD_NOT_FOUND: Can only happen for deletes. This is the only non-permanent error in this category.
  • AEROSPIKE_ERR_RECORD_TOO_BIG.
  • AEROSPIKE_ERR_ALWAYS_FORBIDDEN: Happens when allow-xdr-writes is set to false on the remote DC, or in general for anything forbidden due to configuration on the remote cluster.

Notes

  • Detail on xdr-write-timeout
  • Detail on xdr-max-ship-throughput
  • Relationship between XDR latency and throttling in detail
  • Log messages explained in detail: http://www.aerospike.com/docs/reference/serverlogmessages#common-log-messages

Keywords

XDR THROTTLE BANDWIDTH LAG

Timestamp

4/12/19