FAQ - What can cause XDR throttling?


#1

FAQ - What are the causes of XDR throttling?

Detail

In Aerospike 3.8.x and later, XDR reports throughput values in the aerospike.log file. At times XDR will intentionally decrease its throughput, and this is visible in the log. The following log line shows the details relevant to XDR throttling.

[DC_NAME]: dc-state CLUSTER_UP timelag-sec 2 lst 1468006386894 mlst 1468006389647 (2016-07-08 19:33:09.647 GMT) fnlst 0 (-) wslst 0 (-) shlat-ms 0 rsas-ms 0.020 rsas-pct 10.0

In the line above, rsas-ms shows the average sleep time applied to each write shipped to the DC; when this value starts to increase, XDR is sleeping more. Putting read threads to sleep is the mechanism by which XDR throttles throughput, and the value is also exposed in the dc_remote_ship_avg_sleep statistic. Throttling is also reflected in rsas-pct, which gives the percentage of writes to the DC that were throttled and is exposed in the dc_remote_ship_avg_sleep_pct statistic.
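
As an illustration, here is a minimal Python sketch (not an Aerospike tool; the log line is copied from the example above) that extracts rsas-ms and rsas-pct from a dc-state line so they can be trended or alerted on:

import re

# Example dc-state line, copied from above.
line = ("[DC_NAME]: dc-state CLUSTER_UP timelag-sec 2 lst 1468006386894 "
        "mlst 1468006389647 (2016-07-08 19:33:09.647 GMT) fnlst 0 (-) "
        "wslst 0 (-) shlat-ms 0 rsas-ms 0.020 rsas-pct 10.0")

m = re.search(r"rsas-ms (\S+) rsas-pct (\S+)", line)
if m:
    rsas_ms, rsas_pct = float(m.group(1)), float(m.group(2))
    # A sustained rise in either value means XDR is throttling harder.
    print("avg sleep per write: %.3f ms, throttled writes: %.1f%%" % (rsas_ms, rsas_pct))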

What are the key reasons why XDR will throttle throughput to a remote DC?

Answer

Excessive latency

XDR will begin to throttle when the latency to the remote DC gets too high. The threshold used to determine when latency is too high is 25% of xdr-write-timeout. The default value for xdr-write-timeout is 10000 milliseconds (10 seconds), so with the default, XDR starts to throttle throughput once latency exceeds 2500 ms. This prevents XDR from flooding a datacenter that cannot cope with high throughput.
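
As a quick illustration, the arithmetic behind that threshold (a sketch only; the 25% figure comes from the paragraph above):

# 25%-of-xdr-write-timeout rule described above.
xdr_write_timeout_ms = 10000                         # default xdr-write-timeout
throttle_threshold_ms = 0.25 * xdr_write_timeout_ms
print(throttle_threshold_ms)                         # 2500.0 -> XDR starts throttling beyond this latency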

Maximum configured throughput hit

To avoid flooding the network, XDR can be configured with a maximum allowed throughput (the number of records written to the destination per second). This is controlled by the xdr-max-ship-throughput parameter. Internally, XDR converts this into a maximum number of records that can be in flight, based on the link latency to the given DC. For example, if the link between two DCs has a round-trip latency of 10 ms, putting 1 record at a time on the link (1 record in flight) allows 100 records to be written every second (a throughput of 100). In the default configuration (no xdr-max-ship-throughput set), the derived maximum number of records in flight at one time is 50000. If the number of records in flight exceeds this value, XDR will start to throttle.
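
As an illustration of the relationship in the example above, a minimal sketch of how round-trip latency and the number of records in flight translate into shipping throughput (the exact internal formula may differ):

def max_throughput(records_in_flight, rtt_ms):
    # Each in-flight slot completes one round trip per rtt_ms, so it can
    # carry 1000 / rtt_ms records per second.
    return records_in_flight * (1000.0 / rtt_ms)

print(max_throughput(1, 10))       # 100.0     -> 1 record in flight on a 10 ms link
print(max_throughput(50000, 10))   # 5000000.0 -> default 50000 in-flight ceiling on the same link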

The client has run out of connections

XDR uses the Aerospike C client to ship records and opens 64 connections per node on the destination cluster (records are pipelined on those connections, meaning more than one record at a time can be in flight on a single connection). During startup of XDR (and of the underlying C client), the client can briefly run out of connections while they are still being established. This can also happen immediately following a link failure, when connections are being re-established.

The remote DC reported an error

When a DC reports an error, XDR throttles down to avoid potential 100% CPU usage scenarios. In the case of a remote DC error (xdr_ship_destination_error), a 30-second window is given to recover; if errors continue throughout those 30 seconds, the DC is considered down and put into cluster-down mode, and a window shipper thread is started to catch up once the DC is reachable again.

Network error or timeout

If there is a network error when attempting to ship to the destination, or a timeout on an XDR write, the throughput drops to 1 record shipped per second per XDR read thread (xdr-read-threads, 4 by default) for each remote DC. These errors are considered transient: after a 30-second interval during which XDR ships at 1 transaction per second per read thread, the throughput (more precisely, the number of records on the wire) is doubled every 2 seconds until the maximum is reached, that being either 50000 or the configured maximum throughput (xdr-max-ship-throughput).
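
A minimal sketch of that ramp-up, assuming the defaults quoted above (4 read threads, a 50000 ceiling); it only illustrates the timing, not Aerospike's internal code:

xdr_read_threads = 4        # default xdr-read-threads
ceiling = 50000             # or the configured xdr-max-ship-throughput, if lower

rate = 1 * xdr_read_threads # records/second during the 30-second penalty window
elapsed = 30                # seconds spent shipping at the reduced rate
while rate < ceiling:
    elapsed += 2            # throughput doubles every 2 seconds after the window
    rate = min(rate * 2, ceiling)
print("back to %d records/sec after ~%d seconds" % (rate, elapsed))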

Kernel misconfiguration

XDR will also throttle if any of the nodes on the destination does not have kernel.shmmax and kernel.shmall set to the values Aerospike expects by default (kernel.shmmax = 68719476736 and kernel.shmall = 4294967296). On the nodes where these are not set correctly, you will start seeing warnings like this:

Jan 16 2018 00:00:23 GMT: WARNING (arenax): (arenax_ee.c:170) could not allocate 1073741824-byte arena stage 18: block size error
Jan 16 2018 00:00:23 GMT: WARNING (index): (index.c:710) arenax alloc failed
Jan 16 2018 00:00:23 GMT: WARNING (rw): (write.c:533) {ns_seg} write_master: fail as_record_get_create() <Digest>:0x786478ecb1e535271805ce5e742c769dc2b8230f

The source nodes will then start relogging, and this in turn will cause throttling.
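
To check whether a destination node matches those values, here is a minimal Python sketch (assumes a Linux node and reads /proc directly; sysctl output would work equally well):

# Expected shared-memory settings from the paragraph above.
expected = {"shmmax": 68719476736, "shmall": 4294967296}

for name, want in expected.items():
    with open("/proc/sys/kernel/" + name) as f:
        have = int(f.read().strip())
    status = "OK" if have >= want else "TOO LOW"
    print("kernel.%s = %d (%s, expected >= %d)" % (name, have, status, want))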

Notes

  • Detail on xdr-write-timeout

http://www.aerospike.com/docs/reference/configuration#xdr-write-timeout

  • Detail on xdr-max-ship-throughput

http://www.aerospike.com/docs/reference/configuration#xdr-max-ship-throughput

  • Relationship between XDR latency and throttling in detail
  • Log messages explained in detail

http://www.aerospike.com/docs/reference/serverlogmessages#common-log-messages

Keywords

XDR THROTTLE BANDWIDTH LAG

Timestamp

8/15/17

What is the unit for XDR max ship throughput?
What are the options for reducing XDR's network utilization?
How to identify a bad DC that causes XDR throttling
How do I handle a planned network maintenance between XDR source and destination?
#2

10,000 milliseconds or 10 seconds?


#3

milliseconds. Thanks for flagging. Article has been updated.