FAQ - What is the relationship between remote DC latency and XDR throttling?
In certain situations, XDR throttles the number of records it sends to remote DCs by putting read threads to sleep. For example, this happens when it encounters some temporary errors at a destination cluster, or under some network failure conditions. This article focuses on the relationship between the latency of a remote DC and how much throttling occurs.
Remote DC latencies are used, together with an optional user-specified TPS limit (xdr-max-ship-throughput), to derive the maximum number of records that may be in flight at any given time. A DC with higher latency is allowed more records in flight. Higher latency means fewer round trips fit into a given time period, so more parallelism (more records in flight) is needed to achieve a given TPS. For example, if the link between 2 DCs has a round-trip latency of 10 ms, putting 1 record at a time on the link (1 record in flight) allows 100 records to be written per second (a throughput of 100 TPS). In the default configuration (xdr-max-ship-throughput not set), the derived maximum number of records that can be in flight at one time is 50000. If the number of records in flight exceeds this value, XDR starts to throttle.
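The arithmetic in the example can be sketched in a few lines of Python. The function names and the simple "exceeds the limit" check are hypothetical. XDR's actual derivation of the in-flight limit is not described here, so treat this only as an illustration of why higher latency calls for more records in flight.

```python
# Illustrative arithmetic only; this is NOT XDR's internal formula.
# It applies the relationship from the 10 ms example:
#   throughput (records/s) = records in flight / round-trip latency (s)

DEFAULT_MAX_IN_FLIGHT = 50_000  # derived limit quoted for the default configuration


def throughput(records_in_flight: int, rtt_ms: float) -> float:
    """Records per second achievable with a fixed number of records in flight."""
    return records_in_flight / (rtt_ms / 1000.0)


def in_flight_needed(target_tps: float, rtt_ms: float) -> float:
    """Records that must be in flight to sustain target_tps over a link with rtt_ms."""
    return target_tps * (rtt_ms / 1000.0)


def would_throttle(records_in_flight: int, limit: int = DEFAULT_MAX_IN_FLIGHT) -> bool:
    """Hypothetical check: True once the in-flight count exceeds the limit."""
    return records_in_flight > limit


if __name__ == "__main__":
    print(throughput(1, 10))              # the 10 ms, 1-record example
    print(in_flight_needed(10_000, 10))   # a 10,000 TPS target over a 10 ms link
    print(in_flight_needed(10_000, 100))  # same target over a 100 ms link: 10x more in flight
    print(would_throttle(50_001))
```

Holding the target rate fixed, ten times the latency requires ten times as many records in flight, which is why a higher-latency DC is given a larger in-flight allowance.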
All remote DCs share the same read threads. As a result, if shipping to one remote DC is throttled (by putting a read thread to sleep), shipping to all remote DCs is throttled: while a read thread sleeps, it cannot ship to any remote DC.
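A toy model can make the shared-thread effect concrete. It assumes, purely for illustration, a single read thread that reads each record once and ships it to every remote DC, and that the thread sleeps for a fixed time whenever any DC is over its in-flight limit. `RemoteDC`, `simulate_pass`, the DC names, and the timings are all hypothetical and do not reflect XDR's real scheduling.

```python
from dataclasses import dataclass


@dataclass
class RemoteDC:
    """Hypothetical remote DC state; in_flight is held fixed for simplicity."""
    name: str
    in_flight: int
    limit: int

    def over_limit(self) -> bool:
        return self.in_flight > self.limit


def simulate_pass(n_records: int, dcs: list[RemoteDC],
                  work_ms: float = 1.0, sleep_ms: float = 10.0) -> dict[str, float]:
    """Records per second each DC receives during one pass of a shared read thread.

    Uses virtual time only: nothing actually sleeps.
    """
    elapsed_ms = 0.0
    for _ in range(n_records):
        if any(dc.over_limit() for dc in dcs):
            elapsed_ms += sleep_ms  # the shared thread sleeps, stalling every DC
        elapsed_ms += work_ms       # read the record and ship it to every DC
    return {dc.name: n_records / (elapsed_ms / 1000.0) for dc in dcs}


if __name__ == "__main__":
    healthy = [RemoteDC("dc-a", 10, 100), RemoteDC("dc-b", 10, 100)]
    one_throttled = [RemoteDC("dc-a", 10, 100), RemoteDC("dc-b", 500, 100)]
    print(simulate_pass(1000, healthy))
    print(simulate_pass(1000, one_throttled))
```

In the second run only dc-b is over its limit, yet dc-a sees the same drop in records per second, because both depend on the same sleeping thread.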
Because XDR takes remote DC latency into account when calculating the in-flight limit, read threads should, on average, sleep for an equal amount of time for each DC. This means that the latency of a remote DC should not affect throttling on the source DC.
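The reasoning can be checked with a short sketch. It assumes, although the article does not state this, that the in-flight limit is simply the target rate multiplied by the round-trip latency. The function names, DC names, and the 5,000 TPS target are hypothetical.

```python
# Sketch under a stated assumption: the per-DC in-flight limit grows in
# proportion to round-trip latency (limit = target_tps * rtt). The article
# does not give XDR's exact formula.


def in_flight_limit(target_tps: float, rtt_ms: float) -> float:
    """Hypothetical latency-scaled in-flight limit for one remote DC."""
    return target_tps * rtt_ms / 1000.0


def max_tps_before_throttle(limit: float, rtt_ms: float) -> float:
    """Highest rate a DC can sustain without exceeding its in-flight limit."""
    return limit / (rtt_ms / 1000.0)


if __name__ == "__main__":
    target = 5_000  # records/s, a made-up target
    for name, rtt in [("near-dc", 5.0), ("far-dc", 150.0)]:
        limit = in_flight_limit(target, rtt)
        print(name, limit, max_tps_before_throttle(limit, rtt))
```

Under this assumption the far DC gets a much larger in-flight allowance, but both DCs hit their limit at the same shipping rate. Neither therefore causes the shared read threads to sleep more often than the other.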
Throttling-related information is logged when the log level for the xdr context is set to debug.