FAQ - What is the relationship between remote DC latency and XDR throttling

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

Detail

XDR throttles the number of records it sends to remote DCs in certain situations by putting read threads to sleep, for example when it encounters temporary errors at a destination cluster or under some network failure conditions. This article focuses on the relationship between the latency of the remote DC and the level to which throttling occurs.

Answer

  • Remote DC latencies are used, alongside the optional user-specified TPS limit xdr-max-ship-throughput, to derive the maximum number of records in flight at any given time. A DC with higher latency is allowed more records in flight: higher latency means fewer round trips in a given time period, so more parallelism (more records in flight) is needed to achieve a given TPS. For example, if a link between 2 DCs has a round trip latency of 10 ms, putting 1 record at a time on the link (1 record in flight) allows 100 records to be written every second (a throughput of 100 TPS). In the default configuration (no xdr-max-ship-throughput set), the derived maximum number of records in flight at one time is 50000. If the number of records in flight exceeds this value, XDR starts to throttle.

  • All remote DCs share common read threads, so if shipping to one remote DC is throttled (by making a read thread sleep), shipping to all remote DCs is throttled. When a read thread sleeps, it is not available to ship to any remote DC.

  • Given that XDR accounts for remote DC latency and uses this in calculating the in-flight limit, read threads should sleep, on average, for an equal amount of time for each DC. This means that the latency of a remote DC should not affect throttling on the source DC.

  • Throttling-related information is logged when the debug log level is turned on for the xdr context.

$ asinfo -v "set-log:id=0;xdr=debug"
ok
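
As a rough illustration of the in-flight arithmetic described in the first point of the answer, the sketch below relates round trip latency, records in flight and throughput. It is only an approximation under the assumption that throughput equals records in flight divided by round trip time; the exact formula XDR uses internally is not given in this article, and the function names and the 5000 TPS figure are hypothetical.

```python
import math


def achievable_tps(records_in_flight: int, round_trip_ms: float) -> float:
    """Throughput sustained by a fixed number of in-flight records.

    Assumes each in-flight slot completes one record per round trip.
    """
    return records_in_flight * 1000.0 / round_trip_ms


def in_flight_needed(target_tps: float, round_trip_ms: float) -> int:
    """Records that must be in flight to sustain target_tps at this latency."""
    return math.ceil(target_tps * round_trip_ms / 1000.0)


# The example from the text: 1 record in flight on a 10 ms link.
print(achievable_tps(1, 10))        # 100.0

# Hypothetical 5000 TPS limit: a slower link needs more records in flight.
print(in_flight_needed(5000, 10))   # 50
print(in_flight_needed(5000, 40))   # 200
```

Under this simplified model, a DC with four times the latency needs four times as many records in flight for the same TPS, which is why a higher-latency DC is given a higher in-flight limit.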

The following is an example showing throttling caused by exceeding the defined xdr-max-ship-throughput:

Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #0 (limit, 495/494, 2 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #0 (limit, 495/494, 4 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #2 (limit, 495/494, 32 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #0 (limit, 495/494, 8 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #0 (limit, 495/494, 16 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #3 (limit, 485/494, 2 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #3 (limit, 485/494, 4 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #1 (limit, 485/494, 64 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #3 (limit, 485/494, 8 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #0 (limit, 485/494, 32 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #2 (limit, 459/494, 64 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #3 (limit, 459/494, 16 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #3 (limit, 442/494, 32 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #0 (limit, 442/494, 64 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #3 (limit, 432/494, 64 ms)
Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #1 (limit, 430/494, 64 ms)

The above shows:

  • Thread id for the xdr-read-thread (#0, #1, #2 and #3)
  • The current number of inflight records / the maximum allowed.
  • The sleep delay (exponentially backing off: 2, 4, 8, 16, 32, 64 ms).
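
When many of these lines are logged, it can help to extract the fields programmatically. The following is a minimal sketch that parses throttling lines in the format shown above and collects the sleep delays per read thread. The regular expression is derived only from the sample lines in this article, so other server versions may log a different format; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g. "Throttling #0 (limit, 495/494, 2 ms)".
THROTTLE_RE = re.compile(
    r"Throttling #(?P<thread>\d+) "
    r"\(limit, (?P<in_flight>\d+)/(?P<limit>\d+), (?P<sleep_ms>\d+) ms\)"
)


def parse_throttle_line(line):
    """Return the throttling fields as ints, or None if the line does not match."""
    m = THROTTLE_RE.search(line)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}


def sleeps_by_thread(lines):
    """Map each read thread id to the list of sleep delays (ms) it logged, in order."""
    sleeps = defaultdict(list)
    for line in lines:
        rec = parse_throttle_line(line)
        if rec is not None:
            sleeps[rec["thread"]].append(rec["sleep_ms"])
    return dict(sleeps)


sample = [
    "Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #0 (limit, 495/494, 2 ms)",
    "Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #0 (limit, 495/494, 4 ms)",
    "Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #2 (limit, 495/494, 32 ms)",
    "Aug 10 2018 02:12:20 GMT: DETAIL (xdr): (xdr_ship.c:1387) Throttling #0 (limit, 495/494, 8 ms)",
]

print(sleeps_by_thread(sample))  # {0: [2, 4, 8], 2: [32]}
```

Grouping by thread makes the doubling of the sleep delay visible for each read thread, which is harder to see when lines from several threads are interleaved.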

When shipping to different DCs, the in-flight limits differ according to the latency of each DC. For example:

Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:4881) Digest log write with the batch 100
Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:2776) Throttling #2 (limit, 142/141, 16 ms)
Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:2776) Throttling #3 (limit, 142/141, 32 ms)
Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:2776) Throttling #0 (limit, 142/141, 16 ms)
Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:2776) Throttling #2 (limit, 88/87, 16 ms)
Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:2776) Throttling #0 (limit, 88/87, 16 ms)
Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:2776) Throttling #1 (limit, 88/87, 16 ms)
Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:2776) Throttling #0 (limit, 142/141, 16 ms)
Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:2776) Throttling #1 (limit, 88/87, 32 ms)
Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:2776) Throttling #0 (limit, 142/141, 16 ms)
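
The log lines do not name the destination DC, so one way to tell the DCs apart is to group the throttling lines by their in-flight limit. The sketch below does this as a heuristic only: it assumes each DC has a distinct limit at the time of logging, which would not hold for two DCs with similar latency. The function name is hypothetical.

```python
import re
from collections import Counter

# Captures the maximum allowed in-flight value, e.g. 141 in "(limit, 142/141, 16 ms)".
LIMIT_RE = re.compile(r"Throttling #\d+ \(limit, \d+/(\d+), \d+ ms\)")


def count_by_limit(lines):
    """Count throttling events per in-flight limit; non-throttling lines are ignored."""
    counts = Counter()
    for line in lines:
        m = LIMIT_RE.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts


sample = [
    "Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:4881) Digest log write with the batch 100",
    "Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:2776) Throttling #2 (limit, 142/141, 16 ms)",
    "Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:2776) Throttling #3 (limit, 142/141, 32 ms)",
    "Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:2776) Throttling #0 (limit, 142/141, 16 ms)",
    "Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:2776) Throttling #2 (limit, 88/87, 16 ms)",
    "Jul 28 2016 06:53:08 GMT-0400: DEBUG (xdr): (xdr.c:2776) Throttling #0 (limit, 88/87, 16 ms)",
]

for limit, events in sorted(count_by_limit(sample).items()):
    print(f"in-flight limit {limit}: {events} throttling events")
```

In the excerpt above, two limits appear (141 and 87), consistent with two destination DCs that have different latencies.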

Notes

  • What can cause XDR throttling

Applies To

Server prior to v. 5.0

Keywords

XDR DC THROTTLE LATENCY REMOTE

Timestamp

9/30/18