XDR Slowdown During Migration
When a cluster is shipping records to a remote DC via XDR and a cluster event has triggered migrations, XDR shipping to the remote DC slows down.
This is an expected consequence of how XDR operates when there is significant xdr_timelag prior to the cluster event. If XDR encounters a record in the digest log for which the node in question is not the master, it relogs that record at the new master and the new replica. To avoid overwhelming the nodes in the local cluster, XDR sleeps for a fixed 1 ms after each such relog. This limits the number of records that can be processed in this situation to 1,000 per second per xdr-read-thread. With the default of 4 xdr-read-threads, XDR throughput is limited to 4,000 records per second while records are being relogged to other nodes in the local cluster.
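The throughput ceiling above follows directly from the sleep duration and the thread count. A back-of-the-envelope sketch (names are illustrative, not Aerospike internals):

```python
# Rough ceiling on relog throughput implied by the fixed sleep
# after each relogged record. Variable names are illustrative.
SLEEP_PER_RELOG_MS = 1    # fixed 1 ms sleep after each relogged record
XDR_READ_THREADS = 4      # default value of xdr-read-threads

per_thread_rate = 1000 // SLEEP_PER_RELOG_MS      # records/sec per thread
node_rate = per_thread_rate * XDR_READ_THREADS    # records/sec per node

print(per_thread_rate, node_rate)  # 1000 4000
```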
In a situation with minimal xdr_timelag, there should be no perceptible issue in the common case of restarting nodes. When a node is stopped and leaves the cluster, another node assumes masterhood for the partitions the departing node held. When the node rejoins, it does not assume masterhood immediately, so the other nodes continue to accept and ship writes for the respective partitions. If xdr_timelag on those nodes is high when masterhood transitions back to the rejoining node, there will be a slowdown, as the other nodes have to relog the records they hold for the returning node. Due to the 1 ms sleep for each relog, at most 1,000 records per second can be relogged per xdr-read-thread, which can indirectly throttle XDR if there are many such records.
For this reason, shipping can appear to slow down until this backlog of relogged records has cleared.
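To get a feel for how long such a backlog takes to clear, one can divide the backlog size by the relog rate cap. A hypothetical sizing sketch, assuming the default thread count and sleep described above:

```python
# Hypothetical sketch: time for one node to clear a relog backlog,
# given the fixed per-relog sleep and the xdr-read-thread count.
# The function name and defaults are illustrative assumptions.
def relog_drain_seconds(backlog_records, read_threads=4, sleep_ms=1):
    """Estimated seconds to relog a backlog at the sleep-imposed cap."""
    per_second = read_threads * (1000 // sleep_ms)  # ~4,000/sec at defaults
    return backlog_records / per_second

# e.g. a 1.2M-record backlog at defaults takes about 5 minutes
print(relog_drain_seconds(1_200_000))  # 300.0
```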
There is no solution to this per se, as it is a consequence of how XDR works during cluster membership changes and migrations. If the cluster events are part of a planned rolling restart, it is best to perform the restart when xdr_timelag is low. In general, xdr_timelag is expected to be very low.
- XDR is currently being redesigned; the new architecture will not include the digest log, so this issue is not expected to be present in later Aerospike versions.
- The most common reason for xdr_timelag to build up is actually throttling. Refer to the article titled "What Can Cause XDR Throttling" for details.
Keywords: XDR, cluster event, migration, slowdown, shipping, lag