Ran out of queue space... XDR cannot keep up with write



Problem Description

Aerospike log file contains this warning:

WARNING (xdr): (xdr.c:5021) Ran out of queue space... XDR cannot keep up with write .. some records may be lost!!!

Explanation

The warning refers to an internal in-memory queue that batches digest log entries before they are persisted to disk. The queue holds up to 1,000,000 entries. When it is full, XDR logs the message above.
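The following minimal Python model shows what "running out of queue space" means. It is a sketch under stated assumptions, not Aerospike's implementation. The class DigestQueue, its push/flush methods, and the choice to record one warning per dropped entry are hypothetical. Only the 1,000,000-entry capacity and the warning text come from this article.

```python
from collections import deque

# Hypothetical model of the XDR digest log staging queue. Names and flush
# behaviour are illustrative assumptions; only the capacity and the warning
# text are taken from the article.
QUEUE_CAPACITY = 1_000_000
WARNING = ("Ran out of queue space... XDR cannot keep up with write .. "
           "some records may be lost!!!")


class DigestQueue:
    def __init__(self, capacity=QUEUE_CAPACITY):
        self.capacity = capacity
        self.entries = deque()   # digests waiting to be written to disk
        self.dropped = 0         # digests lost because the queue was full
        self.warnings = []       # warning messages "logged" by the model

    def push(self, digest):
        """Stage one digest; drop it and record a warning if the queue is full."""
        if len(self.entries) >= self.capacity:
            self.dropped += 1
            self.warnings.append(WARNING)
            return False
        self.entries.append(digest)
        return True

    def flush(self, write, max_batch):
        """Drain up to max_batch entries through the write callback (the 'disk')."""
        batch = []
        while self.entries and len(batch) < max_batch:
            batch.append(self.entries.popleft())
        if batch:
            write(batch)
        return len(batch)
```

With a capacity of 3, a fourth push before any flush is dropped and produces one warning. After a flush frees space, pushes succeed again.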

Here are the three common situations that would cause this:

  1. The filesystem partition holding the digest log is full. Because the digest log is a sparse file, it cannot expand. You will also see errors like this:

    WARNING (xdr): (xdr.c:4887) Digest Log Write Failed !!! ... Critical error
    
  2. The disk is too slow, so entries fill the internal queue faster than they can be written to the digest log.

  3. A bug (AER-5617), which was fixed in Enterprise Edition release 3.14.1.1.
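Causes 2 and 3 both come down to entries arriving faster than they are flushed to the digest log. The following back-of-the-envelope Python sketch estimates how long the 1,000,000-entry queue lasts. It assumes constant arrival and flush rates, which real workloads rarely have. The function name and parameters are hypothetical.

```python
import math

QUEUE_CAPACITY = 1_000_000  # in-memory queue size quoted in the article


def seconds_until_full(incoming_per_sec, flushed_per_sec,
                       backlog=0, capacity=QUEUE_CAPACITY):
    """Seconds until the staging queue fills, or math.inf if it never does.

    Linear model (an assumption): entries arrive and are flushed at constant
    rates, so the backlog grows by their difference every second.
    """
    growth = incoming_per_sec - flushed_per_sec
    if growth <= 0:
        return math.inf  # the disk keeps up, so the queue never fills
    return max(capacity - backlog, 0) / growth
```

For example, `seconds_until_full(50_000, 40_000)` returns `100.0`. A backlog growing by 10,000 entries per second fills the queue in 100 seconds. A flush that is stalled behind a lock, as in the AER-5617 case, corresponds to `flushed_per_sec` close to 0.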

When XDR is enabled but all remote DCs are INACTIVE, XDR does not reclaim processed entries from the digest log. The log therefore grows until it reaches its full size, and then it starts overwriting older records. As the digest log grows, the internal logic that determines the last ship time takes longer. Because this logic runs under a lock, it prevents new entries from being flushed to disk during that time. If the write load fills the internal queue while the flush is blocked, the WARNING message is triggered. In the example below, ul jumps from 65 to 1000101.

Apr 24 2017 10:00:26 GMT-0700: INFO (xdr): (xdr.c:2027) sh 0 : ul 65 : lg 7516764045 : rlg 0 : lproc 7516764000 : rproc 479081584 : lkdproc 0 : errcl 0 : errsrv 0 : hkskip 1250365174 745690834 : flat 0
Apr 24 2017 10:02:25 GMT-0700: INFO (xdr): (xdr.c:1773) Reclaimed 0 records space in digest log...
Apr 24 2017 10:02:25 GMT-0700: INFO (xdr): (xdr.c:2027) sh 0 : ul 1000101 : lg 7516889145 : rlg 0 : lproc 7516889100 : rproc 479081584 : lkdproc 0 : errcl 0 : errsrv 0 : hkskip 1250365641 745691255 : flat 0
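The ul field in the periodic XDR stats lines is a convenient value to watch. The following Python sketch scans log lines for ul and flags any value at or above 1,000,000. Treating that threshold as "queue full" is an assumption drawn from the example above. The helper names are hypothetical.

```python
import re

# Matches the "ul <n>" field in XDR stats lines such as
#   ... INFO (xdr): (xdr.c:2027) sh 0 : ul 65 : lg 7516764045 : ...
UL_FIELD = re.compile(r"\(xdr\.c:\d+\).*?\bul (\d+)\b")
QUEUE_CAPACITY = 1_000_000  # assumed threshold, taken from the article


def ul_values(lines):
    """Yield the integer ul value from each XDR stats line that has one."""
    for line in lines:
        match = UL_FIELD.search(line)
        if match:
            yield int(match.group(1))


def queue_overflowed(lines, capacity=QUEUE_CAPACITY):
    """True if any stats line reports ul at or above the queue capacity."""
    return any(value >= capacity for value in ul_values(lines))
```

Applied to the three log lines above, `ul_values` yields 65 and 1000101. The "Reclaimed" line has no ul field and is skipped, and `queue_overflowed` returns `True`.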

Solution

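  1. For a full digest log partition, refer to the following article on handling digest log partition out-of-space situations:

  2. For the INACTIVE DC case (AER-5617), disable XDR completely, ensure at least one DC is ACTIVE, or upgrade to Enterprise Edition 3.14.1.1 or later.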
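For the first cause, you can script a quick check of free space on the partition that holds the digest log. This Python sketch uses the standard `shutil.disk_usage`. The function names and the 10% threshold are illustrative assumptions, not Aerospike settings. Supply the digest log directory from your own configuration.

```python
import shutil


def partition_free_ratio(path):
    """Fraction of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def check_digestlog_partition(path, min_free_ratio=0.10):
    """Return a one-line status for the partition holding the digest log.

    min_free_ratio is an arbitrary illustrative threshold, not an
    Aerospike default.
    """
    ratio = partition_free_ratio(path)
    status = "LOW" if ratio < min_free_ratio else "OK"
    return f"{status}: {ratio:.1%} free on partition holding {path}"
```

For example, call `print(check_digestlog_partition("/path/to/digestlog-dir"))`, where the path is a placeholder for your actual digest log location.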

Notes

  • The digest log is implemented as a circular buffer and overwrites the oldest records once it is full.
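A small ring of fixed-size slots illustrates the circular-buffer behaviour. This is a hypothetical Python sketch (`CircularDigestLog` is not an Aerospike class). It ignores the reclamation of processed entries and only shows how each new record replaces the oldest one once every slot is in use.

```python
class CircularDigestLog:
    """Fixed-size ring of slots; once full, each append overwrites the oldest."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.next_slot = 0     # index where the next record is written
        self.count = 0         # number of slots currently holding records
        self.overwritten = 0   # records lost to wrap-around

    def append(self, record):
        if self.count == len(self.slots):
            self.overwritten += 1  # this slot still holds the oldest record
        else:
            self.count += 1
        self.slots[self.next_slot] = record
        self.next_slot = (self.next_slot + 1) % len(self.slots)

    def records(self):
        """Return stored records from oldest to newest."""
        start = (self.next_slot - self.count) % len(self.slots)
        return [self.slots[(start + i) % len(self.slots)]
                for i in range(self.count)]
```

Appending a through e to a three-slot log leaves c, d and e, and two records are overwritten.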

Keywords

XDR DIGESTLOG

Timestamp

5/11/2017

