Frequent repetition of error message in XDR log after node has been taken down



Problem Description

A node in a cluster that ships records via XDR goes down, either through a planned shutdown or an unplanned failure. Afterwards, the following group of messages is repeated frequently in the XDR log:

Feb 23 2016 09:10:15 GMT: WARNING (cf:rbuffer): (/work/source/modules/ee/xdr/src/rbuffer.c::126) Stats [29068130:28914100:356115:524200:66974]
Feb 23 2016 09:10:15 GMT: WARNING (cf:rbuffer): (/work/source/modules/ee/xdr/src/rbuffer.c::128) Current sptr [288627:0] rptr [289136:100] | rctx [289136:100] | wptr [289141:0] | wctx [289141:100]
Feb 23 2016 09:10:15 GMT: WARNING (cf:rbuffer): (/work/source/modules/ee/xdr/src/rbuffer.c::136) Max Seg = 12493753

Explanation

These messages indicate that XDR has tried to read a digest that has not yet been flushed to disk. When XDR receives writes from the Aerospike process via a named pipe, a writer thread writes the digests to the XDR digest log in batches (the batch size is configured with xdr-write-batch-size). When a batch of digests is written to the digest log, a pointer marks the end of the batch. To ship records, XDR reads digests from the digest log in batches (configured with xdr-read-batch-size) and then reads the corresponding records from the Aerospike process as a client. The XDR reader thread can reach the pointer placed in the digest log by the writer before the writer has finished flushing all the incoming digests to disk. In effect, XDR tries to read a digest that does not yet exist on disk, and at that point the messages above are written to the log.
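The race described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: DigestLog, read_with_retry and every field name are hypothetical and do not correspond to Aerospike source code. The model simply assumes that the end-of-batch pointer becomes visible to the reader before the digests reach disk.

```python
class DigestLog:
    """Toy stand-in for the XDR digest log (hypothetical, not Aerospike code)."""

    def __init__(self):
        self.on_disk = []     # digests already flushed to disk
        self.pending = []     # digests accepted by the writer, not yet flushed
        self.end_pointer = 0  # end-of-batch marker visible to the reader

    def write_batch(self, digests):
        # Assumption: the pointer is advanced as soon as the batch is accepted.
        self.pending.extend(digests)
        self.end_pointer += len(digests)

    def flush(self):
        # The digests only become readable from disk after this step.
        self.on_disk.extend(self.pending)
        self.pending.clear()

    def read(self, position):
        if position >= self.end_pointer:
            return None  # nothing new to read
        if position >= len(self.on_disk):
            # The pointer says the digest exists, but it is not on disk yet.
            raise LookupError("digest %d not yet flushed" % position)
        return self.on_disk[position]


def read_with_retry(log, position, on_retry, max_attempts=5):
    """Retry a read until the digest is on disk; count the failed attempts."""
    warnings = 0
    for _ in range(max_attempts):
        try:
            return log.read(position), warnings
        except LookupError:
            warnings += 1  # in this model, one group of log warnings
            on_retry()     # stands in for the writer catching up
    raise RuntimeError("digest never became readable")


log = DigestLog()
log.write_batch(["d0", "d1"])  # pointer moves, nothing flushed yet
digest, warnings = read_with_retry(log, 0, on_retry=log.flush)
```

In this model the first read fails because the pointer is ahead of the flushed data, the retry succeeds once the flush has happened, and each failed attempt corresponds to one repetition of the warning group.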

The messages appear only while failed-node or link-down processing is taking place. In normal XDR processing, the reader thread shares the buffer used by the writer thread, so a digest does not have to be flushed to disk before it can be read.

Solution

No action is necessary in response to these messages, as XDR retries the digest read until the digest has been flushed to disk. It is still useful to know that this can happen, because the repeated messages may have a knock-on effect on XDR log size and/or on external management tools that monitor the log.
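If the repeated messages cause noise in external tooling, one option is to count or filter them before the log is processed further. The following is a minimal sketch, not an Aerospike utility: the function names are hypothetical, and the pattern is based only on the three sample lines shown above, so it may need adjusting for other builds (the source line numbers after rbuffer.c in particular are likely to differ).

```python
import re
from collections import Counter

# Matches the rbuffer warnings shown in the Problem Description and captures
# the source line number that follows "rbuffer.c::".
RBUFFER_WARNING = re.compile(
    r"WARNING \(cf:rbuffer\): \(\S*rbuffer\.c::(\d+)\)"
)


def count_rbuffer_warnings(lines):
    """Count rbuffer warnings per source line number."""
    counts = Counter()
    for line in lines:
        match = RBUFFER_WARNING.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def drop_rbuffer_warnings(lines):
    """Return the log lines with the rbuffer warnings removed."""
    return [line for line in lines if not RBUFFER_WARNING.search(line)]


sample = [
    "Feb 23 2016 09:10:15 GMT: WARNING (cf:rbuffer): (/work/source/modules/ee/xdr/src/rbuffer.c::126) Stats [29068130:28914100:356115:524200:66974]",
    "Feb 23 2016 09:10:15 GMT: WARNING (cf:rbuffer): (/work/source/modules/ee/xdr/src/rbuffer.c::128) Current sptr [288627:0] rptr [289136:100] | rctx [289136:100] | wptr [289141:0] | wctx [289141:100]",
    "Feb 23 2016 09:10:15 GMT: WARNING (cf:rbuffer): (/work/source/modules/ee/xdr/src/rbuffer.c::136) Max Seg = 12493753",
    "some unrelated log line",  # placeholder, not a real XDR message
]

counts = count_rbuffer_warnings(sample)
remaining = drop_rbuffer_warnings(sample)
```

Counting rather than silently dropping the lines keeps it visible how often the reader ran ahead of the writer while the node was down.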

Keywords

XDR WARNING SPTR RCTX WPTR

Timestamp

3/3/2016