When the message
Digest Log Write Failed !!! ... Critical error
appears in aerospike.log, it means the XDR digestlog has run out of disk space and shipping of new writes to remote datacenters will stop.
The digestlog starts as a sparse file, taking very few blocks on disk. Its disk usage grows every time new entries are needed (that is, when writes happen faster than reclamation) and never decreases, up to the declared size. (Once this point is reached, the oldest entries will be overwritten by the newest entries and lost.) If the declared digestlog size exceeds the available space in its filesystem, however, writes to the file will fail and XDR will be unable to log new writes.
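The sparse-file behaviour described above can be reproduced on a scratch file. This is only an illustrative sketch: it assumes GNU coreutils (truncate, stat -c, dd conv=fsync) and a filesystem that supports sparse files, and it never touches the real digestlog.

```shell
#!/bin/sh
# Illustration only: uses a temporary scratch file, not the real digestlog.
demo=$(mktemp)

# Declare a 100 MiB size without writing any data (the file is sparse).
truncate -s 100M "$demo"
apparent=$(stat -c %s "$demo")                                    # declared size in bytes
allocated=$(( $(stat -c %b "$demo") * $(stat -c %B "$demo") ))    # bytes actually on disk
echo "before writes: apparent=$apparent allocated=$allocated"

# Writing data allocates real blocks; the declared size does not change.
dd if=/dev/urandom of="$demo" bs=1M count=10 conv=notrunc,fsync 2>/dev/null
apparent_after=$(stat -c %s "$demo")
allocated_after=$(( $(stat -c %b "$demo") * $(stat -c %B "$demo") ))
echo "after writes:  apparent=$apparent_after allocated=$allocated_after"

rm -f "$demo"
```

The allocated figure corresponds to the first column of ls -lsh, and the apparent figure to the size column.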
To confirm this is the problem, check the size of the file with
ls -lsh. (The standard location of the digestlog file is
/opt/aerospike/xdr/digestlog, but you can find it with
grep xdr-digestlog-path /etc/aerospike/aerospike.conf.) The
output will look like this:
18G -rw------- 1 root root 18G Feb 1 00:28 /opt/aerospike/xdr/digestlog
The first number is the total allocated on disk; the number before the date is the maximum size. If these are the same,
then the digestlog has reached its maximum size; if not, it’s still trying to grow. You can then check disk usage with
df as usual, and see whether the filesystem is full.
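The same check can be scripted. The function below is a sketch rather than an Aerospike tool: the name digestlog_status and its three status labels are made up for illustration, and it assumes GNU stat and df (for stat -c and df --output).

```shell
#!/bin/sh
# digestlog_status FILE
# Hypothetical helper: compares the blocks allocated to FILE with its declared
# size and with the free space left on the filesystem that holds it.
digestlog_status() {
    file=$1
    max=$(stat -c %s "$file")                                      # declared maximum size (bytes)
    used=$(( $(stat -c %b "$file") * $(stat -c %B "$file") ))      # bytes allocated on disk
    avail=$(df --output=avail -B1 "$file" | tail -n 1 | tr -d ' ') # free bytes on its filesystem

    if [ "$used" -ge "$max" ]; then
        # Fully allocated: the file will not grow any further.
        echo "full-size: allocated=$used max=$max avail=$avail"
    elif [ "$avail" -lt $(( max - used )) ]; then
        # Still growing, and the filesystem cannot hold the remainder.
        echo "at-risk: allocated=$used max=$max avail=$avail"
    else
        echo "growing: allocated=$used max=$max avail=$avail"
    fi
}

# Example (path taken from the default location mentioned above):
# digestlog_status /opt/aerospike/xdr/digestlog
```

An "at-risk" result means the filesystem will fill before the digestlog reaches its declared size, which is the situation that produces the error.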
The simplest solution, when possible, is to clean up enough other files from the filesystem that the digestlog can grow to its full size.
If that’s not possible, the next best option is to move the digestlog to a new filesystem, where it will have enough room. To do this,
- Stop the Aerospike service
- Copy the file to the new filesystem using
cp -p --sparse=always /opt/aerospike/xdr/digestlog /mnt/bigpartition/digestlog
- Change the value of
xdr-digestlog-path in aerospike.conf to point to the new digestlog location
- Start the Aerospike service
- Delete the original digestlog
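The copy step can be guarded with a space check so that the problem is not simply moved to the new filesystem. The sketch below covers only the copy; stopping and starting the service and editing aerospike.conf are still done by hand. The function name move_digestlog is hypothetical, and GNU stat, df and cp are assumed.

```shell
#!/bin/sh
# move_digestlog SRC DST
# Hypothetical helper: sparse-copies SRC to DST, but only if the destination
# filesystem has room for SRC's full declared size.
# Run it only while the Aerospike service is stopped.
move_digestlog() {
    src=$1
    dst=$2
    need=$(stat -c %s "$src")                                      # full declared size (bytes)
    avail=$(df --output=avail -B1 "$(dirname "$dst")" | tail -n 1 | tr -d ' ')

    if [ "$avail" -le "$need" ]; then
        echo "not enough room: need $need bytes, have $avail" >&2
        return 1
    fi
    # Same command as in the steps above: keep ownership/timestamps and holes.
    cp -p --sparse=always "$src" "$dst"
}

# Example, using the paths from the steps above:
# move_digestlog /opt/aerospike/xdr/digestlog /mnt/bigpartition/digestlog
```

Comparing against the declared size rather than the currently allocated size matters here, because the copied file will keep growing until it reaches that size.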
If neither of these is possible, you’ll have to reduce the size of the digestlog. This unfortunately will result in the loss of the existing digestlog entries.
- Change the size specified by
xdr-digestlog-path in aerospike.conf to be slightly smaller than the available space. Since the digestlog will not exceed its configured maximum size, you don’t need to leave much margin for this, but if there are any other files on the same filesystem, be sure to allow for their future growth.
- The metric for this is
xdr_queue_overflow_error. Refer to XDR cannot keep up with writes for additional information.
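For the size reduction, the arithmetic can be sketched as follows. This is an assumption-laden illustration, not an Aerospike formula: it treats the space available to the new digestlog as the blocks the current digestlog already holds plus the filesystem's free space, minus a reserve you choose for other files. The function name suggest_digestlog_gib is hypothetical.

```shell
#!/bin/sh
# suggest_digestlog_gib USED_BYTES AVAIL_BYTES RESERVE_BYTES
# Hypothetical helper: prints a digestlog size in whole GiB (rounded down).
#   USED_BYTES    bytes currently allocated to the digestlog
#   AVAIL_BYTES   free bytes on its filesystem (as reported by df)
#   RESERVE_BYTES room to leave for other files on the same filesystem
suggest_digestlog_gib() {
    used=$1
    avail=$2
    reserve=$3
    gib=$((1024 * 1024 * 1024))
    budget=$(( used + avail - reserve ))

    if [ "$budget" -lt "$gib" ]; then
        echo 0                       # less than 1 GiB left to work with
    else
        echo $(( budget / gib ))     # integer division rounds down
    fi
}

# Example: an 18 GiB digestlog on a full filesystem, keeping 1 GiB in reserve.
# suggest_digestlog_gib 19327352832 0 1073741824
```

Rounding down keeps the suggested size on the safe side of the available space.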