Aerospike Defragmentation

Note: This document describes how defragmentation works in Aerospike Server build 3.3.17 and later. Prior releases used a different algorithm.

Aerospike always writes data in large blocks whose size is set by write-block-size (typically 128KB for flash storage devices). Each block is filled with incoming write transactions and flushed to the device when it is full (or when the next record to be written no longer fits). Initially, therefore, each block is filled close to capacity. As records are updated or deleted (potentially by the nsup (Namespace Supervisor) thread), the share of each on-disk block occupied by active records decreases. When a block's usage level falls below defrag-lwm-pct (Defragmentation Low Water Mark Percentage), the block becomes eligible for defragmentation and is placed on the defragmentation queue (defrag-q in the logs). The default value of defrag-lwm-pct is 50%.
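The eligibility rule can be pictured with a small Python sketch. It is an illustration only, not the server's code: the function name and parameters are hypothetical, and it assumes usage is measured as live bytes divided by the write block size.

```python
def is_defrag_eligible(live_bytes: int,
                       write_block_size: int = 128 * 1024,
                       defrag_lwm_pct: float = 50.0) -> bool:
    """Return True when a block's usage level is below the low water mark.

    live_bytes is the space still occupied by active records in the block
    (hypothetical input; the server tracks this internally).
    """
    usage_pct = 100.0 * live_bytes / write_block_size
    # "Falls below" is read as a strict comparison.
    return usage_pct < defrag_lwm_pct


# A 128KB block holding 60KB of live records is under 50% and would be queued.
print(is_defrag_eligible(60 * 1024))
# A block holding 100KB of live records would not.
print(is_defrag_eligible(100 * 1024))
```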

There are three configurable values to consider with regard to defrag. They can be set dynamically or in the aerospike.conf file. If you want the values to persist, set them in the configuration file; otherwise they may be lost when the daemon restarts. Let's take a closer look at these values.

First, we have defrag-lwm-pct, which defaults to 50%. This means that blocks that are less than 50% occupied will be queued for defragmentation.

Second, we have defrag-sleep, which defaults to 1000 microseconds. This is how long defrag waits after defragmenting one block before moving on to the next one.

Third, we have defrag-startup-minimum, which defaults to 10%. If less than 10% is available, the server will not join the cluster or open a service port. Remember that a disk may appear full to Aerospike because it writes everything in blocks, so running df or a similar command only tells you how much physical space is being used.
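To get a feel for what defrag-sleep implies, the following Python sketch works out a rough ceiling on the defrag rate. It assumes a single defrag loop that sleeps once per block and ignores the time spent processing each block, so the real rate will be lower. The function name is hypothetical.

```python
def max_defrag_blocks_per_sec(defrag_sleep_us: int = 1000) -> float:
    """Upper bound on blocks defragmented per second by one defrag loop.

    Assumes one sleep of defrag_sleep_us microseconds after every block
    and zero processing time per block.
    """
    return 1_000_000 / defrag_sleep_us


# With the default of 1000 microseconds the loop cannot exceed 1000 blocks/s.
print(max_defrag_blocks_per_sec())
# Doubling the sleep halves the ceiling.
print(max_defrag_blocks_per_sec(2000))
```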

If you suspect there might be a problem, the Aerospike tool asmonitor is your best friend. Let's have a quick look at what this tool shows us.

If you run asmonitor and then type info, you will find Avail pct and Free Disk pct. These values tell you how much space you have left.

In Aerospike server 3.3.22+ we have a new tool called asadm. Run this tool and type info, and you will see Avail% and Disk Used%, which also tell you how much space you have left.

If you need more information, you will have to go to the logs, which you will find at /var/log/aerospike/aerospike.log

Look for a line like this one:

device /dev/sdc: used 296160983424, contig-free 110637M (885103 wblocks), swb-free 16, n-w 0, w-q 0 w-tot 12659541 (43.3/s), defrag-q 0 defrag-tot 11936852 (39.1/s)

Let us analyze this line for a second.

device: Name of the device to which these stats apply

used: Number of bytes in use on this device

contig-free: Amount of space available for write operations, using your configured block size. wblocks (in the parentheses) is the number of blocks available for write operations.

swb-free: Number of free streaming write buffers. Once a buffer is full, it flushes the data to the disk.

n-w: Number of threads concurrently flushing to the SSD write block buffer (swb)

w-q: Number of write buffers pending to be flushed to the SSD

w-tot: Total number of streaming write buffers ever flushed to this device, with the number of write buffers written per second in parentheses

defrag-q: Number of wblocks pending defrag

defrag-tot: Total number of write blocks ever processed by defragmentation on this device, with the number of wblocks processed per second in parentheses
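If you want to pull these fields out of the log programmatically, a regular expression is enough. The Python sketch below is written against the sample line shown above only; the pattern, the function name and the dictionary keys are hypothetical, and other server builds may format the line differently.

```python
import re

# Sample device stats line from the log.
SAMPLE = ("device /dev/sdc: used 296160983424, contig-free 110637M "
          "(885103 wblocks), swb-free 16, n-w 0, w-q 0 w-tot 12659541 "
          "(43.3/s), defrag-q 0 defrag-tot 11936852 (39.1/s)")

DEVICE_LINE = re.compile(
    r"device (?P<device>\S+): used (?P<used>\d+), "
    r"contig-free (?P<contig_free_m>\d+)M \((?P<wblocks>\d+) wblocks\), "
    r"swb-free (?P<swb_free>\d+),\s+"
    r"n-w (?P<n_w>\d+), w-q (?P<w_q>\d+) "
    r"w-tot (?P<w_tot>\d+) \((?P<write_rate>[\d.]+)/s\), "
    r"defrag-q (?P<defrag_q>\d+) "
    r"defrag-tot (?P<defrag_tot>\d+) \((?P<defrag_rate>[\d.]+)/s\)"
)


def parse_device_line(line: str):
    """Return the stats as a dict, or None if the line does not match."""
    match = DEVICE_LINE.search(line)
    if match is None:
        return None
    stats = {}
    for key, value in match.groupdict().items():
        if key == "device":
            stats[key] = value            # device path stays a string
        elif key in ("write_rate", "defrag_rate"):
            stats[key] = float(value)     # per-second averages
        else:
            stats[key] = int(value)       # counters and sizes
    return stats


stats = parse_device_line(SAMPLE)
print(stats["device"], stats["write_rate"], stats["defrag_rate"], stats["defrag_q"])
```

The pattern uses search rather than a full match, so any timestamp or source-file prefix in front of "device" is ignored.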

Now that you have a better understanding of this, let's talk about what you should pay attention to.

If you suspect defrag is falling behind, you can use the following simple formula: 100 - (Avail pct + Disk Used pct). If you are using asmonitor, which reports Free Disk pct instead, Disk Used pct is 100 minus Free Disk pct. If the result falls between 0 and 20, defrag is keeping up. If it comes in above 30, you might be falling behind and it is time to check the logs.

In the logs, you will want to look at the write rate and the defrag rate. The log line displayed above contains two averages in the form (x.x/s). The first one, (43.3/s) in our example, is wblocks written per second. The second one, (39.1/s) in our example, is wblocks defragged per second. As you can see, the write rate is higher than the defrag rate. Initially this is nothing to worry about, but if it stays that way over a period of time, measures might need to be taken.

If you determine that the node is falling behind and the logs show an empty defrag queue, consider raising defrag-lwm-pct modestly. Please be aware that raising defrag-lwm-pct increases write amplification nonlinearly.
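The check can be written down as a short Python sketch. The function names and the status labels are hypothetical. The text does not say how to read a result between 20 and 30, so the "borderline" label for that range is an assumption.

```python
def defrag_gap_pct(avail_pct: float, disk_used_pct: float) -> float:
    """Result of the formula 100 - (Avail pct + Disk Used pct)."""
    return 100.0 - (avail_pct + disk_used_pct)


def defrag_status(avail_pct: float, disk_used_pct: float) -> str:
    gap = defrag_gap_pct(avail_pct, disk_used_pct)
    if gap <= 20:
        return "keeping up"
    if gap > 30:
        return "possibly falling behind"
    # Not classified by the article; this label is an assumption.
    return "borderline"


def writes_outpace_defrag(write_rate: float, defrag_rate: float) -> bool:
    """True when more wblocks are written per second than defragged."""
    return write_rate > defrag_rate


# Avail 20% and Disk Used 45% give a result of 35: time to check the logs.
print(defrag_status(20, 45))
# Rates from the sample log line: 43.3 wblocks/s written, 39.1 defragged.
print(writes_outpace_defrag(43.3, 39.1))
```

A single reading where writes outpace defrag is not a problem on its own. What matters is whether it persists across many log lines.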

To follow the write rate and the defrag rate as they change, tail the Aerospike log file and filter for the device stats lines:

tail -f /var/log/aerospike/aerospike.log | grep --color drv_ssd
