Defragmentation



Aerospike Defragmentation

For namespaces configured with storage-engine device, Aerospike always writes data in large blocks of size write-block-size. Each block is filled by incoming write transactions and is written to the device in any of the following cases:

  • When the swb (streaming write buffer of size write-block-size) is full (or when the next record to be written doesn’t fit).
  • When the swb has not been flushed for flush-max-ms milliseconds (default 1 second).
  • On every write transaction, when commit-to-device is configured on a strong-consistency enabled namespace.
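The three flush conditions above can be sketched as a single predicate. This is a toy model for illustration only, not the server's implementation; the names SwbState and should_flush and their fields are hypothetical, and only write-block-size, flush-max-ms and commit-to-device come from the text.

```python
from dataclasses import dataclass


@dataclass
class SwbState:
    """Hypothetical snapshot of a streaming write buffer (swb)."""
    used_bytes: int      # bytes already buffered in the swb
    ms_since_flush: int  # milliseconds since the swb was last flushed


def should_flush(swb, next_record_bytes, write_block_size,
                 flush_max_ms=1000, commit_to_device=False):
    """Return True if any of the three flush conditions holds."""
    # Condition 3: commit-to-device flushes on every write transaction.
    if commit_to_device:
        return True
    # Condition 1: the swb is full, or the next record does not fit.
    if swb.used_bytes + next_record_bytes > write_block_size:
        return True
    # Condition 2: the swb has not been flushed for flush-max-ms.
    return swb.ms_since_flush >= flush_max_ms


one_mib = 1024 * 1024
fresh = SwbState(used_bytes=1000, ms_since_flush=10)
stale = SwbState(used_bytes=1000, ms_since_flush=1000)
print(should_flush(fresh, 100, one_mib))      # small record, recent flush
print(should_flush(fresh, one_mib, one_mib))  # record does not fit
print(should_flush(stale, 100, one_mib))      # flush-max-ms elapsed
```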

Note: written blocks can still linger in caches – the page cache (if using files) and the hardware cache. The fsync-max-sec configuration parameter controls the frequency at which data gets pushed from those caches.

As records are updated or deleted, the share of each on-disk block occupied by live records decreases. When a block's usage level falls below defrag-lwm-pct, the block becomes eligible for defragmentation and is queued on the defrag-q (defragmentation queue). The default value of defrag-lwm-pct is 50%.
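The eligibility rule can be written as a one-line check. This is a minimal sketch that assumes usage is measured as live bytes relative to write-block-size; the function name and the live_bytes parameter are hypothetical.

```python
def is_defrag_eligible(live_bytes, write_block_size, defrag_lwm_pct=50):
    """Return True if the block's usage has fallen below defrag-lwm-pct."""
    usage_pct = 100.0 * live_bytes / write_block_size
    return usage_pct < defrag_lwm_pct


one_mib = 1024 * 1024
print(is_defrag_eligible(400 * 1024, one_mib))  # about 39% used
print(is_defrag_eligible(600 * 1024, one_mib))  # about 59% used
```

With the default of 50%, the first block would be queued for defragmentation and the second would not.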

Four configuration parameters can be tuned for the defragmentation subsystem. They can be set dynamically or in the aerospike.conf file. To persist the values across a restart, set them in the configuration file.

  1. defrag-lwm-pct: Default is 50%. Blocks that are less filled in percentage than the specified limit will be marked as eligible to be defragmented. A higher percentage means more blocks to be defragmented, and more dense data on the disk.

  2. defrag-sleep: Default is 1000 microseconds. This is the number of microseconds to sleep after each wblock is defragmented.

  3. defrag-startup-minimum: Default is 10%. Per namespace, if less than 10% of the disk is writable, the server will not join the cluster or open a service port. Keep in mind that the disk might appear full to Aerospike because it writes everything in blocks, so running df or a similar command is only useful for knowing how much physical space is in use. The actual writable space available on the namespace is represented by device_free_pct.

  4. defrag-queue-min: Default is 0. Defragmentation does not run unless the defrag-q holds at least this many eligible wblocks.

The log lines also indicate the defrag profile:

{namespace} /dev/sda: used-bytes 296160983424 free-wblocks 885103 write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) defrag-write (3586533,10.2) shadow-write-q 0 tomb-raider-read (13758,598.0)
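The counters in this log line can be pulled out with a short script. This is a sketch that assumes each parenthesized pair is (cumulative total, rate per second), which should be checked against the log reference manual; the names PAIR and parse_rates are hypothetical.

```python
import re

LOG_LINE = (
    "{namespace} /dev/sda: used-bytes 296160983424 free-wblocks 885103 "
    "write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) "
    "defrag-write (3586533,10.2) shadow-write-q 0 tomb-raider-read (13758,598.0)"
)

# Matches fields of the form "name (total,rate)".
PAIR = re.compile(r"([a-z-]+) \((\d+),([\d.]+)\)")


def parse_rates(line):
    """Map each paired field name to (total, rate), assumed per second."""
    return {name: (int(total), float(rate))
            for name, total, rate in PAIR.findall(line)}


rates = parse_rates(LOG_LINE)
write_rate = rates["write"][1]
defrag_write_rate = rates["defrag-write"][1]
# The write rate includes defrag writes, so the difference
# approximates the writes not caused by defragmentation.
other_write_rate = write_rate - defrag_write_rate
print(f"write={write_rate} defrag-write={defrag_write_rate} "
      f"other={other_write_rate:.1f}")
```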

The details of each parameter are described in the log reference manual. As of version 4.3, device statistics are also available.

The following formula can typically be used to determine if the defragmentation is keeping up:

100 - (device_available_pct + [100 * device_used_bytes / device_total_bytes] )
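As a worked example, the formula can be evaluated from the three statistics it names. This is a minimal sketch: the function names and sample values are made up for illustration, and the classification uses the 0 to 20 and above 30 thresholds that this article gives.

```python
def defrag_gap_pct(device_available_pct, device_used_bytes, device_total_bytes):
    """Evaluate 100 - (available_pct + used_pct) from the formula above."""
    used_pct = 100.0 * device_used_bytes / device_total_bytes
    return 100.0 - (device_available_pct + used_pct)


def classify(gap_pct):
    """Apply this article's thresholds to the computed value."""
    if 0 <= gap_pct <= 20:
        return "keeping up"
    if gap_pct > 30:
        return "possibly falling behind"
    return "inconclusive"


# Hypothetical node: 45% available, 400 of 1000 bytes used.
gap = defrag_gap_pct(45, 400, 1000)
print(gap, classify(gap))

# Hypothetical node: 10% available, 500 of 1000 bytes used.
gap = defrag_gap_pct(10, 500, 1000)
print(gap, classify(gap))
```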

A value between 0 and 20 indicates that defragmentation is keeping up. A value above 30 may indicate that it is not. In that case, the next step is to check the log line shown above, in particular write and defrag-write. In that log line, the write rate (43.3 per second) is greater than the defrag-write rate (10.2 per second); note that the write rate includes the defrag writes. Initially this may not pose a problem, but over time device_available_pct may run low. Also monitor the defrag-q, which should not be constantly increasing. If you determine that the node is falling behind and the logs show an empty defrag queue, consider raising defrag-lwm-pct modestly. Be aware that raising defrag-lwm-pct increases write amplification nonlinearly.
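To illustrate why the write amplification is nonlinear, here is a simplified steady-state model. It is an assumption for illustration, not a formula stated in this article: if every block is defragmented when it is exactly defrag-lwm-pct full, that fraction of the block must be rewritten to reclaim the rest, giving an amplification of 1 / (1 - defrag-lwm-pct / 100). The function name is hypothetical.

```python
def write_amplification(defrag_lwm_pct):
    """Simplified model: device writes per client write at steady state."""
    if not 0 <= defrag_lwm_pct < 100:
        raise ValueError("defrag_lwm_pct must be in [0, 100)")
    return 1.0 / (1.0 - defrag_lwm_pct / 100.0)


for pct in (50, 60, 75, 90):
    print(f"defrag-lwm-pct {pct}: ~{write_amplification(pct):.1f}x")
```

Under this model the default of 50 gives about 2x and 75 gives about 4x, so each additional point costs more than the last.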

You can look at the Aerospike log file and focus on the write rate and defrag rate.

tail -f /var/log/aerospike/aerospike.log | grep defrag-write

Keywords

DEFRAG DEFRAGMENTATION

Timestamp

Sept 11 2018

