Defragmentation

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

Aerospike Defragmentation

For namespaces using storage-engine device, Aerospike always writes data in large blocks of size write-block-size. Each block is filled with incoming write transactions and is written to the device:

  • When the swb (streaming write buffer of size write-block-size) is full (or when the next record to be written doesn’t fit).
  • When the swb has not been flushed for flush-max-ms milliseconds (default 1 second).
  • On every write transaction, when commit-to-device is enabled (for namespaces with strong-consistency enabled).
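The three flush triggers above can be illustrated with a toy simulation. This is a hypothetical model written for illustration, not Aerospike's implementation: `simulate_swb` and its event tuples are invented names, the flush-max-ms check only runs when a write arrives (the real flush is time-driven), and it assumes a partial flush keeps filling the same buffer while only a full buffer starts a new block.

```python
def simulate_swb(writes, write_block_size, flush_max_ms=1000, commit_to_device=False):
    """Toy model of when a streaming write buffer (swb) is written to the device.

    writes: list of (time_ms, record_size) pairs in time order.
    Returns a list of (time_ms, bytes_in_buffer, reason) for each device write.
    Hypothetical simplification: a partial flush keeps filling the same buffer;
    only a full buffer is replaced by a new, empty one.
    """
    events = []
    used = 0          # bytes in the current buffer
    dirty = False     # buffer holds bytes not yet written to the device
    last_flush = 0    # time of the last device write, in ms
    for t, size in writes:
        if size > write_block_size:
            raise ValueError("record larger than write-block-size")
        # flush-max-ms: unflushed data has waited too long (checked here only
        # when a write arrives, to keep the toy simple).
        if dirty and t - last_flush >= flush_max_ms:
            events.append((t, used, "flush-max-ms"))
            dirty, last_flush = False, t
        # Full: the next record does not fit, so write out anything pending
        # and start a new, empty buffer.
        if used + size > write_block_size:
            if dirty:
                events.append((t, used, "full"))
                last_flush = t
            used, dirty = 0, False
        used += size
        dirty = True
        # commit-to-device: every write is flushed before it is acknowledged.
        if commit_to_device:
            events.append((t, used, "commit-to-device"))
            dirty, last_flush = False, t
    return events

writes = [(0, 60), (10, 60), (2000, 10)]   # (time_ms, record bytes), toy sizes
print(simulate_swb(writes, write_block_size=100))
# [(10, 60, 'full'), (2000, 60, 'flush-max-ms')]
print(simulate_swb(writes, write_block_size=100, commit_to_device=True))
```

With commit-to-device every write produces a device write, which is why it trades throughput for durability.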

Note: written blocks can still linger in caches, such as the page cache (if using files) and the hardware cache. The fsync-max-sec configuration parameter controls how often data is pushed out of those caches.

As records are updated or deleted, the amount of live data in the blocks that held them decreases (an update writes the new version of the record to a new block, leaving the old copy dead). When a block's usage level falls below defrag-lwm-pct, it becomes eligible for defragmentation and is queued on the defrag-q (defragmentation queue). The default value of defrag-lwm-pct is 50%.
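A minimal sketch of the bookkeeping described above, assuming a block is queued exactly once, at the moment its live data first drops below the low-water mark. `ToyDevice`, `fill_block`, and `release` are hypothetical names for illustration; the server tracks this internally.

```python
from collections import deque

class ToyDevice:
    """Toy model (hypothetical, not Aerospike code) of per-wblock live-data tracking."""

    def __init__(self, write_block_size, defrag_lwm_pct=50):
        self.lwm_bytes = write_block_size * defrag_lwm_pct // 100
        self.live = {}              # wblock id -> bytes of live records in it
        self.defrag_q = deque()     # wblocks eligible for defragmentation

    def fill_block(self, block_id, live_bytes):
        self.live[block_id] = live_bytes

    def release(self, block_id, nbytes):
        """A record stored in block_id was deleted or rewritten elsewhere by an update."""
        before = self.live[block_id]
        after = before - nbytes
        self.live[block_id] = after
        # Queue the block once, when its usage first drops below defrag-lwm-pct.
        if before >= self.lwm_bytes > after:
            self.defrag_q.append(block_id)

dev = ToyDevice(write_block_size=1024 * 1024)   # hypothetical 1 MiB wblocks
dev.fill_block(0, 900 * 1024)
dev.release(0, 400 * 1024)                       # 500 KiB left, below 512 KiB
print(list(dev.defrag_q))                        # [0]
```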

There are four configuration parameters that can be tuned for the defragmentation subsystem. They can be set dynamically or in the aerospike.conf file; to persist a value across restarts, set it in the configuration file.

  1. defrag-lwm-pct: Default is 50%. Blocks filled below this percentage are marked eligible for defragmentation. A higher percentage means more blocks get defragmented and data is packed more densely on the disk. The 50% default gives a good balance between space usage and write amplification. For a given use case it may be desirable to increase defrag-lwm-pct to gain more usable space on the disk; this is usually the case for read-heavy workloads, where write amplification matters less. Any change should be tested, in particular to observe the effect on defrag load during operations that generate many deletes, such as truncation or partitions being dropped during migration.

  2. defrag-sleep: Default is 1000 microseconds. This is the number of microseconds to sleep after each wblock is defragmented.

  3. defrag-startup-minimum: Default is 10%. Per namespace, if less than this percentage of the storage is writable, the server will not join the cluster or open a service port. Remember that the disk might appear full to Aerospike because it writes everything in whole blocks, so running df or a similar command is not relevant except for knowing the physical space being used. The actual writable space available to the namespace is represented by device_available_pct.

  4. defrag-queue-min: Default is 0. Defrag does not run unless the defrag queue holds at least this many eligible wblocks.
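To make the interaction between defrag-queue-min and defrag-sleep concrete, here is a toy defrag loop. It is a hypothetical sketch, not server code: `defrag_pass` and the `rewrite` callback are invented, and it assumes the queue-min check is applied before every block rather than only when a pass starts.

```python
import time
from collections import deque

def defrag_pass(defrag_q, live_bytes, rewrite, defrag_queue_min=0,
                defrag_sleep_us=1000, sleep=time.sleep):
    """Toy defrag loop (hypothetical, not Aerospike code). Returns freed block ids."""
    freed = []
    # defrag-queue-min: only defrag while at least this many blocks are queued;
    # the default of 0 means any non-empty queue is processed.
    while defrag_q and len(defrag_q) >= defrag_queue_min:
        block_id = defrag_q.popleft()
        rewrite(block_id, live_bytes[block_id])  # copy surviving records to a new block
        live_bytes[block_id] = 0                 # the old block is now free
        freed.append(block_id)
        sleep(defrag_sleep_us / 1_000_000)       # defrag-sleep is given in microseconds
    return freed

q = deque([7, 8, 9])
live = {7: 1000, 8: 2000, 9: 500}
freed = defrag_pass(q, live, rewrite=lambda b, n: print(f"rewrite {n} live bytes from wblock {b}"),
                    defrag_queue_min=2)
print("freed", freed, "still queued", list(q))
```

A lower defrag-sleep lets the loop free blocks faster at the cost of more device I/O competing with client traffic.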

The log lines also indicate the defrag profile:

{namespace} /dev/sda: used-bytes 296160983424 free-wblocks 885103 write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) defrag-write (3586533,10.2) shadow-write-q 0 tomb-raider-read (13758,598.0)
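A small parser for this line can help when scripting checks. It assumes each parenthesized pair is (cumulative count, per-second rate), which is how the article reads the write field, and that the layout matches the sample line exactly; `parse_device_line` and `non_defrag_write_rate` are hypothetical helpers. Because the write rate includes defrag writes, subtracting defrag-write gives the rate of all other writes.

```python
import re

def parse_device_line(line):
    """Parse a device log line into {field: value}.

    'name (total,rate)' fields become (int, float) tuples; 'name N' fields
    become ints. Assumes the layout of the sample line in this article.
    """
    stats = {}
    for name, total, rate in re.findall(r"([\w-]+) \((\d+),([\d.]+)\)", line):
        stats[name] = (int(total), float(rate))
    for name, value in re.findall(r"([\w-]+) (\d+)(?= |$)", line):
        stats.setdefault(name, int(value))
    return stats

def non_defrag_write_rate(stats):
    """Write rate minus defrag-write rate (the write rate includes defrag writes)."""
    return stats["write"][1] - stats["defrag-write"][1]

line = ("{namespace} /dev/sda: used-bytes 296160983424 free-wblocks 885103 write-q 0 "
        "write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) "
        "defrag-write (3586533,10.2) shadow-write-q 0 tomb-raider-read (13758,598.0)")
stats = parse_device_line(line)
print(stats["write"], stats["defrag-write"], stats["defrag-q"])
print(round(non_defrag_write_rate(stats), 1))
```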

The details for each parameter are described in the log reference manual. As of version 4.3, device statistics are also available.

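The following formula can typically be used to determine whether defragmentation is keeping up. It gives the percentage of the device that neither holds live data nor is available as free blocks, that is, space tied up in partially filled blocks: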

100 - (device_available_pct + 100 * device_used_bytes / device_total_bytes)
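The formula, together with the rule-of-thumb thresholds given in the next paragraph, can be evaluated with a short helper. The function names are hypothetical, and the inputs are assumed to be the device_available_pct, device_used_bytes, and device_total_bytes statistics for one namespace.

```python
def defrag_backlog_pct(device_available_pct, device_used_bytes, device_total_bytes):
    """100 - (available % + used %): space held in partially filled blocks."""
    used_pct = 100 * device_used_bytes / device_total_bytes
    return 100 - (device_available_pct + used_pct)

def interpret(backlog_pct):
    """Apply the article's rule-of-thumb thresholds (0-20 and above 30)."""
    if 0 <= backlog_pct <= 20:
        return "keeping up"
    if backlog_pct > 30:
        return "may not be keeping up"
    return "inconclusive"  # 20-30 (or negative) is not covered by the rule of thumb

# Hypothetical figures: 45% available, 400 GB used of a 1 TB device.
value = defrag_backlog_pct(45, 400_000_000_000, 1_000_000_000_000)
print(value, interpret(value))  # 15.0 keeping up
```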

If the value falls between 0 and 20, defrag is keeping up. If the value is above 30, defrag may not be keeping up. The next step is to look at the device log line shown above, specifically the write and defrag-write rates. In that line, the writes per second are greater than the defrag writes per second (note that the write rate includes the defrag-write rate). Initially this may not pose a problem, but over time device_available_pct may run low. Also monitor defrag-q, which should not be constantly increasing.

If you determine that the node is falling behind and the logs show an empty defrag queue, consider raising defrag-lwm-pct modestly. Be aware that raising defrag-lwm-pct increases write amplification nonlinearly.
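The nonlinearity can be seen with a simplified steady-state model that is not taken from the article: assume every defragmented block sits exactly at defrag-lwm-pct when it is rewritten. Freeing one block then rewrites lwm of a block's live data to reclaim (1 - lwm) of space, so total device writes per unit of new data are 1 / (1 - lwm). Blocks are often emptier than the threshold by the time they are defragmented, so treat this as a rough upper bound.

```python
def worst_case_write_amplification(defrag_lwm_pct):
    """Device bytes written per byte of new data if every defragged block is exactly at the lwm."""
    lwm = defrag_lwm_pct / 100
    if not 0 <= lwm < 1:
        raise ValueError("defrag-lwm-pct must be in [0, 100)")
    # Freeing a block rewrites `lwm` of live data and reclaims `1 - lwm` of space,
    # so each unit of new data adds lwm / (1 - lwm) of defrag writes;
    # 1 + lwm / (1 - lwm) simplifies to 1 / (1 - lwm).
    return 1 / (1 - lwm)

for pct in (50, 60, 70, 80, 90):
    print(f"defrag-lwm-pct {pct}: ~{worst_case_write_amplification(pct):.2f}x")
```

Under this model, going from 50% to 80% raises the bound from about 2x to 5x, and 90% reaches about 10x.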

To watch the write rate and defrag rate as they are logged, follow the Aerospike log file:

tail -f /var/log/aerospike/aerospike.log | grep defrag-write

Keywords

DEFRAG DEFRAGMENTATION

Timestamp

Sept 11 2018


@Aerospike_Knowledge Hello, I have a question. What happens when disk usage is 60% but we only have 4.5% of contiguous space? Is there a method to defragment the disks using Aerospike? INFO: write-block-size 8M, replication-factor 2


Likely your lwm is set too low. Is it set to at least 60% (or whatever your disk usage is)? If there is no defrag queue, try increasing the lwm. If there is a high defrag-q, maybe lower defrag-sleep, or you may be IO bound. It may even be that post-write-queue has locked up too many blocks and is preventing them from being defragged, but lowering it could have a bad performance impact. Increasing lwm can be dangerous. Be careful.
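The reasoning behind comparing lwm with disk usage can be approximated from the figures in the question. As a rough sketch that treats the 4.5% contiguous space as device_available_pct and assumes live data is spread evenly across blocks (it rarely is exactly), 60% used across the 95.5% of space that is not free wblocks means those blocks average about 63% full, above the default 50% lwm, so few blocks would qualify for defrag. The article's formula gives 100 - (4.5 + 60) = 35.5, above the 30 threshold. `avg_fill_of_nonfree_blocks` is a hypothetical helper.

```python
def avg_fill_of_nonfree_blocks(used_pct, available_pct):
    """Average live-data fill (%) of blocks that are not free, assuming an even spread."""
    nonfree_pct = 100 - available_pct   # space not available as free wblocks
    return 100 * used_pct / nonfree_pct

# Figures from the question, treating 4.5% contiguous space as device_available_pct.
fill = avg_fill_of_nonfree_blocks(used_pct=60, available_pct=4.5)
print(f"average fill of non-free blocks: {fill:.1f}%")  # 62.8%
lwm = 50  # default defrag-lwm-pct
print("average block above lwm" if fill >= lwm else "average block below lwm")
```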
