FAQ - Why is high-water-disk-pct set to 50%?

Detail

By default, Aerospike sets high-water-disk-pct to 50%. Why is it set so low, and can it be increased to get more value out of the given hardware?

Answer

Aerospike manages defragmentation using a number of parameters, one of which is defrag-lwm-pct, which defaults to 50%. Initially, full blocks are written. Over time, records are updated or deleted, and gaps appear in the blocks on the storage device. With a defrag-lwm-pct of 50%, a block is sent for defragmentation when it is 50% (or less) occupied. Assuming the defragmentation queues are not backed up, the 50% of data remaining in that block is rewritten into a fresh block together with the 50% of data remaining in another block. The result is one full block and 2 freed blocks, a net gain of one empty block. Each block of new data therefore costs about two block writes, which yields a 2X write amplification.
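The relationship between defrag-lwm-pct and write amplification can be sketched with a small model. This is a simplified illustration, not Aerospike's internal accounting: it assumes every block is defragmented exactly when its occupancy reaches defrag-lwm-pct and that the defragmentation queues never back up. The function name is hypothetical.

```python
def write_amplification(defrag_lwm_pct: float) -> float:
    """Steady-state write amplification for a given defrag threshold.

    Simplified model (an assumption): every block is defragmented exactly
    when its occupancy has fallen to defrag_lwm_pct.
    """
    if not 0 <= defrag_lwm_pct < 100:
        raise ValueError("defrag_lwm_pct must be in [0, 100)")
    live_fraction = defrag_lwm_pct / 100.0
    # Freeing one whole block requires defragmenting 1 / (1 - live_fraction)
    # blocks, which rewrites live_fraction / (1 - live_fraction) blocks of
    # live data. Adding the one block of new data that fills the freed space
    # gives 1 / (1 - live_fraction) blocks written per block of new data.
    return 1.0 / (1.0 - live_fraction)


print(write_amplification(50))  # 2.0
print(write_amplification(75))  # 4.0
```

Under this model the default of 50% gives 2X, and a threshold of 75% gives 4X. Real workloads usually see lower figures, because many blocks fall well below the threshold before they are processed.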

To ensure this contiguous free space exists, high-water-disk-pct is set to 50%. The setting only applies to namespaces where records have a TTL set, since evictions are what keep disk usage below 50%. The remaining space is therefore not wasted: it is used by the defragmentation process to sustain a 2X write amplification. If this space is reduced, defragmentation may not be able to free blocks before device-avail-pct drops to the level at which stop-writes is triggered.
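The reason disk usage needs to stay at or below defrag-lwm-pct can be shown with a simple counting argument. The sketch below is illustrative only, and its function name is hypothetical. It relies on one observation: a block that is neither empty nor eligible for defragmentation must hold more than defrag-lwm-pct of data, so only a limited share of blocks can be in that state for a given overall usage.

```python
def min_reclaimable_pct(used_pct: float, defrag_lwm_pct: float) -> float:
    """Lower bound on the share of blocks that are empty or defrag-eligible.

    Blocks that are neither empty nor eligible each hold more than
    defrag_lwm_pct of data, so fewer than used_pct / defrag_lwm_pct of all
    blocks can be in that state. The rest is the guaranteed minimum.
    """
    if not 0 < defrag_lwm_pct <= 100:
        raise ValueError("defrag_lwm_pct must be in (0, 100]")
    if not 0 <= used_pct <= 100:
        raise ValueError("used_pct must be in [0, 100]")
    # A result of 0.0 means the worst case offers no guarantee at all.
    return max(0.0, 100.0 * (defrag_lwm_pct - used_pct) / defrag_lwm_pct)


print(min_reclaimable_pct(40, 50))  # 20.0
print(min_reclaimable_pct(70, 50))  # 0.0
```

With usage at 40% and the default threshold, at least 20% of blocks are always either empty or waiting for defragmentation. Once usage exceeds the threshold the guarantee disappears. Note that a defrag-eligible block is not yet a free block, so this bound says nothing about whether defragmentation keeps up with the write load.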

Aerospike recommends that, in most circumstances, high-water-disk-pct and defrag-lwm-pct be left at their default values (or that used disk be kept below the configured defrag-lwm-pct), as this offers the best balance between resource usage and disk longevity. Workloads with lower write throughput can be configured differently and use more of the disk, because they generate less defragmentation activity.

Notes

  • It is possible to run with a higher value for high-water-disk-pct (or to use more disk if evictions are not being leveraged), but this requires increasing defrag-lwm-pct as well. For example, increasing defrag-lwm-pct to 75% means that blocks are sent for defragmentation when their occupancy is 75% or lower. To recover one block of free space, 4 blocks (each holding 75% data) must be sent for defragmentation, so defragmentation runs much more aggressively. The consequence is that write amplification rises to 4X, which may affect application transaction latencies and, ultimately, increase disk wear.

  • Write amplification can be calculated at the per-namespace level from the server logs, using the per-device log line shown here. See the log reference for details.

{namespace} /dev/sda: used-bytes 296160983424 free-wblocks 885103 write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) defrag-write (3586533,10.2) tomb-raider-read (13758,598.0)

  • To illustrate the danger of increasing high-water-disk-pct (or of using more disk in use cases where evictions are not enforced) without changing defrag-lwm-pct, consider a scenario where high-water-disk-pct has been increased to 70% and defrag-lwm-pct is left at the default of 50%. In theory, every disk block could be 70% full. No evictions happen, because high-water-disk-pct has not been breached. No defragmentation happens, because no block has fallen to the defrag-lwm-pct of 50%. The device-avail-pct is then 0 and the namespace is in stop-writes, as there are no empty blocks left to write to.
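As a rough illustration of the log-based calculation mentioned in the notes above, the sketch below parses the sample log line and estimates write amplification from the cumulative counters. It rests on an assumption that should be checked against the log reference for the server version in use: that the write counter covers all blocks written, including those written by defragmentation, so that write minus defrag-write approximates client-driven writes. The function names are hypothetical.

```python
import re

LOG_LINE = (
    "{namespace} /dev/sda: used-bytes 296160983424 free-wblocks 885103 "
    "write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) "
    "defrag-write (3586533,10.2) tomb-raider-read (13758,598.0)"
)

# Matches "<name> (<total>,<rate>)" pairs such as "write (12659541,43.3)".
COUNTER = re.compile(r"(?<!\S)([a-z-]+) \((\d+),([\d.]+)\)")


def parse_counters(line: str) -> dict[str, tuple[int, float]]:
    """Return {counter name: (cumulative total, per-second rate)}."""
    return {
        name: (int(total), float(rate))
        for name, total, rate in COUNTER.findall(line)
    }


def estimate_write_amplification(line: str) -> float:
    """Total blocks written divided by blocks not written by defrag.

    Assumption: "write" includes "defrag-write" (see the lead-in).
    """
    counters = parse_counters(line)
    total_writes = counters["write"][0]
    defrag_writes = counters["defrag-write"][0]
    client_writes = total_writes - defrag_writes
    if client_writes <= 0:
        raise ValueError("no client-driven writes recorded")
    return total_writes / client_writes


print(f"{estimate_write_amplification(LOG_LINE):.2f}")  # 1.40
```

The cumulative totals give an average since the server started. The per-second rates in the same line could be used in the same way for a more recent view.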

Keywords

DEFRAG HIGH-WATER-DISK-PCT DEFRAG-LWM-PCT WRITE AMPLIFICATION

Timestamp

10/05/17