FAQ - Why is high-water-disk-pct set to 50%?

Detail

By default, Aerospike sets high-water-disk-pct to 50%. Why is it set so low? Can it be increased to derive more value from the given hardware?

Answer

Aerospike manages defragmentation using a number of parameters; one of these is defrag-lwm-pct, which defaults to 50%. Initially, full blocks are written; as time goes by, records are updated and/or deleted, and gaps appear in the blocks on the storage device. With a defrag-lwm-pct of 50%, a block is sent for defragmentation when it is 50% (or less) occupied. Assuming the defragmentation queues are not backed up, the 50% of data that remains in such a block is rewritten into a fresh block; combined with the 50% remaining from another defragmented block, this fills one new block and frees the two original blocks, a net gain of one free block. In the worst case, then, every block of new data written costs one additional block of defragmentation writes, which yields a 2X write amplification.
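
The arithmetic above generalizes to any defrag-lwm-pct. The Python sketch below assumes the same idealized steady state (every block is defragmented at exactly defrag-lwm-pct occupancy and the queue never backs up); the function names are hypothetical illustrations, not part of any Aerospike tool. Under that model it reproduces both the 2X figure for 50% and the 4X figure the Notes give for 75%.

```python
from fractions import Fraction


def blocks_to_defrag_per_free_block(lwm_pct: int) -> Fraction:
    """Blocks that must be defragmented to gain one net free block.

    Idealized model (an assumption): each block is defragmented at exactly
    lwm_pct occupancy. Defragmenting n such blocks rewrites n * lwm blocks
    of data and releases the n originals, a net gain of n * (1 - lwm)
    blocks. Setting that gain to 1 gives n = 1 / (1 - lwm).
    """
    lwm = Fraction(lwm_pct, 100)
    if not 0 <= lwm < 1:
        raise ValueError("lwm_pct must be in [0, 100)")
    return 1 / (1 - lwm)


def write_amplification(lwm_pct: int) -> Fraction:
    """Total device writes per block of new (client) writes, same model.

    One net free block absorbs one block of client writes, and recovering
    it costs n * lwm blocks of defrag writes: total = 1 + n * lwm.
    """
    lwm = Fraction(lwm_pct, 100)
    return 1 + blocks_to_defrag_per_free_block(lwm_pct) * lwm


for pct in (50, 75):
    print(
        f"defrag-lwm-pct {pct}: defrag {blocks_to_defrag_per_free_block(pct)} "
        f"blocks per free block, write amplification {write_amplification(pct)}X"
    )
```

Exact fractions are used so the results compare cleanly (2 and 4 rather than floating-point approximations); the closed form is simply 1 / (1 - lwm).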

To ensure this contiguous free space exists, high-water-disk-pct is set to 50%. This only applies to namespaces where records have a non-null TTL set, since evictions are what keep disk usage below 50%. The space is therefore not wasted: it is used by the defragmentation process to keep write amplification at 2X. If this space is reduced, defragmentation may not be able to free blocks before avail-pct drops to the level at which stop-writes is triggered.
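
One way to see why high-water-disk-pct is tied to defrag-lwm-pct is a worst-case bound. Assume, as a simplification, that defragmentation has caught up, so every block still holding data is more than defrag-lwm-pct full. If a fraction `used` of the device holds live data, at most used / lwm of the blocks can be occupied, so at least 1 - used / lwm of them are guaranteed free. The hypothetical helper below evaluates that bound; it ignores write buffers and blocks waiting in the defrag queue.

```python
def guaranteed_free_fraction(used_pct: float, lwm_pct: float) -> float:
    """Worst-case lower bound on the fraction of free blocks.

    Simplifying assumption: every block that still holds data is more than
    lwm_pct full. A result of 0 or less means the model guarantees no free
    blocks for defragmentation to write into.
    """
    if not 0 < lwm_pct <= 100:
        raise ValueError("lwm_pct must be in (0, 100]")
    return 1 - used_pct / lwm_pct


for used, lwm in ((40, 50), (50, 50), (60, 75), (75, 75)):
    bound = guaranteed_free_fraction(used, lwm)
    print(f"used {used}%, defrag-lwm-pct {lwm}%: guaranteed free blocks >= {max(bound, 0.0):.0%}")
```

In this model the guarantee disappears once disk usage reaches defrag-lwm-pct, which is consistent with keeping both at 50% and with raising defrag-lwm-pct before raising high-water-disk-pct.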

Aerospike recommends that, in most circumstances, running high-water-disk-pct at its default of 50% offers the best balance between resource usage and disk longevity.

Notes

  • It is possible to run with a higher value for high-water-disk-pct, but this would also require increasing defrag-lwm-pct. For example, increasing defrag-lwm-pct to 75% means that blocks are sent for defragmentation when their occupancy is 75% or lower. To recover one block of free space, 4 blocks (each holding 75% data) must then be defragmented, so defragmentation runs much more aggressively. The consequence is that write amplification increases to 4X, which may impact application transaction latencies and ultimately increases disk wear.

  • Write amplification can be calculated per namespace from the server logs, for example from a device line such as the following. See the log reference for details on each field.

{namespace} /dev/sda: used-bytes 296160983424 free-wblocks 885103 write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) defrag-write (3586533,10.2) tomb-raider-read (13758,598.0)
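
The sketch below parses a device line like the one above and computes write amplification from it. It rests on an assumption that should be checked against the log reference for your server version: that the first number in write (…) counts all blocks written since the node started, including defrag writes, and the second is the current rate in blocks per second. Under that reading, client writes = write - defrag-write, and amplification = write / (write - defrag-write). The function names are hypothetical.

```python
import re

# Example device line from above, split here only for readability.
LOG_LINE = (
    "{namespace} /dev/sda: used-bytes 296160983424 free-wblocks 885103 "
    "write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) "
    "defrag-write (3586533,10.2) tomb-raider-read (13758,598.0)"
)

# Matches "name (total,rate)" pairs such as "write (12659541,43.3)".
PAIR = re.compile(r"([\w-]+) \((\d+),([\d.]+)\)")


def parse_counters(line: str) -> dict[str, tuple[int, float]]:
    """Return {field: (total_since_start, per_second_rate)} for each pair."""
    return {name: (int(total), float(rate)) for name, total, rate in PAIR.findall(line)}


def amplification_from_log(line: str) -> tuple[float, float]:
    """Cumulative and current write amplification for one device line.

    ASSUMPTION: the "write" counter already includes defrag writes, so
    client writes = write - defrag-write.
    """
    counters = parse_counters(line)
    write_total, write_rate = counters["write"]
    defrag_total, defrag_rate = counters["defrag-write"]
    cumulative = write_total / (write_total - defrag_total)
    current = write_rate / (write_rate - defrag_rate)
    return cumulative, current


cumulative, current = amplification_from_log(LOG_LINE)
print(f"since start: {cumulative:.2f}X, current: {current:.2f}X")
```

If the assumption holds, this sample line works out to roughly 1.4X since startup, below the idealized 2X; that would be consistent with many blocks being emptier than defrag-lwm-pct by the time they are defragmented.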

Keywords

DEFRAG HIGH-WATER-DISK-PCT DEFRAG-LWM-PCT WRITE AMPLIFICATION

Timestamp

10/05/17