FAQ - Why is high-water-disk-pct set to 50%?

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

FAQ - Aerospike Defragmentation and High Water Marks

Detail

Note: as of version 4.9, the default for nsup-period changed to 0. This means that, by default, eviction and expiration are disabled. The general guideline of using less than 50% of a storage device remains (except for specific low write throughput workloads).

By default, Aerospike sets both high-water-disk-pct and defrag-lwm-pct to 50%. Why are these set so low? Can they be increased to derive more value from the given hardware?

Answer

Aerospike manages defragmentation using a number of parameters. One of these is defrag-lwm-pct, which defaults to 50%. Initially, full blocks are written. As time goes by, records are updated and/or deleted, and gaps appear in the blocks on the storage device. With a defrag-lwm-pct of 50%, blocks are sent for defragmentation when they are 50% (or less) occupied. Assuming the defragmentation queues are not backed up, the 50% of data remaining in one block is combined with the 50% remaining in another block and rewritten into a fresh block. This produces one full block and frees the two source blocks. This yields a 2X write amplification.

To ensure this contiguous free space exists, the high-water-disk-pct is set to 50%. This is of course only applicable to namespaces where records have a non-null TTL set, as it is evictions that force disk usage to stay below 50%. The space is therefore not being wasted: it is used as part of the defragmentation process to maintain a 2X write amplification. If this space is reduced, it is possible that defragmentation will not be able to free blocks before device-avail-pct drops to a level where stop-writes is triggered.

If expiration and eviction are not in use, as is the default, the operator is responsible for ensuring that sufficient free space is available on the disk for defragmentation. This free space must be empty blocks as Aerospike only writes to empty blocks.

When expiration is switched on, Aerospike recommends, for most circumstances, running high-water-disk-pct and defrag-lwm-pct at their default values (or keeping used disk under the configured defrag-lwm-pct), as this offers the best balance between resource usage and disk longevity. This is not a hard and fast rule, and these parameters can be tuned according to workload. The key point is to not use more disk than the defrag-lwm-pct. This is independent of evictions and, by default, is 50% with a write amplification of 2x.

Although the default value for defrag-lwm-pct is optimal in many circumstances, there may be use cases where it can be tuned without issue. For a read-heavy workload it may well be appropriate to increase defrag-lwm-pct and, with it, the amount of disk available for active data storage. If that is the case, careful testing should be carried out to ensure that activities which generate high levels of deletes do not overwhelm the defragmentation sub-system. Examples of these activities would be:

  • Truncation of data
  • Migration where outbound partitions are dropped after migration
  • Expiration if in use
  • Deletion of tombstones when the tombraider process runs

Increasing defrag-lwm-pct increases write amplification which, in turn, increases disk wear. If the use case is predominantly reads this may be perfectly acceptable, but it should be understood. An increase of defrag-lwm-pct from 50% to 75% would increase write amplification from 2x to 4x. With defrag-lwm-pct set to 50%, 2 blocks are required to obtain a single empty block. When defrag-lwm-pct is set to 75%, 4 blocks are required to obtain a single empty block.
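The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustrative model, not Aerospike code: the function names are hypothetical, and it assumes the worst case where every block is defragmented when it is exactly defrag-lwm-pct full.

```python
def blocks_per_free_block(defrag_lwm_pct: float) -> float:
    """Blocks that must be defragmented to net one empty block.

    Simplifying assumption: every block is defragmented when it is exactly
    defrag_lwm_pct percent full, so each one frees (100 - defrag_lwm_pct)
    percent of a block once its live records have been rewritten.
    """
    if not 0 <= defrag_lwm_pct < 100:
        raise ValueError("defrag_lwm_pct must be in the range [0, 100)")
    return 100.0 / (100.0 - defrag_lwm_pct)


def write_amplification(defrag_lwm_pct: float) -> float:
    """Blocks written to the device per block of new data (same assumption).

    One block of new data, plus the live data that had to be rewritten
    in order to free the block it was written into.
    """
    rewritten = blocks_per_free_block(defrag_lwm_pct) * defrag_lwm_pct / 100.0
    return 1.0 + rewritten


if __name__ == "__main__":
    for pct in (50, 75):
        print(pct, blocks_per_free_block(pct), write_amplification(pct))
```

Under this model both quantities come out to 100 / (100 - defrag-lwm-pct), which is why the number of blocks needed per empty block and the write amplification match at 50% and at 75%. Real workloads defragment blocks at a range of fill levels, so the observed amplification is usually lower than this bound.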

Notes

  • It is possible to run with a higher value for high-water-disk-pct (or to use more disk if one does not want to leverage evictions), but this also requires increasing defrag-lwm-pct. As an example, increasing defrag-lwm-pct to 75% means that blocks will be sent for defragmentation when their occupancy is 75% or lower. This implies that to recover one block of free space, 4 blocks will need to be sent for defragmentation (with 75% data on each), and defragmentation will happen much more aggressively. The consequence is a write amplification of 4X, which may impact application transaction latencies and, ultimately, increase disk wear.

  • The amplification can be calculated based on the server logs at the per-namespace level. See the log reference for details.

{namespace} /dev/sda: used-bytes 296160983424 free-wblocks 885103 write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) defrag-write (3586533,10.2) tomb-raider-read (13758,598.0)

  • To illustrate in simple terms the danger of increasing high-water-disk-pct (or of using more disk in use cases where evictions are not enforced) without changing defrag-lwm-pct, consider a scenario where high-water-disk-pct has been increased to 70% and defrag-lwm-pct has not been changed (i.e. it is at the default 50%). It is in theory possible for every disk block to be 70% full. No evictions will happen, as high-water-disk-pct has not been breached. No defragmentation will happen, as no block will have fallen below the defrag-lwm-pct of 50%. The device-avail-pct will be 0 and the namespace will be in stop-writes, as there are no empty blocks that can be written to.

  • More details on tuning defragmentation can be found in the Defragmentation knowledge base article.

  • General details on the Write Amplification phenomenon.
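As a rough illustration of the log-based calculation mentioned in the notes above, the following Python sketch extracts the counters from the sample device log line and derives an amplification figure. The helper names are hypothetical. It also assumes that the write counter already includes the blocks written by defragmentation (defrag-write); this should be checked against the log reference for the server version in use, and the write_includes_defrag flag is there in case it does not hold.

```python
import re

# Sample device line from the notes above, joined onto a single line.
LOG_LINE = (
    "{namespace} /dev/sda: used-bytes 296160983424 free-wblocks 885103 "
    "write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) "
    "defrag-write (3586533,10.2) tomb-raider-read (13758,598.0)"
)

# Matches fields of the form "name (count,rate)".
COUNTER_RE = re.compile(r"([a-z-]+) \((\d+),([\d.]+)\)")


def parse_counters(line: str) -> dict:
    """Return {field name: cumulative block count} for each (count,rate) field."""
    return {name: int(count) for name, count, _rate in COUNTER_RE.findall(line)}


def amplification_from_log(line: str, write_includes_defrag: bool = True) -> float:
    """Estimate write amplification from one device log line.

    Assumption (verify against the log reference): "write" counts all blocks
    written to the device, including those written by defragmentation.
    """
    counters = parse_counters(line)
    total = counters["write"]
    defrag = counters["defrag-write"]
    if write_includes_defrag:
        new_data = total - defrag
    else:
        new_data, total = total, total + defrag
    if new_data <= 0:
        raise ValueError("no non-defragmentation writes recorded")
    return total / new_data


if __name__ == "__main__":
    print(round(amplification_from_log(LOG_LINE), 2))
```

Because the first number in each pair is treated here as a running total, the result is an average over the whole period covered by the counters. Taking the difference between two log lines collected some time apart would give the amplification over that interval instead.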

Keywords

DEFRAG HIGH-WATER-DISK-PCT DEFRAG-LWM-PCT WRITE AMPLIFICATION

Timestamp

10/05/17
