The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.
FAQ - Aerospike Defragmentation and High Water Marks
Detail
Note: as of version 4.9, the default for nsup-period changed to 0. This means that, by default, eviction and expiration are disabled. The general guideline of using less than 50% of a storage device remains (other than for specific low write throughput workloads).
By default, Aerospike sets high-water-disk-pct and defrag-lwm-pct to 50%. Why are these set so low? Can they be increased to derive more value from the given hardware?
Answer
Aerospike manages defragmentation using a number of parameters. One of these is defrag-lwm-pct, which defaults to 50%. Initially, full blocks are written. As time goes by, records are updated and/or deleted, and gaps appear in the blocks on the storage device. With a defrag-lwm-pct of 50%, blocks are sent for defragmentation when they are 50% (or less) occupied. Assuming the defragmentation queues are not backed up, the 50% of data remaining in one block is rewritten to a fresh block together with the 50% of data remaining in another block. The result is one full block, and the two source blocks become empty. This yields a 2X write amplification.
To ensure this contiguous free space exists, high-water-disk-pct is set to 50%. This only applies to namespaces where records have a non-null TTL set, so that evictions keep disk usage below 50%. The space is therefore not wasted: it is used by the defragmentation process to maintain a 2X write amplification. If this space is reduced, it is possible that defragmentation will not be able to happen before device-avail-pct drops to a level where stop-writes is triggered.
If expiration and eviction are not in use, as is the default, the operator is responsible for ensuring that sufficient free space is available on the disk for defragmentation. This free space must be empty blocks as Aerospike only writes to empty blocks.
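As a rough illustration of why used space and writable space are not the same thing, the following toy model treats a device as a list of equally sized blocks. It is not Aerospike code: `device_summary` and the keys it returns are hypothetical names, and the eligibility rule (occupied at or below the threshold) is a simplification of the behaviour described in this article.

```python
def device_summary(block_fill_pcts, defrag_lwm_pct=50):
    """Summarise a toy device made of equally sized blocks.

    block_fill_pcts: fill level of each block, from 0 to 100.
    This is a simplified model, not how the server computes its statistics.
    """
    total = len(block_fill_pcts)
    # Only completely empty blocks can accept new writes.
    empty = sum(1 for pct in block_fill_pcts if pct == 0)
    # Partly filled blocks at or below the threshold would be defragmented.
    eligible = sum(1 for pct in block_fill_pcts if 0 < pct <= defrag_lwm_pct)
    return {
        "used_pct": sum(block_fill_pcts) / total,
        "avail_pct": 100 * empty / total,
        "defrag_eligible_blocks": eligible,
    }


# Same amount of data, very different amount of writable space.
fragmented = device_summary([70] * 10)        # every block is 70% full
packed = device_summary([100] * 7 + [0] * 3)  # seven full blocks, three empty
```

In this model both devices are 70% used, but the fragmented one has no empty blocks and nothing eligible for defragmentation, while the packed one still has 30% of its blocks available for writes.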
When expiration is switched on, Aerospike recommends, for most circumstances, running high-water-disk-pct and defrag-lwm-pct at their default values (or keeping used disk under the configured defrag-lwm-pct), as this offers the best balance between resource usage and disk longevity. This is not a hard and fast rule, and these parameters can be tuned according to workload.
The key thing here is to not use more disk than the defrag-lwm-pct. This is independent of evictions; by default it is 50%, with a write amplification of 2x.
Although the default value for defrag-lwm-pct is optimal in many circumstances, there may be use cases where it can be tuned without issue. For a read-heavy workload it may well be appropriate to increase defrag-lwm-pct and, with it, the amount of disk available for active data storage. If that is the case, careful testing should be carried out to ensure that activities which generate high levels of deletes do not overwhelm the defragmentation sub-system. Examples of these activities would be:
- Truncation of data
- Migration where outbound partitions are dropped after migration
- Expiration if in use
- Deletion of tombstones when the tombraider process runs
Increasing defrag-lwm-pct increases write amplification which, in turn, increases disk wear. If the use case is predominantly reads this may be perfectly acceptable, but it should be understood. An increase of defrag-lwm-pct from 50% to 75% would increase write amplification from 2x to 4x. With defrag-lwm-pct set to 50%, 2 blocks are required to obtain a single empty block. When defrag-lwm-pct is set to 75%, 4 blocks are required to obtain a single empty block.
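The 2x and 4x figures follow from a simple relation: if blocks are defragmented when they are exactly at the threshold, each one gives back only the unused fraction of a block, so 100 / (100 - defrag-lwm-pct) blocks are needed per empty block recovered. The sketch below assumes that worst-case steady state; the function name is hypothetical and this is an illustration of the arithmetic, not an Aerospike API.

```python
from fractions import Fraction


def blocks_per_empty_block(defrag_lwm_pct):
    """Blocks that must be defragmented to recover one empty block.

    Assumes every block is defragmented exactly at the threshold, so each
    one frees (100 - defrag_lwm_pct)% of a block. Under the same assumption
    this is also the write amplification factor described in the article.
    """
    if not 0 < defrag_lwm_pct < 100:
        raise ValueError("defrag_lwm_pct must be between 0 and 100 (exclusive)")
    return Fraction(100, 100 - defrag_lwm_pct)


default_factor = blocks_per_empty_block(50)  # default threshold
raised_factor = blocks_per_empty_block(75)   # threshold raised to 75%
```

With these inputs the function reproduces the article's figures: 2 at the default of 50% and 4 at 75%. The factor grows quickly as the threshold approaches 100%, which is why increases should be tested against the real workload.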
Notes
- It is possible to run with a higher value for high-water-disk-pct (or to use more disk if evictions are not wanted), but this also requires increasing defrag-lwm-pct. As an example, increasing defrag-lwm-pct to 75% means that blocks will be sent for defragmentation when their occupancy is 75% or lower. This implies that to recover one block of free space, 4 blocks will need to be sent for defragmentation (with 75% data on each), and defragmentation will happen much more aggressively. The consequence is write amplification increased to 4X, which may impact application transaction latencies and, ultimately, increase disk wear.
- The write amplification can be calculated from the server logs at the per-namespace level. See the log reference for details.
  {namespace} /dev/sda: used-bytes 296160983424 free-wblocks 885103 write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) defrag-write (3586533,10.2) tomb-raider-read (13758,598.0)
- To illustrate, in simple terms, the danger of increasing high-water-disk-pct (or of using more disk in use cases where evictions are not enforced) without changing defrag-lwm-pct, consider a scenario where high-water-disk-pct has been increased to 70% and defrag-lwm-pct has not been changed (i.e. it is at the default of 50%). It is in theory possible for every disk block to be 70% full. No evictions will happen, as high-water-disk-pct has not been breached. No defragmentation will happen, as no block will have fallen below the defrag-lwm-pct of 50%. The device-avail-pct will be 0 and the namespace will be in stop-writes, as there are no empty blocks that can be written to.
- More details on tuning defragmentation can be found in the Defragmentation knowledge base article.
- General details on the Write Amplification phenomenon.
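As a hedged sketch of how the device log line quoted in the notes might be turned into a number, the snippet below extracts the `name (total,rate)` pairs and estimates write amplification from the rates. The helper names are hypothetical. The estimate rests on an assumption that should be checked against the log reference: that `write` counts every block written to the device, including those written by defragmentation, while `defrag-write` counts only the latter.

```python
import re

# Log line copied from the example in this article.
LOG_LINE = (
    "{namespace} /dev/sda: used-bytes 296160983424 free-wblocks 885103 "
    "write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) "
    "defrag-write (3586533,10.2) tomb-raider-read (13758,598.0)"
)

# Matches fields of the form: name (total,rate)
PAIR = re.compile(r"([a-z][a-z-]*) \((\d+),(\d+(?:\.\d+)?)\)")


def parse_counters(line):
    """Return {name: (total, per_second_rate)} for each 'name (total,rate)' field."""
    return {name: (int(total), float(rate)) for name, total, rate in PAIR.findall(line)}


def estimated_write_amplification(line):
    """Estimate write amplification from the rates on one log line.

    ASSUMPTION: 'write' includes defragmentation writes, so the non-defrag
    rate is write minus defrag-write. Verify this against the log reference.
    """
    counters = parse_counters(line)
    _, write_rate = counters["write"]
    _, defrag_rate = counters["defrag-write"]
    non_defrag_rate = write_rate - defrag_rate
    if non_defrag_rate <= 0:
        raise ValueError("no non-defragmentation writes in this interval")
    return write_rate / non_defrag_rate


counters = parse_counters(LOG_LINE)
wa = estimated_write_amplification(LOG_LINE)
```

Rates are used rather than totals so the estimate reflects the most recent interval; averaging over several log lines would smooth out short bursts.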
Keywords
DEFRAG HIGH-WATER-DISK-PCT DEFRAG-LWM-PCT WRITE AMPLIFICATION
Timestamp
10/05/17