FAQ - How does Aerospike defragmentation behave with respect to write queues


For each device associated with an Aerospike namespace there is a write queue. The write queue holds data written to streaming write buffers (SWBs) so that it can be flushed to disk asynchronously (except when the commit-to-device feature is in use). The write queues exist in memory and so must be bounded to avoid excessive memory consumption and, potentially, OOM kills. The max-write-cache configuration parameter provides this bound. Note that some traffic, such as replica writes and migration writes, is not bounded by max-write-cache.

If the load on the system is very high, or if devices are underspecified or failing, the write queues can back up. How does defragmentation behave in this circumstance?


Prior to Aerospike 5.1, each device's write queue was viewed in isolation. When the max-write-cache limit on a queue was reached, the server would report "queue too deep" in the server logs and client writes would fail with Error 18 (device overload). Writes originating from replica writes, migration or defragmentation would be added to the queues regardless of how full they were. This can become problematic when activities that generate large-scale deletes are carried out. Such activities, for example truncate or migration, imply a heavy point load in terms of defragmentation. Outbound migration involves entire partitions being dropped in one fell swoop when partition ownership changes, and the impact on the defragmentation queue, and subsequently on the write queues, can be pronounced. Left unchecked, this can cause OOM kills unless defragmentation is throttled using defrag-sleep.
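As a rough illustration of the throttling mentioned above, defrag-sleep can be changed dynamically with a set-config info command. The sketch below only builds the command string. The helper name, the namespace name "test" and the value 2000 are assumptions chosen for illustration, and the exact command syntax should be checked against the configuration reference for the server version in use.

```python
def defrag_sleep_command(namespace: str, sleep_us: int) -> str:
    """Build an info command that sets defrag-sleep for one namespace.

    defrag-sleep is expressed in microseconds; a larger value slows
    defragmentation and so reduces the rate of defrag writes.
    """
    if sleep_us < 0:
        raise ValueError("defrag-sleep must be non-negative")
    return f"set-config:context=namespace;id={namespace};defrag-sleep={sleep_us}"


# Hypothetical namespace "test"; 2000 is an illustrative value, not a recommendation.
print(defrag_sleep_command("test", 2000))
```

The resulting string would typically be passed to asinfo with the -v option on each node where the throttle is needed.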

From Aerospike 5.1, new functionality was introduced to reduce the impact of defragmentation on the write queues and to make the write queues more tolerant of point loads. The change (tracked under [AER-6234] - (STORAGE) Added throttling to prevent defrag from overwhelming the write queue) is twofold:

  • max-write-cache used to be a configuration that applied per device. In a 10-device namespace, as soon as 1 device crossed the configured max-write-cache value, the namespace would start returning queue too deep / device overload errors. In the new implementation, the threshold applies across all devices in the namespace. For a 10-device namespace, queue too deep errors will only occur if the total number of pending SWBs exceeds (10 x max-write-cache). This allows the system to cope either with 1 device holding that many pending SWBs or with all devices lagging by max-write-cache.

  • In the new implementation, defragmentation is allowed to continue until the total write queue is 100 blocks above the configured limit of (number of devices x max-write-cache). At that point, the system stops defrag writes. Replica writes and migration writes (if any) still continue. The system checks at defrag-sleep intervals whether the total is still above (100 + number of devices x max-write-cache). As soon as it is back below this threshold, defrag writes resume, but client writes are still rejected until the queue is back down to (number of devices x max-write-cache).
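The two behaviours described above can be sketched as a pair of small decision functions. This is a simplified model under stated assumptions, not the server's implementation: the function and class names are hypothetical, and the per-device limit is passed in as a number of blocks (assumed here to be max-write-cache divided by the write block size).

```python
from dataclasses import dataclass
from typing import Sequence

# "100 blocks above the configured limit", as described in the text.
DEFRAG_HEADROOM_BLOCKS = 100


@dataclass
class QueueDecision:
    reject_client_writes: bool
    pause_defrag_writes: bool


def decide_pre_5_1(write_q: Sequence[int], per_device_limit: int) -> QueueDecision:
    """Model of the older behaviour: each device is judged in isolation."""
    overloaded = any(q > per_device_limit for q in write_q)
    # Defrag writes were queued regardless of how full the queues were.
    return QueueDecision(reject_client_writes=overloaded, pause_defrag_writes=False)


def decide_5_1(write_q: Sequence[int], per_device_limit: int) -> QueueDecision:
    """Model of the newer behaviour: one threshold across all devices."""
    limit = len(write_q) * per_device_limit
    total = sum(write_q)
    return QueueDecision(
        reject_client_writes=total > limit,
        pause_defrag_writes=total > limit + DEFRAG_HEADROOM_BLOCKS,
    )


# Illustrative numbers only: 10 devices, 64 blocks allowed per device,
# and a single device lagging with 100 pending blocks.
queues = [100] + [0] * 9
print(decide_pre_5_1(queues, 64))
print(decide_5_1(queues, 64))
```

With a single lagging device, the per-device model rejects client writes while the namespace-wide model does not, which is the flexibility the first bullet describes. Replica and migration writes are deliberately absent from the model because the text states they are never blocked.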


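The write queue (write-q) and defragmentation queue (defrag-q) depths for each device are reported in the server log, for example:

{namespace_name} /dev/sda: used-bytes 296160983424 free-wblocks 885103 write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) defrag-write (3586533,10.2) shadow-write-q 0 tomb-raider-read (13758,598.0)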
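A log line like the one above can be scanned for the queue depths when monitoring for write queue build-up. The following is a minimal sketch that assumes the field layout shown in the sample line; the function name is hypothetical and the parser covers only the simple integer fields.

```python
import re

# Integer-valued fields taken from the sample log line above.
INT_FIELDS = ("used-bytes", "free-wblocks", "write-q", "defrag-q", "shadow-write-q")


def parse_device_stats(line: str) -> dict:
    """Extract integer fields from a per-device storage log line."""
    stats = {}
    for key in INT_FIELDS:
        # The lookbehind stops "write-q" from matching inside "shadow-write-q".
        match = re.search(r"(?<![\w-])" + re.escape(key) + r" (\d+)", line)
        if match:
            stats[key] = int(match.group(1))
    return stats


SAMPLE = (
    "{namespace_name} /dev/sda: used-bytes 296160983424 free-wblocks 885103 "
    "write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) "
    "defrag-write (3586533,10.2) shadow-write-q 0 tomb-raider-read (13758,598.0)"
)

print(parse_device_stats(SAMPLE))
```

Summing write-q across the devices of a namespace gives the total that, from Aerospike 5.1, is compared against (number of devices x max-write-cache).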




November 2020

© 2021 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.