Defrag queue fluctuates (~1.1M) after massive record deletion

Hello,

We are running an Aerospike cluster where we recently observed a large defrag queue (~1.1M write blocks).

This situation appeared after a massive record deletion operation in our application. A large amount of data was removed within a relatively short period of time, which we believe created significant fragmentation in the storage layer.

To help the system recover space, we temporarily reduced defrag-sleep to accelerate defragmentation.

The defrag queue continuously fluctuates (increasing and decreasing) around ~1.1M instead of steadily decreasing.

Aerospike version: 6.4

Deployment: Kubernetes

Nodes: 8

Replication factor: 4

namespace PROFILE {
    replication-factor 4
    memory-size 80G
    default-ttl 0
    nsup-period 600
    nsup-threads 4
    write-commit-level-override master

    storage-engine device {
        device /dev/sdb1
        max-write-cache 256M
    }
}

current defrag params:

defrag-lwm-pct: 50

defrag-sleep: 500

It looks like the defrag process is not converging after the mass deletion: the defrag queue hovers around ~1.1M instead of draining.

What steps can we take to allow the defragmentation to complete and reclaim space?

There isn’t enough data to say concretely what the cause is, but I can make some educated guesses.

Most likely: you’re IO-bottlenecked. What is sdb1 (device type)? Do you know its expected throughput? What does iostat look like, specifically aqu-sz, r_await, w_await, rMB/s, and wMB/s?
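If you don’t already have device-level monitoring, something like `iostat -xm 5 sdb1` will show those columns. A minimal sketch for pulling the relevant fields out of that output — the sample line below is fabricated, and real `iostat` headers vary slightly by sysstat version:

```python
# Parse `iostat -x` extended-statistics output and extract the columns
# relevant to spotting an IO bottleneck. SAMPLE is made up for
# illustration; in practice, feed this the real iostat output.
SAMPLE = """\
Device   r/s   rMB/s  r_await  w/s   wMB/s  w_await  aqu-sz  %util
sdb1     812.0 101.5  4.20     950.0 118.8  35.60    12.40   99.80
"""

def parse_iostat(text):
    lines = [l for l in text.strip().splitlines() if l.strip()]
    header = lines[0].split()
    stats = {}
    for line in lines[1:]:
        fields = line.split()
        dev, vals = fields[0], fields[1:]
        stats[dev] = dict(zip(header[1:], map(float, vals)))
    return stats

stats = parse_iostat(SAMPLE)["sdb1"]
# High w_await and aqu-sz with %util pinned near 100 would suggest the
# device cannot keep up with the combined client + defrag write load.
print(stats["w_await"], stats["aqu-sz"], stats["%util"])
```

A sustained high queue depth (aqu-sz) and write latency (w_await) on sdb1 while the defrag queue hovers would point strongly at the device, not the defrag settings.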

What did you reduce defrag-sleep from? 1000 (the default)?
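For a sense of scale: defrag-sleep is the number of microseconds the defrag thread sleeps after reading each write block, so it roughly caps the per-device defrag rate at 1e6 / defrag-sleep blocks per second. A back-of-envelope estimate, treating that cap as the actual rate and ignoring newly queued blocks:

```python
# Rough upper bound on how long a defrag queue of this size should take
# to drain, given the per-block sleep. Assumes one defrag thread per
# device and ignores new blocks being queued while defrag runs.
queue_len = 1_100_000                          # write blocks in the queue

for defrag_sleep_us in (1000, 500):            # default vs. reduced value
    max_rate = 1_000_000 / defrag_sleep_us     # blocks/sec per device
    drain_min = queue_len / max_rate / 60      # minutes to drain
    print(f"sleep={defrag_sleep_us}us -> <= {max_rate:.0f} blk/s, "
          f"~{drain_min:.0f} min to drain")
```

Even at the default sleep, ~1.1M blocks should drain in well under an hour per device. Since your queue has been hovering far longer than that, blocks must be entering the queue about as fast as defrag processes them — either the device can’t sustain the extra defrag write load, or defrag’s own rewrites at lwm 50% keep producing new eligible blocks.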

There are some other possible causes.