From the log snippet, it seems you are writing faster than the defrag process can reclaim blocks.
What write-block-size are you using? What kind of workload do you have, and what are your read/write TPS?
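If you are not sure off-hand, both can be pulled from a node with the standard tools; a quick sketch, assuming a namespace named "test" (substitute your own):

```
# current write-block-size for the namespace
asinfo -v 'get-config:context=namespace;id=test' | tr ';' '\n' | grep write-block-size

# rough per-node read/write throughput (the command is 'show latencies' on newer asadm builds)
asadm -e 'show latency'
```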
Setting defrag-lwm-pct higher is probably counterproductive. You should be fine around 50-55%; setting it higher will force defrag to move more data for less benefit.
By default, Aerospike sleeps 1000 microseconds between each wblock it defragments before continuing to the next one. If you need to process the queue faster, we need to reduce this sleep by adjusting the defrag-sleep parameter. You could try setting this sleep to 0 and see if defrag begins to catch up (if it cannot catch up at 0, we will need to try another approach). If you start having performance issues with defrag-sleep set to 0, you will need to increase it and monitor avail_pct to ensure that defrag doesn't fall behind with the increased setting.
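defrag-sleep can be changed dynamically, so there is no need to restart the node to experiment. A minimal sketch, assuming a namespace named "test" (substitute your own, and apply it on every node):

```
# drop the per-wblock defrag sleep to 0 microseconds
asinfo -v 'set-config:context=namespace;id=test;defrag-sleep=0'

# then keep an eye on avail_pct to see whether defrag catches up
# (the exact statistic name varies slightly between server versions)
asadm -e 'show statistics namespace like avail'
```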
As I wrote above, defrag-lwm-pct is set to 80. Setting defrag-sleep to 0 (yes, I checked the docs before asking questions) didn't resolve the issue. I can raise defrag-lwm-pct to 99, but that scares me.
Setting defrag-lwm-pct to 80, and especially to 99, is going to be counterproductive in your situation. I would recommend lowering it to around 50-55%. Otherwise you are going to drastically and unnecessarily increase write amplification from defrag.
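To put rough numbers on that (purely for illustration, assuming a 1 MiB write-block-size): at defrag-lwm-pct 50, a wblock only becomes defrag eligible once it is less than half full, so defrag copies at most ~0.5 MiB of live data to reclaim the full 1 MiB block, a net gain of at least ~0.5 MiB -- roughly one byte written per byte freed. At 99, a block that is still ~99% full is eligible, so defrag can copy ~0.99 MiB to net only ~0.01 MiB of free space, on the order of a hundred bytes written per byte freed, and all of that extra device traffic competes with your client writes.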
Have you tried setting defrag-sleep to 0 while keeping defrag-lwm-pct at 50%?
The two small disks hold the system image (via Linux mdraid) and also hold a swap partition. The free space (~190GB) on both of those disks is configured as raw partitions for Aerospike. So I have four slices used by Aerospike: 2x190GB and 2x447GB.
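For reference, the relevant storage-engine stanza looks roughly like this (namespace name and device paths changed, quoting from memory):

```
namespace mynamespace {
    # replication-factor, memory-size, etc. omitted
    storage-engine device {
        device /dev/sda4   # ~190GB raw partition on first small disk
        device /dev/sdb4   # ~190GB raw partition on second small disk
        device /dev/sdc1   # ~447GB slice on first large disk
        device /dev/sdd1   # ~447GB slice on second large disk
        defrag-lwm-pct 80
    }
}
```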
I suppose the defragmentation logic currently implemented in the Aerospike code scans only the first half of the large (480GB) disks to determine whether any blocks need to be processed.
This isn't how the defrag algorithm works. An older algorithm scanned the disks; did you find a reference to that in the docs? The current algorithm processes a queue of defrag-eligible wblocks: when an update or delete occurs and a wblock becomes defrag eligible, it is immediately added to the defrag queue.
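If you want to confirm the queue is actually being drained rather than growing, the per-device storage lines Aerospike periodically writes to its log include a defrag-q depth you can track over time; something like the following, assuming the default log location:

```
grep 'defrag-q' /var/log/aerospike/aerospike.log | tail -n 20
```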
Aerospike expects all the disks attached to a namespace to be the same size. You will only be able to use 190GB from each disk.
I do not have ACT benchmark numbers for these drives, but based on:
Assuming this is the aggregate across the 7 nodes, you are doing ~14K read and ~6K write ops per node per second. In terms of the ACT benchmark utility this is a 6x test; have you used this benchmark to determine whether your hardware can handle the load you are putting on it?
I was only guessing at how it might work based on how it behaves. If it becomes an issue I may go through the code to find out what logic is actually implemented (but I'd prefer not to).