Defragmentation


#1

Hello,

Aerospike Community Edition build 3.10.0.3

Config:

write-block-size 128K 
defrag-sleep 0 
defrag-lwm-pct 80

From Aerospike log, I am seeing:

Dec 21 2016 22:58:58 GMT: INFO (drv_ssd): (drv_ssd.c:2093) {Cache} /dev/xvdc: used-bytes 1633063296 free-wblocks 1237274 write-q 0 write (34322,84.8) defrag-q 0 defrag-read (21718,74.6) defrag-write (17354,59.5)
Dec 21 2016 22:58:58 GMT: INFO (drv_ssd): (drv_ssd.c:2093) {Cache} /dev/xvdb: used-bytes 1633939456 free-wblocks 1237267 write-q 0 write (34351,85.8) defrag-q 0 defrag-read (21740,75.9) defrag-write (17372,60.7)
Dec 21 2016 22:59:18 GMT: INFO (drv_ssd): (drv_ssd.c:2093) {Cache} /dev/xvdb: used-bytes 1664761600 free-wblocks 1237063 write-q 0 write (36027,83.8) defrag-q 0 defrag-read (23212,73.6) defrag-write (18548,58.8)
Dec 21 2016 22:59:38 GMT: INFO (drv_ssd): (drv_ssd.c:2093) {Cache} /dev/xvdc: used-bytes 1694924160 free-wblocks 1236866 write-q 0 write (37653,82.9) defrag-q 0 defrag-read (24641,72.6) defrag-write (19690,58.0)

Question 1: What is defrag-read and defrag-write?

Question 2: Should I add defrag-read and defrag-write together and compare the sum with write, to check whether client writes are outpacing the defrag process?

Question 3: From the above log, it seems like I am writing more than Aerospike can defrag. Is there anything I can do to avoid this?

Thanks.


#2

Generally, you would like to keep defrag-lwm-pct at 50% if possible. The number of disk writes is non-linearly amplified with respect to this configuration; the amplification factor can be plotted as 1/(1 - n/100) for n = 0 to 100.
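To make that curve concrete, here is a quick sketch of the amplification formula above (plain Python; the function name is mine):

```python
# Write amplification as a function of defrag-lwm-pct (n),
# per the formula 1 / (1 - n/100) given above.
def write_amplification(lwm_pct):
    return 1.0 / (1.0 - lwm_pct / 100.0)

for pct in (50, 80, 90):
    print(f"defrag-lwm-pct {pct}% -> ~{write_amplification(pct):.1f}x write amplification")
```

So moving from 50% (roughly 2x) to your configured 80% (roughly 5x) more than doubles the write load on the device.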

defrag-read is the number of blocks the defrag thread has read and the rate at which it is reading them.

defrag-write is the number of blocks the defrag thread has written and the rate at which it is writing them.

In your case defrag has been reading ~73 wblocks per second and rewriting ~58 per second. So ~80% of the data read needs to be rewritten, which matches the defrag-lwm-pct you have configured.

write is the number of blocks written and the rate at which they are being written (defrag or otherwise).

In your case defrag accounts for about 70% (58/83) of your writes.

The defrag-q is 0; this is the number of wblocks less than 80% full (defrag-lwm-pct) that haven't yet been processed by defrag, so defrag is keeping up.
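You can pull both ratios straight out of one of your log lines. A small sketch (plain Python; the regex and variable names are mine, the line is the first /dev/xvdc line you quoted):

```python
import re

# One drv_ssd log line quoted in the question; each pair is (total, per-second rate).
line = ("{Cache} /dev/xvdc: used-bytes 1633063296 free-wblocks 1237274 "
        "write-q 0 write (34322,84.8) defrag-q 0 "
        "defrag-read (21718,74.6) defrag-write (17354,59.5)")

# Extract the per-second rate for each counter of interest.
rates = {m.group(1): float(m.group(3))
         for m in re.finditer(r"(write|defrag-read|defrag-write) \((\d+),([\d.]+)\)", line)}

rewrite_fraction = rates["defrag-write"] / rates["defrag-read"]  # ~0.80, matches defrag-lwm-pct
defrag_share = rates["defrag-write"] / rates["write"]            # ~0.70, defrag's share of writes
print(f"rewritten fraction: {rewrite_fraction:.2f}, defrag share of writes: {defrag_share:.2f}")
```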

Based on the data, your disks are ~90% full, which isn't a very good place to be in the event a node fails and data needs to redistribute. Based on this time slice, I would conclude you either need to increase capacity or purge some data. You may also want to configure paxos-single-replica-limit so the cluster automatically drops to replication-factor 1 if you lose a node while in this situation (basically, configure it both dynamically and statically to one less than the cluster size).