Avail-pct drops without defragmentation starting



Problem Description

On a small namespace (less than 256 MB) stored in a file on a filesystem, avail-pct continues to drop, yet the logs show that the defrag-q is empty, so the defragger is not running. Disk Used% is not excessive. The namespace information output from asadm looks as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Namespace                                       Node   Avail%   Evictions      Master     Replica     Repl     Stop     Pending         Disk    Disk     HWM          Mem     Mem    HWM      Stop   
                    .                                          .        .           .     Objects     Objects   Factor   Writes    Migrates         Used   Used%   Disk%         Used   Used%   Mem%   Writes%   
                    .                                          .        .           .           .           .        .        .   (tx%,rx%)            .       .       .            .       .      .         .   

Namespace__Storage         host1:3000                           33               0     1.000       4.000     2        false    (0,0)         1.250 KB   1       50       18.479 KB   1       60     90        
Namespace__Storage         host2:3000                           53               0     3.000       0.000     2        false    (0,0)       768.000 B    1       50       18.276 KB   1       60     90        
Namespace__Storage         host3:3000                           43               0     2.000       2.000     2        false    (0,0)         1.000 KB   1       50       18.339 KB   1       60     90        
Namespace__Storage                                                               0     6.000       6.000                       (0,0)         3.000 KB                    55.095 KB                            

The output above shows that the namespace holds a small number of very small objects, yet avail-pct is low even though Disk Used% is also very low. The configuration of the namespace is as follows:

namespace Namespace__Storage {
    memory-size 100M
    replication-factor 2
    default-ttl 0
    high-water-memory-pct 60
    high-water-disk-pct 50
    stop-writes-pct 90
    storage-engine device {
        file /opt/aerospike/Namespace__Storage.dat
        filesize 100M
        data-in-memory false
    }
}

Looking into the logs, the defrag-q is empty and contig-free is dropping:

[Caprica6:5477 user$ grep -i defrag-q aerospike.log | grep Namespace__Storage | more
May 16 2016 11:53:38 GMT: INFO (drv_ssd): (drv_ssd.c:1094) /opt/aerospike/Namespace__Storage.dat init wblock free-q 99, defrag-q 0
May 16 2016 11:53:58 GMT: INFO (drv_ssd): (drv_ssd.c:2086) device /opt/aerospike/Namespace__Storage.dat: used 0, contig-free 32M (32 wblocks), swb-free 0, w-q 0 w-tot 65 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s) defrag-w-tot 0 (0.0/s)
May 16 2016 11:57:33 GMT: INFO (drv_ssd): (drv_ssd.c:1094) /opt/aerospike/Namespace__Storage.dat init wblock free-q 99, defrag-q 0
May 16 2016 11:57:53 GMT: INFO (drv_ssd): (drv_ssd.c:2086) device /opt/aerospike/Namespace__Storage.dat: used 0, contig-free 31M (31 wblocks), swb-free 0, w-q 0 w-tot 66 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s) defrag-w-tot 0 (0.0/s)
May 16 2016 11:58:13 GMT: INFO (drv_ssd): (drv_ssd.c:2086) device /opt/aerospike/Namespace__Storage.dat: used 0, contig-free 30M (30 wblocks), swb-free 0, w-q 0 w-tot 67 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s) defrag-w-tot 0 (0.0/s)
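
The pattern in these log lines (contig-free falling while defrag-q stays at 0) can be checked mechanically. The following is a minimal sketch, assuming the log format matches the lines shown above; the function names and the regular expression are illustrative and not part of any Aerospike tool, and other server versions may format these lines differently.

```python
import re

# Matches the per-device status lines shown above, e.g.
# "... contig-free 32M (32 wblocks), ... defrag-q 0 defrag-tot 0 ..."
# Assumption: this layout is specific to the log excerpt in this article.
DEVICE_RE = re.compile(r"contig-free (\d+)M \((\d+) wblocks\).*?defrag-q (\d+)")


def parse_device_stats(lines):
    """Return a list of (free_wblocks, defrag_q) tuples, one per matching line."""
    stats = []
    for line in lines:
        match = DEVICE_RE.search(line)
        if match:
            stats.append((int(match.group(2)), int(match.group(3))))
    return stats


def shrinking_without_defrag(stats):
    """True if free wblocks never rise, end lower than they started,
    and the defrag queue is empty in every sample."""
    if len(stats) < 2:
        return False
    free = [wblocks for wblocks, _ in stats]
    never_rises = all(later <= earlier for earlier, later in zip(free, free[1:]))
    defrag_idle = all(queue == 0 for _, queue in stats)
    return never_rises and free[-1] < free[0] and defrag_idle


# Shortened copies of the three device lines from the excerpt above.
sample = [
    "device /opt/aerospike/Namespace__Storage.dat: used 0, contig-free 32M (32 wblocks), swb-free 0, w-q 0 w-tot 65 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)",
    "device /opt/aerospike/Namespace__Storage.dat: used 0, contig-free 31M (31 wblocks), swb-free 0, w-q 0 w-tot 66 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)",
    "device /opt/aerospike/Namespace__Storage.dat: used 0, contig-free 30M (30 wblocks), swb-free 0, w-q 0 w-tot 67 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)",
]

stats = parse_device_stats(sample)
print(stats)
print(shrinking_without_defrag(stats))
```

Lines that do not contain a contig-free figure, such as the "init wblock" lines, are skipped by the parser.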

Explanation

In this situation write-block-size has not been set, so it takes the default value of 1 MB. The default value for post-write-queue is 256 blocks, which in this case equates to 256 MB, more than the 100 MB file backing the namespace. Blocks in the post-write-queue are not eligible for defragmentation. Because the queue can hold more blocks than the file contains, written blocks sit in the post-write-queue, the defrag-q stays empty, and avail-pct keeps dropping.
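
The arithmetic behind this explanation can be sketched as follows. This is an illustration only: the helper names are hypothetical, and it assumes that the "M" suffix in the configuration means mebibytes, which is consistent with the 99 free wblocks reported at startup for the 100M file.

```python
MIB = 1024 * 1024  # assumption: "M" in the config is treated as 2**20 bytes


def post_write_queue_bytes(write_block_size, post_write_queue_blocks):
    """Storage that the post-write-queue can keep out of defrag's reach."""
    return write_block_size * post_write_queue_blocks


def total_wblocks(filesize, write_block_size):
    """Number of write blocks that fit in the storage file."""
    return filesize // write_block_size


# Values from this article: defaults for the two queue settings,
# filesize taken from the namespace configuration.
write_block_size = 1 * MIB
post_write_queue = 256
filesize = 100 * MIB

queue_bytes = post_write_queue_bytes(write_block_size, post_write_queue)
blocks_in_file = total_wblocks(filesize, write_block_size)

print(queue_bytes // MIB)                   # 256
print(blocks_in_file)                       # 100
print(post_write_queue >= blocks_in_file)   # True: the queue can hold every block in the file
```

When the last comparison is true, no block can leave the post-write-queue before the file runs out of free blocks, which matches the empty defrag-q seen in the logs.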

Solution

The solution is to reduce the size of the post-write-queue so that it is smaller than the storage available to the namespace. The parameter is dynamic, so it can be changed without a node restart, and it is configured per namespace (it is part of the storage-engine sub-stanza). The command to alter the size of the post-write-queue is:

asinfo -v 'set-config:context=namespace;id=<NAMESPACE>;post-write-queue=8'

Notes

  • Definition of write-block-size

http://www.aerospike.com/docs/reference/configuration#write-block-size

  • Definition of post-write-queue

http://www.aerospike.com/docs/reference/configuration#post-write-queue

Keywords

AVAIL-PCT DEFRAG SMALL NAMESPACE STOP-WRITES

Timestamp

5/25/16


What is the difference between device_available_pct and device_free_pct