Recovering from Available Percent Zero



What is Available Percent?

Available Percent (device_available_pct) is a critical statistic for the Aerospike Server as well as a key indicator for determining when to scale your service up or out. Available Percent measures the minimum contiguous disk space across all the devices in a namespace (avail_pct = min(contig_disk for all disks in namespace)).
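As a toy illustration of that formula (the percentages below are made up, not read from a cluster), the namespace's avail_pct is simply the minimum across its devices:

```shell
# Hypothetical per-device contiguous free space (percent) for one namespace
device_pcts="12 7 9"

# avail_pct = min(contig_disk for all disks in namespace)
avail_pct=$(printf '%s\n' $device_pcts | sort -n | head -n1)
echo "avail_pct=${avail_pct}"
```

A single badly fragmented device therefore drags the whole namespace's Available Percent down, even if the other devices have plenty of contiguous space.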

What Happens When Available Percent Reaches Zero?

From the application's perspective, the main indication that Available Percent has reached zero is that writes will consistently fail. To be more precise, application writes will start failing when device_available_pct crosses below the configured min-avail-pct threshold.
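A minimal sketch of that threshold check (both numbers are hypothetical; min-avail-pct is the configured namespace setting, 5 by default):

```shell
# Hypothetical values: the namespace stops accepting writes once
# device_available_pct falls below the min-avail-pct threshold
avail_pct=3
min_avail_pct=5

if [ "$avail_pct" -lt "$min_avail_pct" ]; then
    echo "stop-writes: avail_pct=${avail_pct} is below min-avail-pct=${min_avail_pct}"
fi
```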

How do I Recover?

Check if defrag is keeping up:

You can check the logs and compare the defrag rate with the write rate on the per-device log line:

 INFO (drv_ssd): (drv_ssd.c:2115) {test} /dev/xvdb: used-bytes 1626239616 free-wblocks 28505 write-q 0 write (8203,23.0) defrag-q 0 defrag-read (7981,21.7) defrag-write (1490,3.0) shadow-write-q 0 tomb-raider-read (1615,59.6)

Refer to the log reference manual for details on the different statistics provided in the above log line.
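For a quick comparison, the two per-second rates can be pulled out of such a line with standard tools. A sketch, using an abridged copy of the log line above:

```shell
# Abridged copy of the per-device log line shown above
line='{test} /dev/xvdb: write-q 0 write (8203,23.0) defrag-q 0 defrag-read (7981,21.7) defrag-write (1490,3.0)'

# The second number in each parenthesized pair is the per-second rate
write_rate=$(echo "$line" | sed -n 's/.* write (\([0-9]*\),\([0-9.]*\)).*/\2/p')
defrag_read_rate=$(echo "$line" | sed -n 's/.*defrag-read (\([0-9]*\),\([0-9.]*\)).*/\2/p')

echo "write/s=${write_rate} defrag-read/s=${defrag_read_rate}"
```

Roughly speaking, if the defrag-read rate stays persistently below the write rate while avail_pct keeps falling, defrag is not keeping up.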

If defrag is not keeping up, the defrag speed and density can be tuned with two settings: defrag-lwm-pct and defrag-sleep.

Changes can be made dynamically:

$ asinfo -v "set-config:context=namespace;id=<namespace name>;defrag-lwm-pct=50"
$ asinfo -v "set-config:context=namespace;id=<namespace name>;defrag-sleep=500"
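Dynamic changes do not survive a restart. To persist them, the same parameters can be placed in the namespace's storage-engine stanza of aerospike.conf; a sketch (namespace name and values are illustrative, not recommendations):

```
namespace test {
    ...
    storage-engine device {
        ...
        defrag-lwm-pct 50
        defrag-sleep 500
    }
}
```

Note that raising defrag-lwm-pct makes more wblocks eligible for defragmentation but also increases write amplification, so it should be raised gradually.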

See the knowledge-base article on the defragmentation mechanism for further insights.

Increase evictions of expirable data

It may also be necessary to increase evictions so that more records are deleted, making more blocks eligible for defragmentation. Evictions can be tuned by adjusting the following settings:

evict-tenths-pct (increase)

high-water-disk-pct (decrease)

high-water-memory-pct (decrease)

nsup-delete-sleep (decrease)

nsup-period (decrease)
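The high-water marks and evict-tenths-pct are namespace-context settings and can also be set in aerospike.conf; a hedged sketch with illustrative values (not recommendations):

```
namespace test {
    ...
    high-water-disk-pct 45
    high-water-memory-pct 55
    evict-tenths-pct 10
    ...
}
```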

See the knowledge-base articles on evictions for details of the eviction mechanism.

Increase capacity by adding new nodes

Adding a new node to increase capacity is straightforward. If your system has run out of free contiguous space on its disks or partitions, adding a new node allows each current node to offload roughly 1/(new cluster size) of its data. The odds of success with this method are inversely proportional to the size of the cluster. You can further improve your chances by stopping new writes from your application layer.
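The arithmetic behind that fraction is simple; for example (cluster sizes hypothetical), going from 4 nodes to 5:

```shell
# Hypothetical cluster: 4 nodes before the addition, 5 after
old_nodes=4
new_nodes=$((old_nodes + 1))

# Each existing node ends up holding ~1/new_nodes of the data,
# so it offloads roughly that fraction of what it held
offload_pct=$(awk -v n="$new_nodes" 'BEGIN { printf "%.0f", 100 / n }')
echo "each existing node offloads ~${offload_pct}% of its data"
```

The relief per node shrinks as the cluster grows, which is why this method helps less on large clusters.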

Stop service on a node and zero fragmented persistent storage

The ‘dd’ method is typically not acceptable when running with replication factor 1, because it deletes wblocks from the namespace, which causes data loss at that replication factor. With replication factor 2 the cluster recovers gracefully from the loss: when the node is brought back up, the deleted data is repopulated through migrations (rebalancing). This method requires a cold restart (the node starts empty) and is the only method guaranteed to free wblocks in one shot. It assumes that the other nodes in the cluster are not themselves running out of avail pct and can handle the migrations.

The dd command can be used to zero the drive:

sudo dd if=/dev/zero of=/dev/DEVICE bs=1M

http://www.aerospike.com/docs/operations/plan/ssd/ssd_init.html

Alternatively, use blkdiscard on devices that support the discard operation:

sudo blkdiscard /dev/<INSERT DEVICE NAME HERE>

http://man7.org/linux/man-pages/man8/blkdiscard.8.html

For file-backed namespaces, delete the persistent storage file:

sudo rm <Aerospike persistent storage file>
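Whichever of the three variants fits your deployment, the overall node-by-node sequence is the same. A hedged outline (service management commands vary by installation):

```
1. Stop the Aerospike service on one node.
2. Zero or discard that node's namespace storage (dd, blkdiscard, or rm, as above).
3. Start the service; the node cold starts empty.
4. Wait for migrations to complete before moving to the next node.
```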

Cold-Start-Evict-TTL Method

cold-start-evict-ttl tells the system to ignore, during a cold start, any record with a TTL below the specified value. This is often used to speed up a cold restart when you know your eviction depth is deep. To find your eviction depth, run:

$ asinfo -v "hist-dump:ns=<namespace name>;hist=ttl"
value is  <namespace name:ttl=100,51840,0,0,0,0,0,0,0,0,0,0,0, \
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, \
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,920023,10263488, \
20000052,20319938,23861472,20052298,21612051,22163298, \
24370589,34911006,27048399,29558473,27697235,21049529, \
20300346,17539324,17954128,16932493,16265998,20131370, \
15997368,18030184,17260295,16613023,21100184,18003700, \
20814926,19660860,18829521,23601739,17515442,21490671, \
19797821,19861895,24694092,11354573,14945634,14806583, \
17064793,37144797;

The first number in this histogram is the number of buckets it consists of (100), the second value is the width of each bucket in seconds, and the remaining 100 values are the counts of records falling in the successive TTL ranges, the last bucket holding records with TTLs greater than (100 * width). For this particular histogram, each bucket is 51840 seconds (14.4 hours) and there are 60 zeros between the width and the first populated value, meaning the current eviction depth is (60 * 51840) seconds. The eviction depth has to be increased for this method to work, and the odds of success are proportional to how far you raise it with cold-start-evict-ttl.
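Working that depth out from the histogram above (bucket width 51840 seconds, 60 empty buckets before the first populated value):

```shell
# Values read off the hist-dump output above
width=51840        # seconds per bucket
empty_buckets=60   # zero buckets between the width and the first populated value

depth_seconds=$((width * empty_buckets))
depth_days=$(awk -v d="$depth_seconds" 'BEGIN { printf "%.1f", d / 86400 }')
echo "eviction depth ~ ${depth_seconds} seconds (${depth_days} days)"
```

For the cold start to drop a meaningful number of records, cold-start-evict-ttl would need to be set above this depth.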

  • Refer to the Cold Start page for details on cold restarts.
  • Refer to the hist-dump reference page for details on the histogram dump command.

Notes

  • In the case where defrag speed is an issue, it can be beneficial to split an SSD into multiple partitions. You lose a marginal amount of storage but gain defrag threads: both the physical SSD and its partitions count as ‘devices’, and there is one defrag thread per device.

  • In the case of storage filled with non-expirable records (TTL=0), the defrag and eviction solutions will not help. In such a case we would recommend a capacity review with one of our Solutions Architects and possibly increasing capacity by adding storage or nodes.