Low avail_pct - is it dangerous?


#1

Hi,

We have a 10-node cluster, which was recently extended by 2 nodes.

Disk usage is about 75% on all the nodes, but avail_pct is not the same on all the nodes and does not equal (100% - used).

It is about 25% on the new nodes, about 10% on the old nodes, and about 6% on one of them.

The question is: is that OK?

What can I do to recover the space that is neither used nor available?

Thank you.

With best regards, Daniel Podolsky


#3

Not for long. When available percent (avail_pct) dips below 5%, the node will enter stop-writes.

Since Aerospike version 3.3.17 (what version are you on?), defrag tuning has been reduced to two important parameters:

  • defrag-sleep defines how many microseconds defrag should sleep between each defragged write-block. By default this is set to 1000 microseconds (1 ms).
  • defrag-lwm-pct defines the usage level below which a wblock is placed on the defrag queue. By default this is 50%. You could raise it to 60%, but be aware that this setting has a non-linear impact on write load that increases with the value.

#4

@onokonem,

There is an earlier community article about available percent (avail_pct) that we can refer you to.

A possible cause of the higher avail_pct on the new node could be that it does not yet have enough data.

Another cause could be that migrations are occurring, causing the differences between nodes. In steady state (no migrations), avail_pct should have a balanced value across all nodes.


#5

We finally fixed the problem with:

# Apply the defrag settings dynamically on every node in the list
for node in $listOfNodes
do
  asinfo -v "set-config:context=namespace;id=storebig;defrag-lwm-pct=90" -h "$node"
  asinfo -v "set-config:context=namespace;id=storebig;defrag-sleep=500" -h "$node"
done

defrag-lwm-pct was 99 on all the nodes for an unknown reason (it is not specified in the config).
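
To see which value a node is actually running with, the namespace configuration can be read back and parsed. This is a minimal sketch, assuming the response to a get-config request is a semicolon-separated list of key=value pairs. `get_param` is a hypothetical helper name, and the sample string is made up for illustration rather than real server output. The helper also matches dotted keys in case a server version reports the parameter with a prefix.

```shell
# get_param RESPONSE NAME
# Prints the value of NAME from a semicolon-separated key=value list.
# A key matches if it equals NAME or ends in ".NAME".
get_param() {
  printf '%s\n' "$1" | tr ';' '\n' | awk -F= -v name="$2" '
    { n = split($1, parts, "[.]"); if (parts[n] == name) print $2 }'
}

# Canned sample response (illustrative only, not real server output).
sample='memory-size=4294967296;defrag-lwm-pct=99;defrag-sleep=1000'
get_param "$sample" defrag-lwm-pct

# On a live node, the response would come from asinfo instead, for example:
#   get_param "$(asinfo -v 'get-config:context=namespace;id=storebig' -h "$node")" defrag-lwm-pct
```

Running the same check on every node makes it easy to spot a node whose value differs from the rest.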


#6

:fearful:

Let me stress the non-linear impact: as you approach 100%, the write amplification approaches infinity. At 50% there is a 2x write amplification, and 75% would be 4x.

I would recommend that you try 50 or 60% instead.
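
To make the growth concrete, here is a small sketch that tabulates the amplification for a few settings. It assumes the relationship is 1 / (1 - defrag-lwm-pct/100), which is consistent with the 2x and 4x figures above but is an approximation rather than an exact model of the server. `write_amp` is a hypothetical helper name.

```shell
# write_amp LWM_PCT
# Prints the estimated write amplification 1 / (1 - LWM_PCT/100), one decimal.
# Only meaningful for values below 100.
write_amp() {
  awk -v lwm="$1" 'BEGIN { printf "%.1f\n", 100 / (100 - lwm) }'
}

for lwm in 50 60 75 90 99; do
  echo "defrag-lwm-pct=$lwm -> about $(write_amp "$lwm")x write amplification"
done
```

Under this assumption the estimates are 2.0x, 2.5x, 4.0x, 10.0x and 100.0x respectively, which is why 90 is already expensive and 99 is extreme.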

Also the following will apply the config change dynamically across the cluster:

asadm -e "asinfo -v 'set-config:context=namespace;id=storebig;defrag-lwm-pct=50'"

But you will still need to update the static config file.
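
For the static config, a fragment along these lines would make the setting survive a restart. This is only a sketch, assuming the namespace uses a device storage engine and that both parameters sit in the storage-engine block. All other settings are left out, and the values shown are the defaults mentioned earlier in the thread.

```
namespace storebig {
    # ... other namespace settings unchanged ...
    storage-engine device {
        # ... device or file settings unchanged ...

        # Default value; 60 is the highest value suggested in this thread.
        defrag-lwm-pct 50

        # Default value, in microseconds.
        defrag-sleep 1000
    }
}
```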