How do I reconcile disk used to actual file size on disk

edd · August 16, 2021, 2:46pm

Hello,

I have a 10-node cluster.

du -h /opt/aerospike/data

2.4G - 2.5G on all nodes
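du reports the space a file actually occupies on disk, not its logical length (GNU du's `--apparent-size` option switches to the logical length). Assuming the .dat files are sparse, meaning they are created at the full `filesize` but only get disk blocks as data is written, the configured 150G per file and the du figure can differ enormously. The Python sketch below, for POSIX systems, prints both numbers for each file given on the command line. The script and its function names are illustrative only, not an Aerospike tool.

```python
import os
import sys


def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes) for one file.

    st_size is the logical length (what `ls -l` shows, e.g. the configured
    filesize); st_blocks counts 512-byte units actually allocated, which is
    what `du` reports by default.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def main(paths):
    for path in paths:
        apparent, allocated = apparent_and_allocated(path)
        print(f"{path}: apparent {apparent / 2**30:.1f} GiB, "
              f"allocated {allocated / 2**30:.2f} GiB")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, saved under a hypothetical name such as `sparse_check.py`, it could be run as `python3 sparse_check.py /opt/aerospike/data-*.dat` on each node.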

asadm, info namespace

On each node, the average disk used is 165M.

The sum row for disk used is 1.610 GB.

Relevant config entries

namespace users {
  replication-factor 2
  memory-size 48G


  storage-engine device {
    file /opt/aerospike/data-1.dat
    file /opt/aerospike/data-2.dat
    file /opt/aerospike/data-3.dat
    file /opt/aerospike/data-4.dat
    filesize 150G
    data-in-memory false
  }
}

1.6G reported by asadm x 10 nodes = 16G

2.5G reported by du x 10 nodes = 25G

My assumption is that Aerospike takes 9G for metadata.

Is that right?
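A quick check of the figures above, offered as a hedged sketch: 165M per node times 10 nodes is about 1.61G, which matches the asadm sum row. That suggests 1.610 GB is already the cluster-wide total, not a per-node value. If so, the like-for-like comparison is roughly 25G (du) against 1.6G (asadm) across the cluster, or about 2.5G against 165M per node. Treating asadm's M/GB and du -h's G as binary units (MiB/GiB) is an assumption, though the 1.610 figure fits it.

```python
# Figures copied from the question. Binary units (MiB/GiB) are an assumption.
NODES = 10
ASADM_AVG_PER_NODE_MIB = 165   # "disk used" per node, asadm info namespace
ASADM_SUM_ROW_GIB = 1.610      # asadm "sum" row for disk used
DU_PER_NODE_GIB = 2.5          # du -h, upper end of the 2.4G-2.5G range


def cluster_totals():
    # 165 MiB x 10 nodes is about 1.61 GiB, so the sum row already covers all nodes.
    asadm_cluster_gib = ASADM_AVG_PER_NODE_MIB * NODES / 1024
    # du was run per node, so its per-node figure does need multiplying.
    du_cluster_gib = DU_PER_NODE_GIB * NODES
    return asadm_cluster_gib, du_cluster_gib


if __name__ == "__main__":
    asadm_gib, du_gib = cluster_totals()
    print(f"asadm, cluster total: {asadm_gib:.2f} GiB (sum row: {ASADM_SUM_ROW_GIB} GiB)")
    print(f"du, cluster total:    {du_gib:.1f} GiB")
    print(f"gap:                  {du_gib - ASADM_SUM_ROW_GIB:.2f} GiB")
```

On these numbers the gap to explain is about 23G cluster-wide rather than 9G.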

Thank you

pgupta · August 16, 2021, 4:53pm

When you first start the server, each storage device or file gets an 8M header. Let's assume the default write-block-size of 1M (the maximum is 8M), so at least one write block's worth of space is reserved for the device/file header, which stores a bunch of information used during warm restart.

Your filesize is 150G, but for simplicity let's assume it is 100M with a 1M write-block-size. You then have 100 “pristine” blocks (never used), of which 8 are taken by the file header, leaving 92 pristine blocks. Right after a fresh start, your du commands would show exactly 8M x (number of files) as space used.

From then on, as you write and update data, new blocks are filled. As you update, the defrag system reuses the initial blocks as much as it can, while your remaining pristine blocks slowly decrease.

du cannot tell fully used blocks from partially used ones; asadm can. Hence the difference: du shows you (total blocks minus pristine blocks) x write-block-size as used, while asadm shows only the sum of the bytes of currently live data. The rest of the space is to be reclaimed by defrag.

Since the defrag low-water mark (defrag-lwm-pct) is set at 50% (recommended), two blocks that are each less than 50% used are combined into a new block, and those two are marked free. Defrag keeps going over the initial set of blocks, using and reusing them, and over time, as your data grows, more pristine blocks are pulled in. In du's world, however, once a block is no longer pristine, it is counted as used. Hence the difference.
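The mechanism described above can be illustrated with a deliberately simplified toy model in Python. Everything in it (the `ToyDevice` class, its methods, the 100 MiB file and the update pattern) is hypothetical and only mirrors the reply's stated assumptions: an 8 MiB header, 1 MiB write blocks, a 50% low-water mark, and freed blocks reused before pristine ones. It is not Aerospike's actual write or defrag code.

```python
# Toy model only, NOT Aerospike's implementation. Assumes an 8 MiB header,
# 1 MiB write blocks and a 50% defrag low-water mark, as in the reply.
MIB = 1024 * 1024


class ToyDevice:
    def __init__(self, file_size=100 * MIB, block_size=1 * MIB,
                 header_size=8 * MIB, defrag_lwm=0.5):
        self.block_size = block_size
        self.n_blocks = file_size // block_size
        self.defrag_lwm = defrag_lwm
        # Blocks below this index have been written at least once (header included).
        self.next_pristine = header_size // block_size
        self.live = {}  # block index -> live (current) data bytes in that block
        self.free = []  # blocks emptied by defrag; reused before pristine ones

    def _take_block(self):
        if self.free:
            return self.free.pop()
        if self.next_pristine >= self.n_blocks:
            raise RuntimeError("no free blocks left")
        idx = self.next_pristine
        self.next_pristine += 1
        return idx

    def write(self, nbytes):
        """Write nbytes of record data one block at a time; return block indices."""
        used = []
        while nbytes > 0:
            chunk = min(nbytes, self.block_size)
            idx = self._take_block()
            self.live[idx] = chunk
            used.append(idx)
            nbytes -= chunk
        return used

    def invalidate(self, idx, nbytes):
        """Updated records are rewritten elsewhere; old copies become dead bytes."""
        self.live[idx] -= nbytes

    def defrag(self):
        """Move live data out of blocks below the low-water mark, then free them."""
        threshold = self.defrag_lwm * self.block_size
        sparse = [i for i, b in self.live.items() if b < threshold]
        moved = sum(self.live.pop(i) for i in sparse)
        self.free.extend(sparse)
        self.write(moved)

    def du_like_mib(self):
        # du: every block that is no longer pristine counts as used.
        return self.next_pristine * self.block_size // MIB

    def asadm_like_mib(self):
        # asadm: only the bytes of currently live data.
        return sum(self.live.values()) // MIB


def run_scenario():
    dev = ToyDevice()
    snapshots = [(dev.du_like_mib(), dev.asadm_like_mib())]  # fresh start
    first = dev.write(20 * MIB)                               # initial load
    snapshots.append((dev.du_like_mib(), dev.asadm_like_mib()))
    for idx in first:                                         # 3/4 of each block's data is updated
        dev.invalidate(idx, 3 * MIB // 4)
    dev.write(20 * (3 * MIB // 4))                            # new copies of the updated data
    snapshots.append((dev.du_like_mib(), dev.asadm_like_mib()))
    dev.defrag()
    snapshots.append((dev.du_like_mib(), dev.asadm_like_mib()))
    return snapshots


if __name__ == "__main__":
    labels = ["fresh start", "after load", "after updates", "after defrag"]
    for label, (du, asadm) in zip(labels, run_scenario()):
        print(f"{label:>13}: du-like {du} MiB, asadm-like {asadm} MiB")
```

The four snapshots come out as du-like/asadm-like values of 8/0, 28/20, 43/20 and 43/20 MiB: after defrag the live data is still 20 MiB, but all 43 blocks that are no longer pristine remain counted by the du-like measure.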

edd · August 17, 2021, 2:25pm

Thank you. I need to read more about the defrag process and its settings, but this is helpful.

system · November 9, 2021, 2:26pm

This topic was automatically closed 84 days after the last reply. New replies are no longer allowed.
