How do I reconcile disk used to actual file size on disk

Hello,

I have a 10 node cluster

du -h /opt/aerospike/data

2.4G - 2.5G on all nodes


asadm, info namespace

on average, disk used per node is 165M

the sum row for disk used is 1.610 GB


Relevant config entries

namespace users {
  replication-factor 2
  memory-size 48G


  storage-engine device {
    file /opt/aerospike/data-1.dat
    file /opt/aerospike/data-2.dat
    file /opt/aerospike/data-3.dat
    file /opt/aerospike/data-4.dat
    filesize 150G
    data-in-memory false
  }
}

1.6G reported by asadm x 10 nodes = 16G

2.5G reported by du x 10 nodes = 25G

and my assumption is that Aerospike takes 9G for metadata

is that right?

Thank you

When you first start the server, each storage device/file gets an 8M header. Let's assume the default write-block-size of 1M (the maximum is 8M), so at least one block's worth of space is reserved for the device/file header, which stores a bunch of info used during warm restart.

Now, instead of your 150G filesize, let's assume for simplicity that your filesize is 100M and the write block is 1M. You then have 100 “pristine” (never used) blocks, of which 8 are used by the file header, leaving 92 pristine blocks. On a fresh start, if you ran your du command, you would see exactly 8M x the number of files as space used.

Thereafter, as you write and update data, new blocks are filled. When you update, the defrag system reuses the initial blocks as much as it can, while your remaining pristine blocks slowly decrease.

The du command cannot differentiate between fully used blocks and partially used blocks; asadm can. du is showing you (max blocks minus pristine blocks) as used, while asadm is showing you just the sum of the actual data bytes currently in use. The rest of the space is to be reclaimed by defrag. Since defrag lwm is set at 50% (recommended), two blocks that are each less than 50% used are combined into a new block, and those two are marked free. Defrag keeps running over the initial set of blocks, which get used and reused, and over time, as your data grows, more pristine blocks are pulled in. In du's world, however, once a block is no longer pristine it is counted as used. Hence the difference.
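To make the accounting concrete, here is a toy Python model of a single file. It is only a sketch: the `ToyFile` class and its method names are made up for illustration and are not Aerospike code. It assumes the simplified numbers from above (100M file, 1M write blocks, 8M header, 50% defrag lwm) and assumes du counts every block that has ever been written, while asadm counts only live data bytes.

```python
BLOCK = 1024 * 1024      # assumed write-block-size of 1M
HEADER = 8 * BLOCK       # 8M file header, as described above
DEFRAG_LWM_PCT = 50      # defrag low-water mark from the explanation


class ToyFile:
    """Toy model of one storage file. Not Aerospike's real implementation."""

    def __init__(self, filesize):
        self.total_blocks = filesize // BLOCK
        self.touched = HEADER // BLOCK  # header blocks are no longer pristine
        self.live = {}                  # block id -> live data bytes
        self.free = []                  # touched blocks available for reuse

    def _take_block(self):
        # Reuse a freed block if possible, otherwise pull in a pristine one.
        if self.free:
            return self.free.pop()
        if self.touched >= self.total_blocks:
            raise RuntimeError("file is full")
        block_id = self.touched
        self.touched += 1
        return block_id

    def write(self, nbytes):
        block_id = self._take_block()
        self.live[block_id] = nbytes
        return block_id

    def shrink(self, block_id, nbytes):
        # Records in this block were updated or deleted, so less of it is live.
        self.live[block_id] = nbytes

    def defrag(self):
        # Pair up blocks that are less than 50% used and merge each pair.
        low = [b for b, n in self.live.items()
               if n * 100 < DEFRAG_LWM_PCT * BLOCK]
        for a, b in zip(low[0::2], low[1::2]):
            combined = self.live[a] + self.live[b]
            new_block = self._take_block()
            self.live[new_block] = combined
            del self.live[a]
            del self.live[b]
            self.free += [a, b]         # both old blocks are marked free

    def du_mib(self):
        # du-style view: everything that is no longer pristine counts as used.
        return self.touched * BLOCK // BLOCK

    def asadm_mib(self):
        # asadm-style view: only live data bytes count.
        return sum(self.live.values()) // BLOCK


f = ToyFile(100 * BLOCK)
fresh = (f.du_mib(), f.asadm_mib())          # (8, 0): header only

old = [f.write(BLOCK) for _ in range(8)]     # fill 8 blocks completely
after_writes = (f.du_mib(), f.asadm_mib())   # (16, 8)

for b in old:                                # updates leave old blocks 25% live
    f.shrink(b, BLOCK // 4)
for _ in range(6):                           # updated records land in 6 new blocks
    f.write(BLOCK)
after_updates = (f.du_mib(), f.asadm_mib())  # (22, 8)

f.defrag()
after_defrag = (f.du_mib(), f.asadm_mib())   # (23, 8)

for label, pair in [("fresh", fresh), ("writes", after_writes),
                    ("updates", after_updates), ("defrag", after_defrag)]:
    print(f"{label:8s} du={pair[0]}M asadm={pair[1]}M")
```

In this toy run the live data stays at 8M the whole time, while the du-style figure climbs from 16M to 23M and never comes back down, because the blocks defrag frees are reusable but no longer pristine.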

Thank you, I have to read some more on the defrag process and its settings but this is helpful.
