How to determine storage per set

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

FAQ How to determine storage per set.

Note - As of version 5.2, simply leverage the device_data_bytes set metric.

Note - This article is applicable for server versions 4.2.0.2 and above. For calculating disk storage for a set on versions prior to that, refer to the hist-dump info command. The important difference between this and the new version is the number of buckets (always 100 previously) and the granularity (previously configured by obj-size-hist-max).

Context

To determine the approximate storage size for a set on a cluster, you will need to sum the values returned by the object-size-linear histogram.

Storage on Disk - Method for persisted data

Here is the notation to compute the approximate set size (over-estimate):

       1024
         Σ            num_records_in_bucket_n * (bucket_width * n)
        n=1

And for the under-estimate:

       1024
         Σ            num_records_in_bucket_n * (bucket_width * (n-1))
        n=1

  • There are always 1024 buckets.
  • The bucket_width is computed as the hist-width divided by 1024 (number of buckets).
  • The hist-width is equal to the configured write-block-size.
  • In the special but typical case of a 1MiB write-block-size, the bucket_width is 1,048,576 / 1,024 = 1,024.
  • For a 1MiB write-block-size, records in the first bucket are of size between 0 and 1024, in the second bucket, of size between 1025 and 2048, etc…
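The bucket geometry above can be sketched in a few lines of Python (this assumes the typical 1 MiB write-block-size; the function name is illustrative):

```python
# Bucket geometry for the object-size-linear histogram.
# The hist-width equals the configured write-block-size and there
# are always 1024 buckets.
WRITE_BLOCK_SIZE = 1048576  # 1 MiB (adjust to your configuration)
NUM_BUCKETS = 1024

bucket_width = WRITE_BLOCK_SIZE // NUM_BUCKETS  # 1048576 / 1024 = 1024

def bucket_range(n):
    """Record-size range in bytes covered by bucket n (1-based)."""
    low = bucket_width * (n - 1) + 1 if n > 1 else 0
    high = bucket_width * n
    return (low, high)

print(bucket_width)      # 1024
print(bucket_range(1))   # (0, 1024)
print(bucket_range(2))   # (1025, 2048)
```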

Perform the calculation for approximating the amount of storage per set on a cluster

The following example provides the steps for calculating the storage per set on a cluster, when using the default data-in-memory false.

Note: This calculation accounts for both master and replica records. Therefore, there is no need to account for the replication-factor.

Step 1: Generate the object-size-linear histogram

On a single node issue the following info command:

asinfo -v "histogram:namespace=<namespaceName>;type=object-size-linear;set=<setName>;"

Sample Output:

$ asinfo -v "histogram:namespace=test;type=object-size-linear;set=demo;"
units=bytes:hist-width=1048576:bucket-width=1024:buckets=281537970,56726976,21515544,11172775,6825716,455
3921,3216092,2351975,1770540,1355049,1054638,830128,660217,530114,429404,350591,288626,237583,199826,1667
39,140463,118291,100674,85691,73716,63303,54493,47219,40791,35587,30958,27093,23996,20805,18376,16452,145
41,12866,11324,10238,9123,8088,7312,6578,6026,5260,4826,4554,4093,3613,3275,3150,2699,2603,2316,2160,1909
,1801,1618,1543,1436,1253,1191,1096,1016,972,901,826,706,704,640,616,535,509,458,467,425,379,369,313,321,
303,260,230,223,198,213,171,197,173,154,154,147,127,109,122,89,104,112,84,82,96,69,72,67,64,46,64,54,53,4
5,43,63,41,38,33,35,38,20,26,31,18,31,23,22,17,18,30,16,22,19,22,9,18,7,10,16,10,10,4,11,16,12,8,10,7,6,1
1,7,6,7,2,11,8,8,6,7,4,3,3,4,3,5,2,2,4,8,4,3,3,7,2,2,1,1,3,1,1,2,3,1,0,3,2,4,2,1,2,2,1,5,0,1,0,0,2,0,1,0,
0,0,2,1,3,0,0,0,0,2,1,0,2,2,0,1,2,0,0,0,1,1,0,0,0,1,2,1,0,0,0,0,0,2,0,1,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1
,0,0,0,1,3,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Step 2: Compute the approximate set size (over estimate)

Using the output from the object-size-linear histogram compute the approximate set size.

n goes from 1 to the number of buckets (1024), which gives:

  "num_records_in_bucket_1"(281537970) * (1024*1)
+ "num_records_in_bucket_2"(56726976)  * (1024*2) 
+ "num_records_in_bucket_3"(21515544)  * (1024*3)
...

The following is a breakdown of the calculation:

281537970*(1024*1)+56726976*(1024*2)+21515544*(1024*3)+11172775*(1024*4)+6825716*(1024*5)+4553921*(1024*6)
+3216092*(1024*7)+2351975*(1024*8)+1770540*(1024*9)+1355049*(1024*10)+1054638*(1024*11)+830128*(1024*12)+
660217*(1024*13)+530114*(1024*14)+429404*(1024*15)+350591*(1024*16)+288626*(1024*17)+237583*(1024*18)+199
826*(1024*19)+166739*(1024*20)+140463*(1024*21)+118291*(1024*22)+100674*(1024*23)+85691*(1024*24)+73716*
(1024*25)+63303*(1024*26)+54493*(1024*27)+47219*(1024*28)+40791*(1024*29)+35587*(1024*30)+30958*(1024*31)
+27093*(1024*32)+23996*(1024*33)+20805*(1024*34)+18376*(1024*35)+16452*(1024*36)+14541*(1024*37)+12866*
(1024*38)+11324*(1024*39)+10238*(1024*40)+9123*(1024*41)+8088*(1024*42)+7312*(1024*43)+6578*(1024*44)+6026
*(1024*45)+5260*(1024*46)+4826*(1024*47)+4554*(1024*48)+4093*(1024*49)+3613*(1024*50)+3275*(1024*51)+3150
*(1024*52)+2699*(1024*53)+2603*(1024*54)+2316*(1024*55)+2160*(1024*56)+1909*(1024*57)+1801*(1024*58)+1618
*(1024*59)+1543*(1024*60)+1436*(1024*61)+1253*(1024*62)+1191*(1024*63)+1096*(1024*64)+1016*(1024*65)+972*
(1024*66)+901*(1024*67)+826*(1024*68)+706*(1024*69)+704*(1024*70)+640*(1024*71)+616*(1024*72)+535*(1024*73)
+509*(1024*74)+458*(1024*75)+467*(1024*76)+425*(1024*77)+379*(1024*78)+369*(1024*79)+313*(1024*80)+321*
(1024*81)+303*(1024*82)+260*(1024*83)+230*(1024*84)+223*(1024*85)+198*(1024*86)+213*(1024*87)+171*(1024*88)
+197*(1024*89)+173*(1024*90)+154*(1024*91)+154*(1024*92)+147*(1024*93)+127*(1024*94)+109*(1024*95)+122*
(1024*96)+89*(1024*97)+104*(1024*98)+112*(1024*99)+84*(1024*100)+82*(1024*101)+96*(1024*102)+69*(1024*103)
+72*(1024*104)+67*(1024*105)+64*(1024*106)+46*(1024*107)+64*(1024*108)+54*(1024*109)+53*(1024*110)+45*
(1024*111)+43*(1024*112)+63*(1024*113)+41*(1024*114)+38*(1024*115)+33*(1024*116)+35*(1024*117)+38*(1024*118)
+20*(1024*119)+26*(1024*120)+31*(1024*121)+18*(1024*122)+31*(1024*123)+23*(1024*124)+22*(1024*125)+17*
(1024*126)+18*(1024*127)+30*(1024*128)+16*(1024*129)+22*(1024*130)+19*(1024*131)+22*(1024*132)+9*(1024*133)
+18*(1024*134)+7*(1024*135)+10*(1024*136)+16*(1024*137)+10*(1024*138)+10*(1024*139)+4*(1024*140)+11*
(1024*141)+16*(1024*142)+12*(1024*143)+8*(1024*144)+10*(1024*145)+7*(1024*146)+6*(1024*147)+11*(1024*148)
+7*(1024*149)+6*(1024*150)+7*(1024*151)+2*(1024*152)+11*(1024*153)+8*(1024*154)+8*(1024*155)+6*(1024*156)+
7*(1024*157)+4*(1024*158)+3*(1024*159)+3*(1024*160)+4*(1024*161)+3*(1024*162)+5*(1024*163)+2*(1024*164)+2*
(1024*165)+4*(1024*166)+8*(1024*167)+4*(1024*168)+3*(1024*169)+3*(1024*170)+7*(1024*171)+2*(1024*172)+2*
(1024*173)+1*(1024*174)+1*(1024*175)+3*(1024*176)+1*(1024*177)+1*(1024*178)+2*(1024*179)+3*(1024*180)+1*
(1024*181)+0*(1024*182)+3*(1024*183)+2*(1024*184)+4*(1024*185)+2*(1024*186)+1*(1024*187)+2*(1024*188)+2*
(1024*189)+1*(1024*190)+5*(1024*191)+0*(1024*192)+1*(1024*193)+0*(1024*194)+0*(1024*195)+2*(1024*196)+0*
(1024*197)+1*(1024*198)+0*(1024*199)+0*(1024*200)+0*(1024*201)+2*(1024*202)+1*(1024*203)+3*(1024*204)+0*
(1024*205)+0*(1024*206)+0*(1024*207)+0*(1024*208)+2*(1024*209)+1*(1024*210)+0*(1024*211)+2*(1024*212)+2*
(1024*213)+0*(1024*214)+1*(1024*215)+2*(1024*216)+0*(1024*217)+0*(1024*218)+0*(1024*219)+1*(1024*220)+1*
(1024*221)+0*(1024*222)+0*(1024*223)+0*(1024*224)+1*(1024*225)+2*(1024*226)+1*(1024*227)+0*(1024*228)+0*
(1024*229)+0*(1024*230)+0*(1024*231)+0*(1024*232)+2*(1024*233)+0*(1024*234)+1*(1024*235)+0*(1024*236)+1*
(1024*237)+0*(1024*238)+0*(1024*239)+0*(1024*240)+0*(1024*241)+1*(1024*242)+0*(1024*243)+0*(1024*244)+0*
(1024*245)+1*(1024*246)+0*(1024*247)+0*(1024*248)+0*(1024*249)+0*(1024*250)+0*(1024*251)+1*(1024*252)+0*
(1024*253)+0*(1024*254)+0*(1024*255)+1*(1024*256)+3*(1024*257)+0*(1024*258)+0*(1024*259)+0*(1024*260)+0*
(1024*261)+0*(1024*262)+0*(1024*263)+0*(1024*264)+0*(1024*265)+0*(1024*266)+0*(1024*267)+0*(1024*268)+0*
(1024*269)+0*(1024*270)+1*(1024*271)+1*(1024*272)+0*(1024*273)+0*(1024*274)+0*(1024*275)+0*(1024*276)+0*
(1024*277)+1*(1024*278)
...
...
...
+0*(1024*1018)+0*(1024*1019)+0*(1024*1020)+0*(1024*1021)+0*(1024*1022)+0*(1024*1023)+0*(1024*1024)

Total: 750,354,164,736 bytes ~= 698.82 GiB
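Rather than summing by hand, the computation in Step 2 can be automated with a short Python sketch that parses the raw histogram string (the function names are illustrative; the tiny bucket list below is a made-up example, not the full sample output above):

```python
# Sketch: compute the over-estimate of set storage from the raw
# object-size-linear histogram string, whose format is
# "units=bytes:hist-width=...:bucket-width=...:buckets=c1,c2,..."
def parse_histogram(raw):
    fields = dict(part.split("=", 1) for part in raw.split(":"))
    counts = [int(c) for c in fields["buckets"].split(",")]
    return int(fields["bucket-width"]), counts

def over_estimate(bucket_width, counts):
    # Bucket n (1-based) contributes count * bucket_width * n.
    return sum(c * bucket_width * n for n, c in enumerate(counts, start=1))

# Illustrative 3-bucket example:
raw = "units=bytes:hist-width=1048576:bucket-width=1024:buckets=3,2,1"
width, counts = parse_histogram(raw)
print(over_estimate(width, counts))  # 3*1024*1 + 2*1024*2 + 1*1024*3 = 10240
```

Feeding the full sample output above through the same function reproduces the 750,354,164,736-byte total.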

Step 3: Compute the approximate set size for the cluster

Repeat step 1 & step 2 for each node in the cluster, then add the individual node totals to calculate the storage per set for the whole cluster.

Step 4: Under estimation of the set size

Repeat the process, shifting the bucket multiplier by 1 (the first term is always 0, of course):

  "num_records_in_bucket_1"(281537970) * (1024*0)
+ "num_records_in_bucket_2"(56726976)  * (1024*1) 
+ "num_records_in_bucket_3"(21515544)  * (1024*2)
...
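The under-estimate only changes the multiplier from n to n-1; a minimal sketch (illustrative function name, same made-up 3-bucket example):

```python
def under_estimate(bucket_width, counts):
    # Bucket n (1-based) contributes count * bucket_width * (n - 1),
    # so the first bucket always contributes 0.
    return sum(c * bucket_width * (n - 1) for n, c in enumerate(counts, start=1))

print(under_estimate(1024, [3, 2, 1]))  # 0 + 2*1024 + 1*2048 = 4096
```

The true set size lies between the under-estimate and the over-estimate.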

Storage in Memory - Method for data-in-memory true

Memory used by a set can be calculated by summing the following three areas of usage:

  • Primary Index - each index entry uses 64 bytes in memory as defined in our Capacity Planning document.
  • Data Usage - memory used for storing the record values (applicable here since we are configured with data-in-memory true or storage-engine memory). This depends on the size of the records in the set.
  • Secondary Index - refer to the Secondary index Capacity planning documentation as well as the memory_used_sindex_bytes statistic.
    • Note: this statistic is for the entire namespace, thus we would need to calculate further on how many records of the particular set would be part of the secondary index memory footprint. The following example assumes no secondary index.

Perform the calculation for approximating the amount of storage per set on a cluster

The following example provides the steps for calculating the storage per set on a cluster, when specifying data-in-memory true. For data-in-memory false, calculating the primary index memory footprint and secondary index (if applicable) is sufficient.

Note: This calculation accounts for both master and replica records. Therefore, there is no need to account for the replication-factor.

Step 1: Generate the object count and memory_data_bytes

On a single node, issue the following info command to generate the set information for that node:

asinfo -v 'sets' -l
ns=test:set=demo:objects=4478299:tombstones=0:memory_data_bytes=0:truncate_lut=274395528189:stop-writes-count=0:set-enable-xdr=use-default:disable-eviction=false
ns=bar:set=testset:objects=99823:tombstones=0:memory_data_bytes=1397522:truncate_lut=0:stop-writes-count=0:set-enable-xdr=use-default:disable-eviction=false

To get the set statistics from all nodes in the cluster, issue the following asadm command:

asadm -e "asinfo -v 'sets' -l"

Note: For the demo set of the test namespace, memory_data_bytes=0 indicates data-in-memory false.

The number of objects can be found in the objects value.

In the following sample output, for the set testset of the namespace bar on node1, the number of objects is 99823 and the memory used for the data is 1397522 bytes.

ns=bar:set=testset:objects=99823:tombstones=0:memory_data_bytes=1397522:truncate_lut=0:stop-writes-count=0:set-enable-xdr=use-default:disable-eviction=false

The amount of bytes of memory used to store the data can be found in the memory_data_bytes value.

Step 2: Compute the set size for the cluster

Adding the Primary Index total usage (number of objects multiplied by 64 bytes) and the memory_data_bytes usage returns the amount of Memory used for the set.

Total: ((Primary Index) + memory_data_bytes) = Memory Used 
Total: ((99823*64     ) + 1397522          ) = 7,786,194 Bytes
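The calculation above can be checked with a few lines of Python (the 64-byte primary index entry size comes from the Capacity Planning document; no secondary index is assumed, and the function name is illustrative):

```python
PRIMARY_INDEX_ENTRY_BYTES = 64  # per-record primary index footprint

def set_memory_bytes(objects, memory_data_bytes):
    """Per-node memory used by a set: primary index + data."""
    return objects * PRIMARY_INDEX_ENTRY_BYTES + memory_data_bytes

# Values from the sample output for set "testset" in namespace "bar":
print(set_memory_bytes(99823, 1397522))  # 7786194
```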

Repeat step 1 & step 2 for each node in the cluster, then add the individual node totals to calculate the storage per set for the whole cluster. On a 2 node cluster with replication factor 2, the per-node numbers will match since each node holds all the data (master and replica).

Step 3: Percentage used by a set

For calculating the percentages used by a set against the current total used for a namespace and the total allocated memory for a namespace, the following values are required:

  • Total namespace allocated memory:

To retrieve the configured memory-size, issue the following asadm command:

asadm -e "asinfo -v 'namespace/<namespaceName>' like memory-size"

Sample Output (for a 10G configured namespace):

$ asadm -e "asinfo -v 'namespace/bar' like memory-size"
node1.aerospike.com:3000 (172.17.0.1) returned:
memory-size=10737418240

node2.aerospike.com:3000 (172.17.0.2) returned:
memory-size=10737418240

In this sample output, for the namespace bar, the allocated memory-size is 10737418240 bytes (10 GiB).

  • Total namespace used memory:

To retrieve the total used memory (memory_used_bytes), issue the following asadm command:

asadm -e "show statistics for bar like memory_used_bytes"
~~~~~~~bar Namespace Statistics (2019-06-11 23:25:27 UTC)~~~~~~~
NODE             :   172.17.0.1:3000   172.17.0.2:3000
memory_used_bytes:   1572543873        1572543873

In this sample output, for the namespace bar, the used memory is 1572543873 bytes (~1.46 GiB).
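Putting the pieces together, the percentages can be computed as follows (all values taken from the sample outputs above; this is an illustrative sketch, not an Aerospike tool):

```python
# Per-node values from the earlier steps for namespace "bar":
set_memory = 7786194          # set memory from Step 2 (index + data)
memory_used = 1572543873      # namespace memory_used_bytes
memory_size = 10737418240     # configured namespace memory-size (10 GiB)

pct_of_used = 100 * set_memory / memory_used            # share of used memory
pct_of_allocated = 100 * set_memory / memory_size       # share of allocated memory

print(f"{pct_of_used:.2f}% of used, {pct_of_allocated:.3f}% of allocated")
```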

Keywords

STORAGE SIZE SET HISTOGRAM CLUSTER MEMORY PERCENT

Timestamp

June 5 2019
