The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.
FAQ - How to determine storage per set
Note - As of version 5.2, simply leverage the device_data_bytes set metric.
Note - This article is applicable for server versions 4.2.0.2 and above. For calculating disk storage for a set on versions prior to that, refer to the hist-dump info command. The important difference between the older method and the new one is the number of buckets (previously always 100) and the granularity (previously configured by obj-size-hist-max).
Context
To determine the approximate storage size for a set on a cluster, you will need to compute the values returned by the object-size-linear histogram.
Storage on Disk - Method for persisted data
Here is the notation to compute the approximate set size (over estimation):
Σ (n=1 to 1024) num_records_in_bucket_n * (bucket_width * n)
And for the under estimation:
Σ (n=1 to 1024) num_records_in_bucket_n * (bucket_width * (n-1))
- There are always 1024 buckets.
- The bucket_width is computed as the hist-width divided by 1024 (the number of buckets).
- The hist-width is equal to the configured write-block-size.
- In the special but typical case of a 1MiB write-block-size, the bucket_width is 1,048,576 / 1,024 = 1,024.
- For a 1MiB write-block-size, records in the first bucket are of size between 0 and 1024 bytes, in the second bucket between 1025 and 2048 bytes, etc…
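As an illustration, here is a minimal Python sketch of the over and under estimation formulas above, assuming the per-bucket record counts have already been extracted from the histogram output (see Step 1 below for how to obtain them):
```python
# Minimal sketch of the over/under estimation formulas above.
# `bucket_counts` is the list of 1024 per-bucket record counts returned by the
# object-size-linear histogram (see Step 1 below).

def set_size_estimates(bucket_counts, write_block_size=1048576):
    """Return (over_estimate, under_estimate) in bytes for one node."""
    bucket_width = write_block_size // len(bucket_counts)  # e.g. 1048576 / 1024 = 1024
    over = sum(count * bucket_width * n
               for n, count in enumerate(bucket_counts, start=1))
    under = sum(count * bucket_width * (n - 1)
                for n, count in enumerate(bucket_counts, start=1))
    return over, under
```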
Perform the calculation for approximating the amount of storage per set on a cluster
The following example provides the steps for calculating the storage per set on a cluster, when using the default data-in-memory false.
Note: This calculation accounts for both master and replica records. Therefore, there is no need to account for the replication-factor.
Step 1: Generate the object-size-linear histogram
On a single node, issue the following info command:
asinfo -v "histogram:namespace=<namespaceName>;type=object-size-linear;set=<setName>;"
Sample Output:
$ asinfo -v "histogram:namespace=test;type=object-size-linear;set=demo;"
units=bytes:hist-width=1048576:bucket-width=1024:buckets=281537970,56726976,21515544,11172775,6825716,455
3921,3216092,2351975,1770540,1355049,1054638,830128,660217,530114,429404,350591,288626,237583,199826,1667
39,140463,118291,100674,85691,73716,63303,54493,47219,40791,35587,30958,27093,23996,20805,18376,16452,145
41,12866,11324,10238,9123,8088,7312,6578,6026,5260,4826,4554,4093,3613,3275,3150,2699,2603,2316,2160,1909
,1801,1618,1543,1436,1253,1191,1096,1016,972,901,826,706,704,640,616,535,509,458,467,425,379,369,313,321,
303,260,230,223,198,213,171,197,173,154,154,147,127,109,122,89,104,112,84,82,96,69,72,67,64,46,64,54,53,4
5,43,63,41,38,33,35,38,20,26,31,18,31,23,22,17,18,30,16,22,19,22,9,18,7,10,16,10,10,4,11,16,12,8,10,7,6,1
1,7,6,7,2,11,8,8,6,7,4,3,3,4,3,5,2,2,4,8,4,3,3,7,2,2,1,1,3,1,1,2,3,1,0,3,2,4,2,1,2,2,1,5,0,1,0,0,2,0,1,0,
0,0,2,1,3,0,0,0,0,2,1,0,2,2,0,1,2,0,0,0,1,1,0,0,0,1,2,1,0,0,0,0,0,2,0,1,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1
,0,0,0,1,3,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
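For illustration, a minimal Python sketch of how such a response could be split into its bucket counts (the parsing here is an assumption based on the sample output format above, not an official client API):
```python
# Parse the single-line response from:
#   asinfo -v "histogram:namespace=<namespaceName>;type=object-size-linear;set=<setName>;"
# e.g. "units=bytes:hist-width=1048576:bucket-width=1024:buckets=281537970,56726976,..."

def parse_object_size_linear(raw):
    fields = dict(item.split("=", 1) for item in raw.strip().split(":"))
    bucket_width = int(fields["bucket-width"])
    bucket_counts = [int(count) for count in fields["buckets"].split(",")]
    return bucket_width, bucket_counts

# Example usage, feeding the counts into the estimation sketch from the Context section:
# bucket_width, counts = parse_object_size_linear(raw_response)
# over, under = set_size_estimates(counts)
```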
Step 2: Compute the approximate set size (over estimate)
Using the output from the object-size-linear histogram, compute the approximate set size.
n goes from 1 to the number of buckets (1024), which gives:
"num_records_in_bucket_1"(281537970) * (1024*1)
+ "num_records_in_bucket_2"(56726976) * (1024*2)
+ "num_records_in_bucket_3"(21515544) * (1024*3)
...
The following is a breakdown of the calculation:
281537970*(1024*1)+56726976*(1024*2)+21515544*(1024*3)+11172775*(1024*4)+6825716*(1024*5)+4553921*(1024*6)
+3216092*(1024*7)+2351975*(1024*8)+1770540*(1024*9)+1355049*(1024*10)+1054638*(1024*11)+830128*(1024*12)+
660217*(1024*13)+530114*(1024*14)+429404*(1024*15)+350591*(1024*16)+288626*(1024*17)+237583*(1024*18)+199
826*(1024*19)+166739*(1024*20)+140463*(1024*21)+118291*(1024*22)+100674*(1024*23)+85691*(1024*24)+73716*
(1024*25)+63303*(1024*26)+54493*(1024*27)+47219*(1024*28)+40791*(1024*29)+35587*(1024*30)+30958*(1024*31)
+27093*(1024*32)+23996*(1024*33)+20805*(1024*34)+18376*(1024*35)+16452*(1024*36)+14541*(1024*37)+12866*
(1024*38)+11324*(1024*39)+10238*(1024*40)+9123*(1024*41)+8088*(1024*42)+7312*(1024*43)+6578*(1024*44)+6026
*(1024*45)+5260*(1024*46)+4826*(1024*47)+4554*(1024*48)+4093*(1024*49)+3613*(1024*50)+3275*(1024*51)+3150
*(1024*52)+2699*(1024*53)+2603*(1024*54)+2316*(1024*55)+2160*(1024*56)+1909*(1024*57)+1801*(1024*58)+1618
*(1024*59)+1543*(1024*60)+1436*(1024*61)+1253*(1024*62)+1191*(1024*63)+1096*(1024*64)+1016*(1024*65)+972*
(1024*66)+901*(1024*67)+826*(1024*68)+706*(1024*69)+704*(1024*70)+640*(1024*71)+616*(1024*72)+535*(1024*73)
+509*(1024*74)+458*(1024*75)+467*(1024*76)+425*(1024*77)+379*(1024*78)+369*(1024*79)+313*(1024*80)+321*
(1024*81)+303*(1024*82)+260*(1024*83)+230*(1024*84)+223*(1024*85)+198*(1024*86)+213*(1024*87)+171*(1024*88)
+197*(1024*89)+173*(1024*90)+154*(1024*91)+154*(1024*92)+147*(1024*93)+127*(1024*94)+109*(1024*95)+122*
(1024*96)+89*(1024*97)+104*(1024*98)+112*(1024*99)+84*(1024*100)+82*(1024*101)+96*(1024*102)+69*(1024*103)
+72*(1024*104)+67*(1024*105)+64*(1024*106)+46*(1024*107)+64*(1024*108)+54*(1024*109)+53*(1024*110)+45*
(1024*111)+43*(1024*112)+63*(1024*113)+41*(1024*114)+38*(1024*115)+33*(1024*116)+35*(1024*117)+38*(1024*118)
+20*(1024*119)+26*(1024*120)+31*(1024*121)+18*(1024*122)+31*(1024*123)+23*(1024*124)+22*(1024*125)+17*
(1024*126)+18*(1024*127)+30*(1024*128)+16*(1024*129)+22*(1024*130)+19*(1024*131)+22*(1024*132)+9*(1024*133)
+18*(1024*134)+7*(1024*135)+10*(1024*136)+16*(1024*137)+10*(1024*138)+10*(1024*139)+4*(1024*140)+11*
(1024*141)+16*(1024*142)+12*(1024*143)+8*(1024*144)+10*(1024*145)+7*(1024*146)+6*(1024*147)+11*(1024*148)
+7*(1024*149)+6*(1024*150)+7*(1024*151)+2*(1024*152)+11*(1024*153)+8*(1024*154)+8*(1024*155)+6*(1024*156)+
7*(1024*157)+4*(1024*158)+3*(1024*159)+3*(1024*160)+4*(1024*161)+3*(1024*162)+5*(1024*163)+2*(1024*164)+2*
(1024*165)+4*(1024*166)+8*(1024*167)+4*(1024*168)+3*(1024*169)+3*(1024*170)+7*(1024*171)+2*(1024*172)+2*
(1024*173)+1*(1024*174)+1*(1024*175)+3*(1024*176)+1*(1024*177)+1*(1024*178)+2*(1024*179)+3*(1024*180)+1*
(1024*181)+0*(1024*182)+3*(1024*183)+2*(1024*184)+4*(1024*185)+2*(1024*186)+1*(1024*187)+2*(1024*188)+2*
(1024*189)+1*(1024*190)+5*(1024*191)+0*(1024*192)+1*(1024*193)+0*(1024*194)+0*(1024*195)+2*(1024*196)+0*
(1024*197)+1*(1024*198)+0*(1024*199)+0*(1024*200)+0*(1024*201)+2*(1024*202)+1*(1024*203)+3*(1024*204)+0*
(1024*205)+0*(1024*206)+0*(1024*207)+0*(1024*208)+2*(1024*209)+1*(1024*210)+0*(1024*211)+2*(1024*212)+2*
(1024*213)+0*(1024*214)+1*(1024*215)+2*(1024*216)+0*(1024*217)+0*(1024*218)+0*(1024*219)+1*(1024*220)+1*
(1024*221)+0*(1024*222)+0*(1024*223)+0*(1024*224)+1*(1024*225)+2*(1024*226)+1*(1024*227)+0*(1024*228)+0*
(1024*229)+0*(1024*230)+0*(1024*231)+0*(1024*232)+2*(1024*233)+0*(1024*234)+1*(1024*235)+0*(1024*236)+1*
(1024*237)+0*(1024*238)+0*(1024*239)+0*(1024*240)+0*(1024*241)+1*(1024*242)+0*(1024*243)+0*(1024*244)+0*
(1024*245)+1*(1024*246)+0*(1024*247)+0*(1024*248)+0*(1024*249)+0*(1024*250)+0*(1024*251)+1*(1024*252)+0*
(1024*253)+0*(1024*254)+0*(1024*255)+1*(1024*256)+3*(1024*257)+0*(1024*258)+0*(1024*259)+0*(1024*260)+0*
(1024*261)+0*(1024*262)+0*(1024*263)+0*(1024*264)+0*(1024*265)+0*(1024*266)+0*(1024*267)+0*(1024*268)+0*
(1024*269)+0*(1024*270)+1*(1024*271)+1*(1024*272)+0*(1024*273)+0*(1024*274)+0*(1024*275)+0*(1024*276)+0*
(1024*277)+1*(1024*278)
...
...
...
+0*(1024*1018)+0*(1024*1019)+0*(1024*1020)+0*(1024*1021)+0*(1024*1022)+0*(1024*1023)+0*(1024*1024)
Total: 750,354,164,736 bytes ~= 698.82 GiB
Step 3: Compute the approximate set size for the cluster
Repeat step 1 & step 2 for each node in the cluster, then add each individual node totals to calculate the storage per set for the whole cluster.
Step 4: Under estimation of the set size
Repeat the process, but shift the bucket multiplier down by 1 (the first line will always be 0):
"num_records_in_bucket_1"(281537970) * (1024*0)
+ "num_records_in_bucket_2"(56726976) * (1024*1)
+ "num_records_in_bucket_3"(21515544) * (1024*2)
...
Storage in Memory - Method for data-in-memory true
Memory used by a set can be calculated by summing up the following 3 areas of usage (a minimal sketch follows the list):
- Primary Index - each index entry uses 64 bytes in memory, as defined in our Capacity Planning document.
- Data Usage - memory used for storing the record values (applicable when configured with data-in-memory true or storage-engine memory). This depends on the size of the records in the set.
- Secondary Index - refer to the Secondary Index Capacity Planning documentation as well as the memory_used_sindex_bytes statistic.
- Note: this statistic is for the entire namespace, so it is necessary to further determine how many records of the particular set contribute to the secondary index memory footprint. The following example assumes no secondary index.
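As a minimal sketch of the sum described above, assuming the secondary index bytes attributable to the set have already been apportioned (zero in this example, i.e. no secondary index):
```python
# Sum of the three memory components listed above, for one node.
INDEX_ENTRY_BYTES = 64  # primary index entry size per record (Capacity Planning)

def set_memory_bytes(objects, memory_data_bytes, sindex_bytes_for_set=0):
    primary_index = objects * INDEX_ENTRY_BYTES
    return primary_index + memory_data_bytes + sindex_bytes_for_set

# Example with the testset figures used later in this article:
# set_memory_bytes(99823, 1397522)  ->  7786194 bytes
```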
Perform the calculation for approximating the amount of storage per set on a cluster
The following example provides the steps for calculating the storage per set on a cluster, when specifying data-in-memory true. For data-in-memory false, calculating the primary index memory footprint and secondary index (if applicable) is sufficient.
Note: This calculation accounts for both master and replica records. Therefore, there is no need to account for the replication-factor.
Step 1: Generate the object count and memory_data_bytes
On a single node, issue the following info command to generate the set information for that node:
asinfo -v 'sets' -l
ns=test:set=demo:objects=4478299:tombstones=0:memory_data_bytes=0:truncate_lut=274395528189:stop-writes-count=0:set-enable-xdr=use-default:disable-eviction=false
ns=bar:set=testset:objects=99823:tombstones=0:memory_data_bytes=1397522:truncate_lut=0:stop-writes-count=0:set-enable-xdr=use-default:disable-eviction=false
To get the set statistics from all nodes in the cluster, issue the following asadm command:
asadm -e "asinfo -v 'sets' -l"
Note: For the demo set of the test namespace, the value memory_data_bytes=0 indicates data-in-memory false.
The number of objects can be found in the objects value.
In the following sample output, for the set testset of the namespace bar on node1, the number of objects is 99823 and the memory used for the data is 1397522 bytes.
ns=bar:set=testset:objects=99823:tombstones=0:memory_data_bytes=1397522:truncate_lut=0:stop-writes-count=0:set-enable-xdr=use-default:disable-eviction=false
The number of bytes of memory used to store the data can be found in the memory_data_bytes value.
Step 2: Compute the set size for the cluster
Adding the Primary Index total usage (number of objects multiplied by 64 bytes) and the memory_data_bytes usage gives the amount of memory used for the set.
Total: ((Primary Index) + memory_data_bytes) = Memory Used
Total: ((99823 * 64) + 1397522) = 7,786,194 bytes
Repeat step 1 & step 2 for each node in the cluster, then add each individual node totals to calculate the storage per set for the whole cluster. When on a 2 node cluster with replication factor 2, the numbers will match since each node will hold all the data (master and replica).
Step 3: Percentage used by a set
For calculating the percentages used by a set against the current total memory used for a namespace and the total allocated memory for the namespace, the following values are required:
- Total namespace allocated memory:
To retrieve the configured memory-size, issue the following asadm command:
asadm -e "asinfo -v 'namespace/<namespaceName>' like memory-size"
Sample Output (for a 10G configured namespace):
$ asadm -e "asinfo -v 'namespace/bar' like memory-size"
node1.aerospike.com:3000 (172.17.0.1) returned:
memory-size=10737418240
node2.aerospike.com:3000 (172.17.0.2) returned:
memory-size=10737418240
In this sample output, for the namespace bar, the allocated memory-size is 10737418240 bytes (10 GiB).
- Total namespace used memory:
To retrieve the total used memory (memory_used_bytes), issue the following asadm command:
asadm -e "show statistics for bar like memory_used_bytes"
~~~~~~~bar Namespace Statistics (2019-06-11 23:25:27 UTC)~~~~~~~
NODE : 172.17.0.1:3000 172.17.0.2:3000
memory_used_bytes: 1572543873 1572543873
In this sample output, for the namespace bar, the used memory is 1572543873 bytes (~1.46 GiB).
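As a worked illustration using the sample values above (a sketch only; the 7,786,194 bytes figure is the per-node set memory computed in Step 2):
```python
# Percentage of namespace memory used by the set, using the per-node sample values above.
set_memory = 7_786_194                  # per-node memory used by set testset (Step 2)
namespace_used = 1_572_543_873          # memory_used_bytes for namespace bar
namespace_allocated = 10_737_418_240    # configured memory-size (10 GiB)

print(f"{100 * set_memory / namespace_used:.2f}% of the namespace memory currently used")
print(f"{100 * set_memory / namespace_allocated:.3f}% of the namespace allocated memory")
```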
Keywords
STORAGE SIZE SET HISTOGRAM CLUSTER MEMORY PERCENT
Timestamp
June 5 2019