Size of the primary index

mlabour · May 5, 2015, 9:02pm

The following document

says

Primary Index

Calculated via:

64 bytes × (replication factor) × (number of records)

Could you please shed some light on why 64 bytes are needed per record? Is there any flexibility to allow for shorter hashes?

Thank you for your help

raj · May 6, 2015, 9:36am

A primary index entry stores the following information:

Digest
void_time
generation
Tree related metadata
Pointer to data in memory if data is in memory
Location of data on storage disk.

All of this adds up to 64 bytes. The entry is also kept at 64 bytes because of the cache line size, which makes life much more tractable in terms of cache misses.

No, there is no flexibility right now.

– R
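As a rough illustration of the sizing formula quoted in the question (64 bytes × replication factor × number of records), here is a minimal Python sketch. The function name and the record count are made up for the example; only the 64-byte entry size and the formula itself come from this thread.

```python
ENTRY_BYTES = 64  # size of one primary index entry, as stated in this thread


def primary_index_bytes(records: int, replication_factor: int) -> int:
    """Cluster-wide primary index size: 64 bytes x replication factor x records."""
    if records < 0 or replication_factor < 1:
        raise ValueError("records must be >= 0 and replication_factor >= 1")
    return ENTRY_BYTES * replication_factor * records


if __name__ == "__main__":
    records = 1_000_000_000  # hypothetical record count, for illustration only
    for rf in (1, 2):
        size = primary_index_bytes(records, rf)
        print(f"replication factor {rf}: {size / 2**30:.1f} GiB across the cluster")
```

The result is the total for the whole cluster, not for a single node.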

raj · May 7, 2015, 2:29am

mlabour,

I am assuming the need is to reduce the memory footprint of the primary index.

If not, could you please elaborate on what other benefit you are looking for?

– R

mlabour · May 7, 2015, 4:00pm

Yes, you are correct.

As we are sizing the cluster, we are looking at the RAM requirements for the index.

Question: If we scale out by adding machines, does the size of the index per machine decrease? In other words, if I have one machine with an index of 256 GB, does adding another machine halve the size of the index on each machine?

Thank you for your help

kporter · May 7, 2015, 6:16pm

If you are using replication factor 1, then going from 1 node to 2 nodes would require half the space per node. However, the primary index slabs are never freed; they are only reused. So if you add a second node and do not expect an increase in the total number of records, the original node’s primary index will occupy twice the required RAM. This can be reclaimed with a cold restart of the daemon.

For replication factor 2: if you go from a single-node cluster to 2 nodes, then each node will need the same amount of index space that the original node had as a single-node cluster. To reduce the required amount of RAM for the index by a factor of 2 in a replication factor 2 environment, you would need 4 nodes.
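The scaling described above can be sketched in Python. This is a back-of-the-envelope model, not an official sizing tool: the function name is hypothetical, records are assumed to be spread evenly across nodes, and the number of copies is assumed to be capped at the number of nodes (inferred from the single-node, replication factor 2 case in this reply). It also ignores index slabs that stay allocated until a cold restart.

```python
ENTRY_BYTES = 64  # size of one primary index entry, as stated in this thread


def per_node_index_bytes(records: int, replication_factor: int, nodes: int) -> float:
    """Estimate the primary index RAM needed on each node of a cluster."""
    if nodes < 1 or replication_factor < 1 or records < 0:
        raise ValueError("nodes and replication_factor must be >= 1, records >= 0")
    # Assumption: a cluster cannot hold more copies of a record than it has nodes.
    copies = min(replication_factor, nodes)
    # Assumption: records (and their replicas) are distributed evenly.
    return ENTRY_BYTES * copies * records / nodes


if __name__ == "__main__":
    records = 2**32  # enough records to fill a 256 GiB index on a single node
    for rf, nodes in [(1, 1), (1, 2), (2, 1), (2, 2), (2, 4)]:
        gib = per_node_index_bytes(records, rf, nodes) / 2**30
        print(f"replication factor {rf}, {nodes} node(s): {gib:.0f} GiB per node")
```

With replication factor 2, the per-node estimate is the same for 1 and 2 nodes and only halves at 4 nodes, which matches the explanation above.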
