Cluster vs Node Capacity Planning

linux

#1

Hi, I’ve been reading the docs about Linux Capacity Planning. I’m planning on deploying a cluster with a single namespace with a Memory with HDD Persistence storage engine. From what I read in the docs, it’s not clear to me whether the result of the formula for the amount of RAM that I will need (including data, primary and secondary indexes) should be divided by the number of instances in the cluster, or whether it represents the amount of RAM needed on each instance in the cluster. As some of the calculations include the replication factor, I’m leaning towards the first option (total RAM in cluster), but I’m not entirely sure. Can anybody confirm this?

Thanks in advance, Damián.


#2

Hi Damián-

The section on Memory Required in the capacity planning document (http://www.aerospike.com/docs/operations/plan/capacity/) describes the total amount of memory used for the entire data set in the namespace. You should ensure that the memory available is sufficient to hold 1/(number of nodes) of the records.

I hope this helps,

-DM


#3

Hi Dave, thanks for the answer.

I still don’t see whether the total amount of memory required for the entire data set should fit in every instance of the cluster or whether it should fit in the sum of the RAM in the cluster. Let me give an example:

I have 46 million records, each of them with a size of 130 bytes and a replication factor of two. That gives me around 46m x 130 x 2 =~ 12 GB. Then I have the primary index size = 64 x 2 x 46m =~ 6 GB, and two secondary indexes. One is a String index and the other one an Integer index. Using the worst case memory usage I’ve arrived at the conclusion that the size of each one is approximately 2.5 GB.

So, the total size, including data, primary and secondary indexes is:

12 GB + 6 GB + 2.5 GB x 2 = 23 GB
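As a sanity check, the same arithmetic can be sketched in Python. The variable names are made up for illustration, the 2.5 GB per secondary index is simply the worst-case estimate quoted above, and GB is assumed to mean 10^9 bytes:

```python
# Rough sizing arithmetic from the example above (illustrative names only).
GB = 10**9  # decimal gigabytes: an assumption that matches the rounding above

records = 46_000_000
record_size_bytes = 130
replication_factor = 2
primary_index_entry_bytes = 64
secondary_index_gb = 2.5       # worst-case estimate per index, as quoted above
secondary_index_count = 2

# Data and primary index both scale with the replication factor.
data_gb = records * record_size_bytes * replication_factor / GB
primary_index_gb = primary_index_entry_bytes * replication_factor * records / GB

# Cluster-wide total: data + primary index + all secondary indexes.
total_gb = data_gb + primary_index_gb + secondary_index_gb * secondary_index_count

print(f"data:          {data_gb:.2f} GB")
print(f"primary index: {primary_index_gb:.2f} GB")
print(f"total:         {total_gb:.2f} GB")
```

The unrounded total comes out slightly under 23 GB, so rounding each term first does not change the picture.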

Should I provision, let’s say, a 3 instance cluster with 32 GB of RAM on each node? (the 23 GB fits in each node’s RAM) Or would it be enough to provision a 3 instance cluster of 12 GB on each node? (the 23 GB fits in the entire cluster)

Thanks in advance, Damián.


#4

Hi Damián-

I apologize for the delay in my response.

Each node must be able to hold a portion of the required RAM for the data in the cluster. The formula for that portion is 1/(number of nodes). If you have three nodes, each node should be able to hold 1/3 of the required RAM.

Let’s say that the total quantity of RAM required is 24 GB (You said 23, but I rounded up to 24).

Each node in the cluster needs to hold 1/(number of nodes) of that total. 1/3 of 24 GB is 8 GB, so each node must be able to hold 8 GB in RAM.

If each node contains 24 GB of RAM, and Aerospike only requires 1/3 of that quantity, it won’t hurt a thing.
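The per-node split can be written as a tiny Python sketch. The function names `per_node_gb` and `fits` are hypothetical helpers, not part of any Aerospike tool, and the code only reproduces the division described in this reply:

```python
def per_node_gb(total_gb: float, nodes: int) -> float:
    """Share of the cluster-wide RAM requirement that each node must hold."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return total_gb / nodes


def fits(node_ram_gb: float, total_gb: float, nodes: int) -> bool:
    """True if a node with node_ram_gb of RAM can hold its share."""
    return node_ram_gb >= per_node_gb(total_gb, nodes)


# The example from this thread: 24 GB total across 3 nodes.
print(per_node_gb(24, 3))   # 8.0
print(fits(12, 24, 3))      # 12 GB nodes hold their 8 GB share
print(fits(32, 24, 3))      # 32 GB nodes do too, with room to spare
```

With these numbers, both the 12 GB and the 32 GB node sizes from the question cover the 8 GB share.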

I hope that this helps, and I am sorry if I was not clear enough in my last answer.

Thank you for your time,

-DM


#5

That cleared things up, Dave.

Thanks!