Setting Aerospike in-memory configuration


#1

I have read other similar topics prior to this post, but I’m still trying to get clear on how to provision memory in a data cluster, considering replicas and especially quorum. Below is my approach.

Should the calculation for Aerospike be like this? Let me know if this is the way, or if there is something I have missed (completely).

  • On each node, the data + index that the application uses will be 18GB max (this will be our LWM for the namespace in Aerospike).

    18GB * 5 nodes = 90GB; 90GB / 2 copies = 45GB (actual data + index) in the cluster without replication.

  • Now, for at least 1 replica, that will be 45GB + 45GB = 90GB in total, including primary + secondary (data and index).

  • I will allocate the LWM as 18GB per node, and the HWM will be around 28GB for each node. So I will set the namespace to 30GB per node, leaving 2GB for the OS, or later for the namespace.

NAMESPACE_NAME=30GB for each node.

In case any 2 nodes fail or get separated, we can expect data to be evicted until Aerospike hits the LWM.

  • 3Node * 18GB will be LWM. (App expected data to be available per node)

So if we reframe the requirement: let’s say you need a cluster that will use 45GB for primary copies only (data + index), i.e. 18GB per node with one replica, and that can cope with a 2-node failure. The above configuration should help. I see a few Aerospike clusters that are provisioned without taking quorum into consideration. Is this approach apt?


#2

You are basically on the right track. There is no “LWM” term … Aerospike has HWM for eviction at 60% of namespace memory used (high-water-memory-pct), stop writes at 90% of namespace memory used (stop-writes-pct).

https://www.aerospike.com/docs/operations/plan/capacity explains memory needed for primary index and data.

Cluster ram required for Primary index = number of records x replication factor x 64 bytes
Cluster ram required for data = n x r x (.... per capacity planning page ....)
Total RAM needed, say = 100GB.
Memory HWM at 60% default: 100GB / 0.6 ≈ 167GB

For a minimum cluster of 3 nodes, each node needs 167/3 ≈ 56GB available for this namespace. Each server should have that 56GB, plus RAM for the OS and RAM for other processes. Aerospike will allocate memory to the namespace in 1GB chunks (Enterprise Edition) up to the maximum defined for the namespace. You have to make sure the server actually has the memory that you defined for the namespace.
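
As a rough sketch of the arithmetic above (the function name and its parameters are made up for illustration; they are not Aerospike settings or APIs), the HWM headroom and per-node split can be written as:

```python
def namespace_memory_per_node(total_ram_gb, hwm_pct=60, nodes=3):
    """Return (cluster_gb, per_node_gb) so that total_ram_gb sits exactly at the HWM.

    total_ram_gb: RAM needed for primary index + data, already including replicas.
    hwm_pct:      assumed high-water-memory-pct (60 is the default quoted above).
    nodes:        minimum number of nodes that must be able to hold everything.
    """
    cluster_gb = total_ram_gb / (hwm_pct / 100)
    per_node_gb = cluster_gb / nodes
    return cluster_gb, per_node_gb


# The example from this post: 100GB needed, 60% HWM, 3-node minimum cluster.
cluster_gb, per_node_gb = namespace_memory_per_node(100)
print(f"cluster: {cluster_gb:.0f}GB, per node: {per_node_gb:.1f}GB")
```

The per-node figure is only the namespace share; RAM for the OS and other processes still comes on top of it.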

To account for node failures, say up to two nodes, set up a 5-node instead of a 3-node cluster, with each node specified as above for the 3-node cluster. Note that if you lose two nodes simultaneously with replication factor 2, you will lose some data partitions.
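
To see why sizing each of the 5 nodes like a 3-node cluster member gives the headroom, here is a small sketch of memory use as nodes drop out. It assumes roughly 56GB of namespace memory per node (167GB / 3, rounded up), that data rebalances evenly, and that nodes fail one at a time so no partitions are lost; the helper name is hypothetical.

```python
def memory_used_pct(total_data_gb, per_node_namespace_gb, live_nodes):
    """Percent of namespace memory in use when data is spread evenly over live_nodes."""
    return 100 * total_data_gb / (per_node_namespace_gb * live_nodes)


TOTAL_DATA_GB = 100   # index + data including replicas, from the example above
PER_NODE_GB = 56      # assumed namespace size per node

for live in (5, 4, 3):
    pct = memory_used_pct(TOTAL_DATA_GB, PER_NODE_GB, live)
    print(f"{live} nodes up: {pct:.1f}% of namespace memory used")
```

Under these assumptions, usage only approaches the 60% HWM once the cluster is down to 3 nodes.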


#3

Thanks pgupta.

1. If there is no concept of LWM, then I will have the HWM at 60%, which comes to 18GB if I set the namespace to 30GB of RAM on a 32GB server.

2. I was describing the above case for a 5-node cluster, not a 3-node cluster. If the confusion is over the line "3Node * 18GB will be LWM. (App expected data to be available per node)": I was referring to 3 nodes because quorum in a 5-node cluster requires at least 3 nodes to be up, and I was asking how much memory those 3 nodes would hold at maximum capacity.

#4

I wanted to point out that there are no ‘quorum reads’ in Aerospike. The client will talk to the node holding the master partition for the specified key for both reads and writes. You can override this behavior by changing the read replica policy.

Writes will always happen against the node holding the master partition for the key. You can change the write commit level policy to choose whether to wait for all the replica writes to complete, or just for the master write. Again, the master will still trigger replica writes to those nodes with the replica partitions, regardless.

So now to how you should set your memory-size for the namespace. Piyush already pointed you in the direction of the capacity planning article, which you should read closely.

Every object, regardless of whether the namespace stores its data in memory or on SSD, costs 64B of DRAM for its metadata entry in the primary index. If your namespace keeps its data in memory, you will also need to account for how much DRAM that will cost you. For example, 1 billion objects of 1K cost 64G of primary index across all the nodes, so 64 * 2 (replication factor) / 5 = 25.6G of DRAM for the primary index per node.
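
The primary index figure above can be reproduced with a few lines. This is just a sketch of the arithmetic: the function and constant names are invented for illustration, and it uses decimal gigabytes (1G = 10^9 bytes).

```python
PRIMARY_INDEX_ENTRY_BYTES = 64  # DRAM cost per object, as stated above


def primary_index_gb_per_node(records, replication_factor, nodes):
    """DRAM per node (decimal GB) for the primary index, assuming an even spread."""
    total_bytes = records * replication_factor * PRIMARY_INDEX_ENTRY_BYTES
    return total_bytes / nodes / 1e9


# 1 billion objects, replication factor 2, 5-node cluster.
per_node = primary_index_gb_per_node(1_000_000_000, 2, 5)
print(f"primary index per node: {per_node:.1f}G")
```

For an in-memory namespace, the DRAM for the record data itself has to be added on top, per the capacity planning page.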

The high watermark for memory is associated with evictions. Read the knowledge base FAQ "What are Expiration, Eviction and Stop-Writes?". Evictions won’t happen at all if your records are set to never expire (TTL -1 in the client). Evictions also don’t happen if you define the sets in your namespace to not evict (set-disable-eviction). See "Namespace Retention Configuration".

If you do want to use TTLs for your records, make sure that you don’t hit stop-writes when a node goes down. Assume that stop-writes-pct is still at its default value of 90%. In a 5-node cluster, you’ll need to set your high-water-memory-pct lower than 90% * 4 / 5 = 72%. You probably want to be at 80% at most after a node goes down, so 80% * 4 / 5 = 64%, or a value between those two points.
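
The same scaling can be sketched in code, which makes it easy to try other cluster sizes or more than one failed node. The helper below is hypothetical (not an Aerospike API) and assumes data redistributes evenly across the surviving nodes.

```python
def max_hwm_pct(target_pct_after_failure, nodes, failed_nodes=1):
    """Highest high-water-memory-pct that keeps per-node usage at or below
    target_pct_after_failure once failed_nodes have dropped out."""
    surviving = nodes - failed_nodes
    if surviving <= 0:
        raise ValueError("at least one node must survive")
    return target_pct_after_failure * surviving / nodes


upper = max_hwm_pct(90, 5)  # stay under the default stop-writes-pct of 90
lower = max_hwm_pct(80, 5)  # aim for 80% usage after losing one node
print(f"set high-water-memory-pct between {lower:.0f} and {upper:.0f}")
```

By the same reasoning, calling max_hwm_pct(90, 5, failed_nodes=2) for the two-node failure case from the original question gives 54, which is below the default HWM of 60%.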