Aerospike: Primary in memory, secondary on disk?


#1

Is it possible to have passive replicas in Aerospike, i.e. I don't need my replicas for reads, only for fault tolerance? Is there any way to configure it so that my primary data remains in RAM with the secondary on disk?


#2

Aerospike stores the replica and the master on different nodes. What you describe regarding fault tolerance is the default behavior - for example, a 3-node cluster with a replication factor of 2 will give you that. As for keeping data in RAM with persistence on disk - yes, you can do that. See: http://www.aerospike.com/docs/operations/configure/namespace/storage


#3

Hi Piyush,

I’ll explain my query in a more precise manner.

Let's say my total data (indexes + data) is 500G. I want to store my primary data in memory and the replicas on disk, which would mean I need 500G RAM + 500G SSD with replication factor 2. The replicas are only for backup purposes, i.e. if node A goes down, all of its replicas are on disk on other nodes; those nodes bring the data up by reading it from disk and then copy it to other nodes to create the replicas again.

However, in the RAM + persistence-on-disk config, to achieve a replication factor of 2 I will need 1TB of both RAM and disk, as the data will be replicated in RAM as well. In this mode the disk contains only the data of the same node, and hence I can't live with replication factor 1. But with the config I described above I can achieve that, given that I can tolerate missing data for some time when a node crashes.
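To make the comparison concrete, here is a minimal sketch of the arithmetic for the two layouts. The function names are made up for illustration, and the second layout is the one being asked about, not something assumed to exist in Aerospike. It also treats the 500G as a single figure and ignores the index/data split.

```python
DATA_GB = 500            # unique data (indexes + data), figure from this post
REPLICATION_FACTOR = 2


def ram_plus_persistence(data_gb, rf):
    """Every copy, master or replica, sits in RAM and on the same node's disk."""
    return {"ram_gb": data_gb * rf, "disk_gb": data_gb * rf}


def master_ram_replica_disk(data_gb, rf):
    """Hypothetical layout: master copy in RAM only, the other rf - 1 copies on disk only."""
    return {"ram_gb": data_gb, "disk_gb": data_gb * (rf - 1)}


print(ram_plus_persistence(DATA_GB, REPLICATION_FACTOR))
# {'ram_gb': 1000, 'disk_gb': 1000}
print(master_ram_replica_disk(DATA_GB, REPLICATION_FACTOR))
# {'ram_gb': 500, 'disk_gb': 500}
```

These are cluster-wide totals before any headroom is added.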

Overall, I am trying to achieve fault tolerance with half the resources.


#4

Consider a record on a node. It can either be a Master record on this node (with its Replica copy on another node) or a Replica record on this node (with its Master copy on another node). Regardless of its pedigree, Master or Replica, it's a record in a “namespace” that you define. In the namespace definition, you decide how a record must be stored on a node. The Primary Index of any record is always stored in RAM. For the record data + associated metadata, you can choose between the following scenarios:
1 - store it in RAM
2 - store it on SSD
3 - store it both in RAM and SSD on the SAME node.

So if your original master data is 500G and you want to store it with a replication factor of 2, you now have to provision for storing 1TB of data. If you then want it both in RAM and on SSD, you will need 1TB of RAM and 1TB of SSD in the cluster. If you have 10 nodes, each can be 100GB RAM and 100GB SSD. (Actual sizing is a bit more involved. See http://www.aerospike.com/docs/operations/plan/capacity ) You have to over-provision roughly 2x, so you don't run over 50% capacity.
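As a rough sketch of that per-node arithmetic (the function and parameter names are hypothetical, and this ignores everything the capacity planning guide covers beyond raw data size):

```python
def per_node_gb(unique_data_gb, replication_factor, nodes, max_utilization=1.0):
    """Capacity each node needs so the cluster holds every copy of the data.

    max_utilization is the fraction of a node's capacity you are willing to
    fill; 0.5 corresponds to the "roughly 2x" over-provisioning rule of thumb.
    """
    total_gb = unique_data_gb * replication_factor
    return total_gb / nodes / max_utilization


# 500G of master data, replication factor 2, 10 nodes
print(per_node_gb(500, 2, 10))        # 100.0 (no headroom)
print(per_node_gb(500, 2, 10, 0.5))   # 200.0 (staying under 50% capacity)
```

With data kept both in RAM and on SSD on the same node, the same figure applies to each of the two: about 200GB of RAM and 200GB of SSD per node once headroom is included.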

I think what you are trying to do is store a record in RAM on node1 and its persistent copy on node2's disk. If that's what you are trying to do - you cannot configure the cluster to do that.


#5

Thanks for the quick update.

I understand that this cannot be done with the current features. But wouldn't it be a good use case for Aerospike to support this mode? Or is it not possible given the way it is currently implemented?

Regards, Arpit


#6

Is it theoretically possible to add a namespace feature such that a record is stored in RAM if it is a Master and on SSD if it is a Replica? Maybe. The benefit is just hardware cost; the downside is degraded performance if a node goes down and the cluster has to be rebalanced, plus all the other corner cases that may arise. As a side note, if you plan to deploy on AWS, there is a feature called shadow drive that backs ephemeral storage on EBS. You can explore it here: http://www.aerospike.com/docs/deploy_guides/aws/recommendations