How to control data distribution to a restricted number of nodes


#1

I have three Aerospike nodes, all configured with the same namespace. One node has replication factor 1; the remaining two have replication factor 2.

I expected that data inserted on node 1 (with replication factor 1) would not be replicated to the other two nodes, but it is. Data inserted on nodes 2 and 3 is also replicated to node 1.

What configuration is needed so that one node does not replicate its data to the other nodes? Does it have to be outside the cluster? Also, which configuration parameter decides which nodes form a cluster?

Thanks,

Mangesh Sawant.


#2

Hi Mangesh,

Aerospike recommends using the same replication factor for a namespace across the whole cluster. Because data is sharded and rebalanced automatically, a client application cannot control which data is stored on which node, so keeping specific data on a single node with no replication is not possible.

If you have specific data that you do not want replicated, you could put it in another namespace with a different replication factor. Within a single namespace, this is not possible.

Options available:

Namespace foo: replication factor 1, for data that should not be replicated

Namespace bar: replication factor 2, for data that should be replicated
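As a rough illustration of the two-namespace approach, here is a minimal sketch of how a client application might choose the namespace per record. The Aerospike Python client addresses a record with a (namespace, set, key) tuple, so the choice comes down to which namespace goes into that tuple. The namespace names foo and bar are taken from the options above; the set name "demo" and the helper make_key are hypothetical, and both namespaces are assumed to already be defined on every node.

```python
# Namespace names come from the reply above; the set name and this helper
# are hypothetical and only illustrate the idea.
NO_REPLICA_NS = "foo"   # assumed to be configured with replication factor 1
REPLICATED_NS = "bar"   # assumed to be configured with replication factor 2


def make_key(user_key, replicate, set_name="demo"):
    """Build a (namespace, set, key) tuple, choosing the namespace
    according to whether the record should be replicated."""
    namespace = REPLICATED_NS if replicate else NO_REPLICA_NS
    return (namespace, set_name, user_key)


if __name__ == "__main__":
    # A record that should exist as a single copy.
    print(make_key("session-1", replicate=False))
    # A record that should have a second copy on another node.
    print(make_key("order-42", replicate=True))
```

The resulting tuple can then be passed as the key argument to the client's put call. Note that this only controls how many copies exist, not which node holds them.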

Let me know if you need further help.

-samir


#3

Replication factor refers to the number of copies of data within that namespace.

An RF of 2 means 2 copies, so saving 100 records results in 200 stored records in total (2 copies of each record). An RF of 3 results in 300 stored records after saving 100 to the database.
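The arithmetic can be written out as a small sketch. The function name total_copies is hypothetical, and the calculation assumes the cluster has at least as many nodes as the replication factor.

```python
def total_copies(records, replication_factor):
    """Return the total number of record copies stored across the cluster."""
    return records * replication_factor


if __name__ == "__main__":
    for rf in (1, 2, 3):
        print(f"RF {rf}: {total_copies(100, rf)} stored copies of 100 records")
```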

Data is automatically spread across the cluster to balance storage use and I/O; you cannot control this. However, because the replication factor is a setting for the entire namespace (not a single node), it should match on all nodes in the cluster. You might run into issues if the RF settings differ between nodes.

Clusters are just Aerospike nodes that have the same namespace configurations. They discover each other based on the seed address setting in the config (for example, mesh-seed-address-port in the heartbeat section when mesh mode is used). If you take that out, a node will not know which other server to contact when starting and will not join a cluster (unless another server tries to contact it).