Where will the client read data when the replication factor equals the number of nodes?


#1

Hi.

I have 5 AE nodes in a cluster with replication factor 5. Each AE node has an app connected to it that makes many reads and no writes. New data comes in through only one AE node; let's say that is the first node (10.0.1.10).

schema:

server 1 (10.0.1.10)                   server x (10.0.1.15)
--------------------------                   -----------------------------
app1  read> AE1 <|   <   write   | <data_producer-app |
--------------------------                   -----------------------------

server 2 (10.0.1.11)
--------------------------
app2  read> AE2   |
--------------------------

server 3 (10.0.1.12)
--------------------------
app3  read> AE3    |
--------------------------

server 4 (10.0.1.13)
--------------------------
app4  read> AE4   |
--------------------------

server 5 (10.0.1.14)
--------------------------
app5  read> AE5   |
--------------------------

Questions:

  • How will master partitions be spread across the cluster (given that we write to only one AE node)? Am I right that there will be 4096 master partitions, divided among the 5 nodes, so that each node has ~819 master partitions?
  • If I'm right about the first question, will each node of the cluster except the “master node” have ~819 slave partitions, because of replication factor 5 (master + slave + slave + slave + slave)?

For example:

First node 10.0.1.10
-----------------------------------------------
0-818 master partitions
0-818 slave partitions from node 2
0-818 slave partitions from node 3
0-818 slave partitions from node 4
0-818 slave partitions from node 5
----------------------------------------------

Second node 10.0.1.11
-----------------------------------------------
819-1637 master partitions
819-1637 slave partitions from node 1
819-1637 slave partitions from node 3
819-1637 slave partitions from node 4
819-1637 slave partitions from node 5
----------------------------------------------
  • So the main question: where will the client find data, i.e. which node will the client read from? Will it always read from one node, or from different nodes depending on where the master partition is stored? Am I right that when the client on the first server (configured to connect to Aerospike on the first server) needs data from partition 3010, it will get the “map” of how data is stored from the first AE node and then send the request to the 4th node?
  • And does the client always go to the master partition?
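A minimal sketch of the lookup the question describes, assuming (as the question does) that each node is master for one contiguous range of the 4096 partitions. The class `ContiguousLayout` is hypothetical, and real Aerospike computes partition ownership with its own distribution algorithm rather than contiguous ranges, so this only checks the arithmetic: 4096 / 5 = 819 with remainder 1, and under that layout partition 3010 falls in the 4th node's range.

```java
import java.util.Arrays;

// Models the layout assumed in the question: partitions 0..4095 are split
// into contiguous ranges, one range of masters per node.
// ASSUMPTION: real Aerospike does not assign partitions in contiguous ranges;
// this only checks the arithmetic behind the question.
final class ContiguousLayout {
    static final int PARTITIONS = 4096;

    // First partition ID of each node's range (floor division).
    static int[] rangeStarts(int nodes) {
        int size = PARTITIONS / nodes;
        int[] starts = new int[nodes];
        for (int i = 0; i < nodes; i++) {
            starts[i] = i * size;
        }
        return starts;
    }

    // 1-based number of the node whose range contains partitionId.
    // The remainder (4096 % nodes) is folded into the last node's range.
    static int nodeFor(int partitionId, int nodes) {
        if (partitionId < 0 || partitionId >= PARTITIONS) {
            throw new IllegalArgumentException("partition out of range: " + partitionId);
        }
        int size = PARTITIONS / nodes;
        return Math.min(partitionId / size, nodes - 1) + 1;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rangeStarts(5))); // [0, 819, 1638, 2457, 3276]
        System.out.println(nodeFor(3010, 5));                // 4
    }
}
```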

If something isn't readable, please tell me and I'll try to add more info.


#2

You can't choose to write to 1 node in a cluster; all writes will be sharded across all 5 nodes. If you wish to optimize reads and do not have consistency requirements, you can specify in your Policy to read from a replica. By default the client will go to the “MASTER” node for the data, which will be node 1 1/5th of the time, node 2 1/5th of the time, and so forth. Yes, each node should be the master for around 819 partitions. Each partition carries the list of its slave nodes, so every node will still be master for about 819 partitions; those partitions will simply indicate that there are 4 slaves.
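To make the counts concrete, here is a toy partition table, assuming (hypothetically) that partition p's master is node p % nodes and its replicas are the next nodes around the ring. Aerospike's real assignment differs, but the totals come out the same: with RF=5 each node is master for 819 or 820 partitions and stores a copy of all 4096. With the default master reads and evenly hashed keys, each node serves about 819/4096, roughly 1/5, of the reads.

```java
import java.util.Arrays;

// Hypothetical partition table: partition p's master is node (p % nodes),
// and its replicas are the next (rf - 1) nodes around the ring.
// ASSUMPTION: Aerospike uses its own assignment algorithm; only the counts matter here.
final class PartitionTable {
    static final int PARTITIONS = 4096;

    // Node indexes holding partition p, master first.
    static int[] owners(int p, int nodes, int rf) {
        int copies = Math.min(rf, nodes); // cannot have more copies than nodes
        int[] result = new int[copies];
        for (int i = 0; i < copies; i++) {
            result[i] = (p + i) % nodes;
        }
        return result;
    }

    // How many partitions each node is master for.
    static int[] masterCounts(int nodes, int rf) {
        int[] counts = new int[nodes];
        for (int p = 0; p < PARTITIONS; p++) {
            counts[owners(p, nodes, rf)[0]]++;
        }
        return counts;
    }

    // How many partitions (master or replica) each node stores.
    static int[] copyCounts(int nodes, int rf) {
        int[] counts = new int[nodes];
        for (int p = 0; p < PARTITIONS; p++) {
            for (int node : owners(p, nodes, rf)) {
                counts[node]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(masterCounts(5, 5))); // [820, 819, 819, 819, 819]
        System.out.println(Arrays.toString(copyCounts(5, 5)));   // [4096, 4096, 4096, 4096, 4096]
    }
}
```

Calling `copyCounts(5, 2)` instead shows the RF=2 case, where each node stores only about 2/5 of the partitions.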


#3

Thank you for the reply. Can you tell me where I can read about the read policy?


#4

Not sure what language you're using, but it's not hard to find: http://bfy.tw/FQgw


#5

Thank you.


#6

@Albert … how did you do that link? :slight_smile:


#7

BTW, for what you are trying to do, set the replication factor to 128. It will be capped at the number of nodes. This way, when you add new nodes, your implementation still works and you do not have to stop and restart the cluster. Replication factor is a static and unanimous parameter.
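A small sketch of the capping described above, assuming the effective number of copies is simply min(configured RF, cluster size). `EffectiveReplication` is a hypothetical name, not an Aerospike API.

```java
// Effective copies per record: the configured replication factor,
// capped at the number of nodes currently in the cluster.
// ASSUMPTION: modeled as a plain min(); not taken from Aerospike source.
final class EffectiveReplication {
    static int effectiveRf(int configuredRf, int clusterSize) {
        if (configuredRf < 1 || clusterSize < 1) {
            throw new IllegalArgumentException("both values must be >= 1");
        }
        return Math.min(configuredRf, clusterSize);
    }

    public static void main(String[] args) {
        // Growing the cluster from 5 to 7 nodes: RF=128 keeps "every node has
        // everything", while RF=5 stops covering the new nodes.
        for (int nodes = 5; nodes <= 7; nodes++) {
            System.out.println(nodes + " nodes: RF=128 -> " + effectiveRf(128, nodes)
                + " copies, RF=5 -> " + effectiveRf(5, nodes) + " copies");
        }
    }
}
```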


#8

To be completely clear, having a replication factor higher than 2 has to do with availability - how many nodes can go down without any of the data becoming unavailable.

There is a non-trivial space cost to having a very high replication factor. If your replication factor equals the number of nodes N, each node needs to be able to hold all the data of the entire cluster. Also, each write has to be sent from the node that holds the master partition for that record to ALL the other nodes. That is a huge, unnecessary write load.
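The costs above can be put into rough numbers. This sketch assumes data is spread evenly across nodes and ignores index memory, defragmentation headroom and migrations; the 100 GB figure and the class name are hypothetical.

```java
// Back-of-the-envelope costs of a replication factor.
// ASSUMPTION: data spread evenly; overheads (indexes, defrag, migrations) ignored.
final class ReplicationCost {
    // Approximate bytes each node must store.
    static double bytesPerNode(double totalUniqueBytes, int rf, int nodes) {
        int copies = Math.min(rf, nodes);
        return totalUniqueBytes * copies / nodes;
    }

    // Extra copies the master node ships over the network for each client write.
    static int replicaWritesPerWrite(int rf, int nodes) {
        return Math.min(rf, nodes) - 1;
    }

    public static void main(String[] args) {
        double total = 100e9; // hypothetical 100 GB of unique data
        for (int rf : new int[] {2, 5}) {
            System.out.printf("RF=%d: %.0f GB per node, %d replica writes per write%n",
                rf, bytesPerNode(total, rf, 5) / 1e9, replicaWritesPerWrite(rf, 5));
        }
    }
}
```

With 5 nodes, going from RF=2 to RF=5 raises per-node storage from 40 GB to 100 GB and the replica writes from 1 to 4 per client write.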

If you think there’s a read advantage to replicating the data, be aware that you’ll have to change the replica read policy from master to any. Otherwise this costs you a lot in space, network, and write IOPS, and gains you nothing.


#9

He is trying to use Aerospike as a bank of lookup-table servers, each holding the entire data set, with the ability to update all of them with a single write. The use case is read-only, with occasional updates to the lookup table. Use RF=128 or RF=ALL (I think ALL is also valid?), and the read replica policy should be RANDOM.

public static final [Replica](https://www.aerospike.com/apidocs/java/com/aerospike/client/policy/Replica.html) RANDOM

Distribute reads across all nodes in cluster in round-robin fashion. Writes always use node containing key’s master partition. This option is useful when the replication factor equals the number of nodes in the cluster and the overhead of requesting proles is not desired.
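A toy router contrasting the two behaviours quoted above. It assumes (hypothetically) that a partition's master is partitionId % nodes and uses its own `ReadMode` enum; in the Java client the real setting is a `Replica` value on the read policy. Reads of a single hot key all hit one node under MASTER but are spread evenly under round-robin, while writes go to the master either way.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical router contrasting MASTER and round-robin (RANDOM) reads.
// ASSUMPTION: master node = partitionId % nodes, a stand-in for the real partition map.
final class ReadRouter {
    // Mirrors the names MASTER and RANDOM; this is not the client's Replica enum.
    enum ReadMode { MASTER, RANDOM }

    private final int nodes;
    private final AtomicInteger counter = new AtomicInteger();

    ReadRouter(int nodes) {
        this.nodes = nodes;
    }

    int masterFor(int partitionId) {
        return partitionId % nodes;
    }

    // Writes always go to the master, whatever the read mode.
    int routeWrite(int partitionId) {
        return masterFor(partitionId);
    }

    // MASTER: always the master node. RANDOM: round-robin over all nodes,
    // which is only correct when every node holds every partition (RF == nodes).
    int routeRead(int partitionId, ReadMode mode) {
        if (mode == ReadMode.MASTER) {
            return masterFor(partitionId);
        }
        return Math.floorMod(counter.getAndIncrement(), nodes);
    }

    // Counts how many of `reads` reads of one partition land on each node.
    static int[] spread(int nodes, int partitionId, ReadMode mode, int reads) {
        ReadRouter router = new ReadRouter(nodes);
        int[] hits = new int[nodes];
        for (int i = 0; i < reads; i++) {
            hits[router.routeRead(partitionId, mode)]++;
        }
        return hits;
    }

    public static void main(String[] args) {
        // 1000 reads of one hot key whose partition is 3010.
        System.out.println(Arrays.toString(spread(5, 3010, ReadMode.MASTER, 1000))); // [1000, 0, 0, 0, 0]
        System.out.println(Arrays.toString(spread(5, 3010, ReadMode.RANDOM, 1000))); // [200, 200, 200, 200, 200]
    }
}
```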