Here is how to do it. I set up a 3-node cluster. Each node has a namespace named test. I use replication factor 3, so each node holds a full copy of the data.
Additionally, for namespace test, each node has its own rack-id: node A1 uses rack-id 1, node A2 uses rack-id 2, and node A3 uses rack-id 3.
rack-id is a dynamic parameter, available in Enterprise Edition only. You can therefore set it on a running cluster, and then also update the configuration file so the value persists across future restarts of the node.
Here is my namespace configuration for node A1:
namespace test {
    replication-factor 3
    rack-id 1
    default-ttl 5d # 5 days, use 0 to never expire/evict.
    nsup-period 15
    storage-engine device {
        file /opt/aerospike/data/test.dat
        filesize 5G
    }
}
I have added 10 test records as shown below:
Initialized the client and connected to the cluster.
key0 : (gen:1),(exp:453789840),(bins:(name:Sandra),(age:34))
key1 : (gen:1),(exp:453789840),(bins:(name:Jack),(age:26))
key2 : (gen:1),(exp:453789840),(bins:(name:Jill),(age:20))
key3 : (gen:1),(exp:453789840),(bins:(name:James),(age:38))
key4 : (gen:1),(exp:453789840),(bins:(name:Jim),(age:46))
key5 : (gen:1),(exp:453789840),(bins:(name:Julia),(age:62))
key6 : (gen:1),(exp:453789840),(bins:(name:Sally),(age:32))
key7 : (gen:1),(exp:453789840),(bins:(name:Sean),(age:24))
key8 : (gen:1),(exp:453789840),(bins:(name:Sam),(age:12))
key9 : (gen:1),(exp:453789840),(bins:(name:Susan),(age:42))
You can quickly check the data distribution using the asadm -e info command.
I created a secondary index on the age bin as follows:
asadm --enable -e "manage sindex create numeric idx_age ns test set testset bin age"
For running the query, here is my client code.
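// Imports needed to run this in a Jupyter (IJava) cell or inside a Java class.
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Record;
import com.aerospike.client.policy.ClientPolicy;
import com.aerospike.client.policy.QueryPolicy;
import com.aerospike.client.policy.Replica;
import com.aerospike.client.query.Filter;
import com.aerospike.client.query.RecordSet;
import com.aerospike.client.query.Statement;

// Instantiate the client with a rack-aware ClientPolicy.
// Here, the client declares that its preferred rack is rack-id=1.
ClientPolicy cp = new ClientPolicy();
cp.rackId = 1; // Next, changed to 2 and then 3, for testing.
cp.rackAware = true;
AerospikeClient client = new AerospikeClient(cp, "localhost", 3000);

// Run the secondary index (SI) query
Statement stmt = new Statement();
stmt.setNamespace("test");
stmt.setSetName("testset");
stmt.setFilter(Filter.range("age", 20, 30));
QueryPolicy qp = new QueryPolicy();
// Tell the query to use the preferred rack
qp.replica = Replica.PREFER_RACK;
RecordSet rs = client.query(qp, stmt);
while (rs.next()) {
    Record r = rs.getRecord();
    System.out.println(r);
}
// Release the query's resources, then close this client
rs.close();
client.close();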
Code output (should be the same for each test with a different rack-id):
(gen:1),(exp:453789840),(bins:(name:Sean),(age:24))
(gen:1),(exp:453789840),(bins:(name:Jill),(age:20))
(gen:1),(exp:453789840),(bins:(name:Jack),(age:26))
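As a sanity check on the expected result, here is a minimal plain-Java sketch that applies the same 20 to 30 range to the ten test records without touching the cluster. It assumes both ends of the range are inclusive, which is consistent with Jill (age 20) appearing in the output above. The class and method names (RangeCheck, namesInRange, testRecords) are made up for this illustration and are not part of the Aerospike client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RangeCheck {
    // Returns the names whose age lies in [begin, end], both ends inclusive (assumed).
    static List<String> namesInRange(Map<String, Integer> ages, int begin, int end) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Integer> e : ages.entrySet()) {
            if (e.getValue() >= begin && e.getValue() <= end) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // The ten test records from the answer, in key0..key9 order.
    static Map<String, Integer> testRecords() {
        Map<String, Integer> ages = new LinkedHashMap<>();
        ages.put("Sandra", 34);
        ages.put("Jack", 26);
        ages.put("Jill", 20);
        ages.put("James", 38);
        ages.put("Jim", 46);
        ages.put("Julia", 62);
        ages.put("Sally", 32);
        ages.put("Sean", 24);
        ages.put("Sam", 12);
        ages.put("Susan", 42);
        return ages;
    }

    public static void main(String[] args) {
        System.out.println(namesInRange(testRecords(), 20, 30)); // prints [Jack, Jill, Sean]
    }
}
```

The three names match the three records returned by the query. The server may return them in a different order, so only the set of records should be compared.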
I am using Jupyter Notebook, so I can change the preferred rack-id, validate that I still get the correct results, and confirm that the query goes to a single node by watching the log ticker output, as below.
Only the ticker on the node in the preferred rack should show an increase in the long-basic count.
$ sudo cat /var/log/aerospike/aerospike.log |grep si-query
May 14 2024 04:59:03 GMT: INFO (info): (ticker.c:885) {test} si-query: short-basic (0,0,0) long-basic (5,0,0) aggr (0,0,0) udf-bg (0,0,0) ops-bg (0,0,0)
The long-basic (5,0,0) count is the number that I watch on each node. The query is executed only on the one node with the preferred rack-id.
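If you would rather compare the counter programmatically than read the log by eye, the following is a rough sketch of pulling the first long-basic value out of a ticker line with a regular expression. It assumes the ticker line format is exactly as in the sample above. The class and method names (SiQueryTicker, longBasicCount) are hypothetical helpers and not part of any Aerospike tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SiQueryTicker {
    // Captures the namespace and the three long-basic values of an si-query ticker line.
    private static final Pattern LONG_BASIC = Pattern.compile(
        "\\{(\\w[\\w-]*)\\} si-query: .*?long-basic \\((\\d+),(\\d+),(\\d+)\\)");

    // Returns the first long-basic value, or -1 if the line is not an si-query ticker line.
    static long longBasicCount(String line) {
        Matcher m = LONG_BASIC.matcher(line);
        return m.find() ? Long.parseLong(m.group(2)) : -1L;
    }

    public static void main(String[] args) {
        String line = "May 14 2024 04:59:03 GMT: INFO (info): (ticker.c:885) {test} si-query: "
            + "short-basic (0,0,0) long-basic (5,0,0) aggr (0,0,0) udf-bg (0,0,0) ops-bg (0,0,0)";
        System.out.println(longBasicCount(line)); // prints 5
    }
}
```

Running this against the latest si-query line from each node before and after a query shows which node's count increased, which should be the node in the client's preferred rack.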