Reads served only from some nodes while using eventloop

I was trying to compare the Aerospike Java client's synchronous and asynchronous operations. The Aerospike cluster had 9 nodes.

I noticed that with synchronous operations, each server node was serving an equal number of reads. Whereas with asynchronous operations (using the event loop), only 2 nodes were serving all the requests while the other 7 served none. The replication factor is 2.

I kept the same ClientPolicy in both situations except for the settings related to the event loop size and maxConnsPerNode. Can someone point out a possible reason why this happens?
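For reference, here is a simplified sketch of the two read paths I am comparing (not my exact benchmark code; the host, namespace, set, and keys are placeholders):

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.AerospikeException;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.async.EventLoops;
import com.aerospike.client.async.EventPolicy;
import com.aerospike.client.async.NioEventLoops;
import com.aerospike.client.listener.RecordListener;
import com.aerospike.client.policy.ClientPolicy;

public class SyncVsAsyncRead {
    public static void main(String[] args) {
        // Event loops must be created before the client and attached to the ClientPolicy.
        EventLoops eventLoops = new NioEventLoops(new EventPolicy(), 16);
        ClientPolicy clientPolicy = new ClientPolicy();
        clientPolicy.eventLoops = eventLoops;

        AerospikeClient client = new AerospikeClient(clientPolicy, "aerospike-host", 3000);
        Key key = new Key("test", "testset", "user1");

        // Synchronous read: blocks until the record is returned.
        Record record = client.get(null, key);

        // Asynchronous read: submitted on one of the event loops,
        // result delivered to the listener on completion.
        client.get(eventLoops.next(), new RecordListener() {
            public void onSuccess(Key k, Record r) {
                // handle record
            }
            public void onFailure(AerospikeException ae) {
                ae.printStackTrace();
            }
        }, null, key);

        // ... wait for completion, then close the client and eventLoops.
    }
}
```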

The behavior you describe is very peculiar and should not happen. "Only 2 nodes were serving all the requests while the other 7 served none" - this sounds like the two nodes are proxying to the other nodes as described here. However, the sync and async clients obviously should not differ in this manner. Can you describe the read policy details for the two, as well as the output of 'show latency' (or 'show latencies', depending on the server version)? What are the number of cores on the client machine, the number of event loops, and maxConnsPerNode?

The client machine has 8 cores, the event loop size is 16, and maxConnsPerNode is 16 * 100 (1600). Latencies are always less than 1 ms.

For the sync client, maxConnsPerNode is left at the default, and the following settings are the same for both the sync and async clients:

```java
clientPolicy.readPolicyDefault.replica = Replica.MASTER_PROLES;
clientPolicy.readPolicyDefault.consistencyLevel = ConsistencyLevel.CONSISTENCY_ONE;
clientPolicy.requestProleReplicas = true;
```
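Putting it together, the async client setup looks roughly like this (a sketch rather than my exact code; the host name is a placeholder, and it assumes a client version that still has requestProleReplicas and consistencyLevel):

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.async.EventPolicy;
import com.aerospike.client.async.NioEventLoops;
import com.aerospike.client.policy.ClientPolicy;
import com.aerospike.client.policy.ConsistencyLevel;
import com.aerospike.client.policy.Replica;

public class AsyncClientSetup {
    public static AerospikeClient connect() {
        ClientPolicy clientPolicy = new ClientPolicy();
        clientPolicy.eventLoops = new NioEventLoops(new EventPolicy(), 16); // 16 event loops on the 8-core machine
        clientPolicy.maxConnsPerNode = 16 * 100;                            // 1600 connections per node
        clientPolicy.readPolicyDefault.replica = Replica.MASTER_PROLES;
        clientPolicy.readPolicyDefault.consistencyLevel = ConsistencyLevel.CONSISTENCY_ONE;
        clientPolicy.requestProleReplicas = true;
        return new AerospikeClient(clientPolicy, "aerospike-host", 3000);   // placeholder host
    }
}
```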

Just to clarify: with the async client, were all the requests directed to and serviced by just two nodes, or were only two nodes active and responding to the requests they received (with the remaining requests, which would otherwise have been serviced by the other 7 nodes, failing)?

All cluster nodes were active. I even stopped the two nodes that were serving all the reads, and then two other nodes started serving all the reads.

This is weird indeed. Are the exact same keys being accessed in both tests, and are they all succeeding?

I can see only 2 ways this can happen if all the transactions are successful and the throughput is similar between the 2 tests:

  • Somehow the data set for the async test is limited to some specific partitions? For example, loading only 1 or 2 files from a backup (which is partition based). Those partitions would then be held by 2 nodes, and shutting those 2 nodes down would cause those partitions to move. (See the partition-spread sketch after this list for a way to check this.)

  • Somehow for the async tests, only 2 nodes are seen by the client and they end up proxying the transactions. But that is hard to imagine with recent clients, which should throw a -3 error (node not found for partition) instead.
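To help rule out the first possibility, you could check how many distinct partitions the async test's key set actually touches. Here is a rough sketch (assuming you can regenerate the key set on the client side; it relies on the fact that Aerospike maps every key to one of 4096 partitions using the first two bytes, little-endian, of the key's RIPEMD-160 digest):

```java
import java.util.HashSet;
import java.util.Set;

import com.aerospike.client.Key;

public class PartitionSpread {
    // Partition id derived from the first two bytes (little-endian) of the key digest,
    // modulo the 4096 partitions in an Aerospike namespace.
    static int partitionId(Key key) {
        byte[] d = key.digest;
        return ((d[0] & 0xFF) | ((d[1] & 0xFF) << 8)) % 4096;
    }

    public static void main(String[] args) {
        // Replace this loop with the key generation used by the async test.
        Set<Integer> partitions = new HashSet<>();
        for (int i = 0; i < 100000; i++) {
            partitions.add(partitionId(new Key("test", "testset", "user" + i)));
        }
        System.out.println("Distinct partitions touched: " + partitions.size());
    }
}
```

A well-spread key set should touch close to all 4096 partitions; if only a handful show up, the first explanation is the likely one.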

How are you checking the transactions on the server side? Can you maybe share the output of show latency or show latencies from asadm for both runs?

Also, for the async run, it would be interesting to capture the following stats at 2 intervals during the run, so we can see how the stats evolve: asadm -e "show stat like client_"
