Dramatic increase of client_connections/waiting_transactions


#1

Java client is used

We have a two-node cluster setup, but we currently connect through only one server (passing only one IP to the new AerospikeClient() constructor). We will change that soon, but for now it is what it is.

We tried a new feature in our application that issued a lot of writes (5,000 writes per second) to the database. Unfortunately, we got a lot of exceptions:

com.aerospike.client.AerospikeException$Timeout: Client timeout: timeout=0 iterations=3 failedNodes=0 failedConns=2

We removed the feature and confirmed that it no longer causes any writes.

But we still saw a lot of errors. I checked asinfo and saw that client_connections and waiting_transactions had increased dramatically.

I understand that there is an additional write when the replication factor is 2, but is this also the reason for the “waiting_transactions”?

We restarted both nodes, and after the second one was back, everything was fine for approx. 15 minutes (waiting_transactions peaked at 20). But then all of a sudden it went up to over 1,000 and the system seemed to lock up.

Is there a figure for how many writes per second you would expect from a two-node r3.xlarge Amazon cluster?

Thanks for your help. Best regards Roman


#2

The “waiting_transactions” statistic counts concurrent requests to the same keys (hot keys). These transactions have to be serialized on the database side during processing, so late-arriving requests are more likely to time out.
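
To check on the application side whether a few keys dominate the write load, something like the following toy sketch may help. It only counts repeated keys in a batch of in-flight requests; it is not how the server computes waiting_transactions, and the class name, method names and key strings are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HotKeyCounter {
    /** Counts how often each key appears in a batch of in-flight request keys. */
    static Map<String, Integer> countPerKey(List<String> keys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Toy model: if requests to the same key are processed one at a time,
     * every request after the first one for a key has to wait.
     */
    static int queuedBehindSameKey(List<String> keys) {
        int waiting = 0;
        for (int count : countPerKey(keys).values()) {
            waiting += count - 1;
        }
        return waiting;
    }

    public static void main(String[] args) {
        // Hypothetical keys; in a real application these would be logged by the writer.
        List<String> inFlight = Arrays.asList(
                "user:1", "user:1", "user:1", "user:2", "user:3", "user:3");
        System.out.println("requests per key: " + countPerKey(inFlight));
        System.out.println("requests queued behind another: " + queuedBehindSameKey(inFlight));
    }
}
```

If a handful of keys account for most of the count, the timeouts are more likely a hot-key problem than a general capacity problem.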


#3

Hi, thanks for your answer. I found out that the reason was most likely the bandwidth limit of the network connection, as we stored huge lists in the values for the keys. Once the network device was saturated with transferring that data, the intra-cluster communication slowed down too.
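
For anyone hitting the same problem, a rough back-of-the-envelope check is to multiply write rate, record size and replication factor and compare the result with what the network link can carry. The sketch below does this in Java. The 5,000 writes per second and replication factor 2 come from this thread; the 20,000-byte record size is purely an assumed placeholder, and the estimate ignores protocol overhead, reads and heartbeat traffic.

```java
public class WriteBandwidthEstimate {
    /**
     * Rough aggregate bytes per second moved over the network for a write load:
     * each write is sent once to the master and once more per additional replica.
     */
    static long requiredBytesPerSecond(long writesPerSecond, long recordBytes, int replicationFactor) {
        return writesPerSecond * recordBytes * replicationFactor;
    }

    /** Converts bytes per second to megabits per second (1 Mbit = 1,000,000 bits). */
    static double toMegabitsPerSecond(long bytesPerSecond) {
        return bytesPerSecond * 8 / 1_000_000.0;
    }

    public static void main(String[] args) {
        long writesPerSecond = 5_000;  // from the thread
        int replicationFactor = 2;     // from the thread
        long recordBytes = 20_000;     // ASSUMPTION: replace with the measured size of your records

        long bytes = requiredBytesPerSecond(writesPerSecond, recordBytes, replicationFactor);
        System.out.printf("approx. %.0f Mbit/s of write traffic%n", toMegabitsPerSecond(bytes));
    }
}
```

If the result is close to or above the bandwidth available to the instances, large values alone can saturate the link before the database itself becomes the bottleneck.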