Increasing number of i/o timeouts with the latest Aerospike Go client


#1

Hi all,

First, congrats on the work achieved on Aerospike. It’s an amazing piece of technology and we are pretty happy with it.

We have been using Aerospike for more than a year now, without issues.

Since we updated the Go client library to its latest version, we have been getting more and more i/o timeout errors: read tcp 10.0.0.x:56026->w.x.y.z:3000: i/o timeout

First, I increased the timeout and retry settings of the Go client:

    client.DefaultWritePolicy.SocketTimeout = 30*time.Second
    client.DefaultWritePolicy.MaxRetries = 1000
    client.DefaultWritePolicy.Timeout = 30*time.Second
    client.DefaultPolicy.SocketTimeout = 30*time.Second
    client.DefaultPolicy.MaxRetries = 1000
    client.DefaultPolicy.Timeout = 30*time.Second

I was able to get rid of most of the timeouts with this configuration, but they still happen sometimes. I am on a pretty good internet connection within the US, connecting to an Amazon EC2 instance located in Virginia, so I don’t understand where the weak point is. I also deployed the latest version of Aerospike, v4.3.12, but it didn’t fix the issue.

Do you know how I can configure the new Aerospike Go client to make it more resilient?

Let me know. Best, Romain.


#2

Up! Can anyone help?


#3

Hi Romain, how do you know it’s the new Go client? Did you revert to the older version to see if the issue is really related to the upgrade?

How do you know the connections between the clients and the server nodes are good? Are you monitoring your connections? What Amazon instances are you using? Are the clients and servers in the same Amazon data center?

Thanks,


#4

I just updated the Go client in the same development environment and started to get those timeouts. Nothing has changed but the Go lib, so yes, I assume it comes from the lib itself. In one year of use I never saw those timeouts. I use 3 different servers on Amazon (medium instances) in Virginia and my client runs from Texas, so ping is not an issue here.


#5

How many connections do you have in your queue? Have you tried to tweak the ClientPolicy.ConnectionQueueSize?

If the transactions are too slow, it is possible that you end up running out of connections in the queue. You may be able to get some insight into what’s happening by looking at the Client.Stats() values.
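As a rough illustration of what running out of connections in a queue means, here is a minimal sketch of a fixed-size pool built on a buffered channel. This is not the Aerospike client's implementation; the names pool, conn, get, put and errPoolEmpty are all hypothetical. When every connection is checked out by slow transactions, the next request finds the queue empty.

```go
package main

import (
	"errors"
	"fmt"
)

// errPoolEmpty is returned when every connection is already checked out.
var errPoolEmpty = errors.New("connection pool empty")

// conn stands in for a real network connection.
type conn struct{ id int }

// pool is a fixed-size queue of idle connections.
type pool struct{ free chan *conn }

// newPool fills the queue with size idle connections.
func newPool(size int) *pool {
	p := &pool{free: make(chan *conn, size)}
	for i := 0; i < size; i++ {
		p.free <- &conn{id: i}
	}
	return p
}

// get takes an idle connection without blocking, or reports that none is left.
func (p *pool) get() (*conn, error) {
	select {
	case c := <-p.free:
		return c, nil
	default:
		return nil, errPoolEmpty
	}
}

// put returns a connection to the queue; it is dropped if the queue is full.
func (p *pool) put(c *conn) {
	select {
	case p.free <- c:
	default:
	}
}

func main() {
	p := newPool(2)
	a, _ := p.get()
	_, _ = p.get()     // both connections are now held by slow transactions
	_, err := p.get()  // a third request finds the queue empty
	fmt.Println(err)
	p.put(a)           // one transaction finishes and returns its connection
	_, err = p.get()
	fmt.Println(err)
}
```

In this model, a counter of empty-pool events that stays at zero would suggest the queue size is not the bottleneck.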


#6

Here are my client stats:

tends-failed:0 connections-attempts:1 connections-successful:1 tends-total:2 tends-successful:2 node-added-count:1 node-removed-count:0 connections-failed:0 connections-pool-empty:0 open-connections:1 partition-map-updates:1] cluster-aggregated-stats:map[connections-successful:1 connections-failed:0 open-connections:1 tends-failed:0 node-added-count:1 connections-attempts:1 tends-total:2 tends-successful:2 partition-map-updates:1 node-removed-count:0 connections-pool-empty:0] open-connections:1

FYI there’s no ClientPolicy.ConnectionQueueSize parameter in the newest Aerospike Go client.