We have the error: Client timeout: timeout=0 iterations=2 failedNodes=0 failedConns=2
DEBUG: Node *** ...:3000: Error Code 11: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
The error occurs when reading from a secondary index for a long time (> 5-10 minutes). During this period, CPU load increases by about 15-20%.
Cluster has 13 namespaces, 12 indexes (2 text, 10 numeric).
Client: C#/.NET on Windows Server 2008 R2, 11 clients. Server: CentOS 6, AS 3.5.4, 3 nodes.
We do not see any low-level errors on the client or the server (in messages, aerospike.log, other logs, or limits…).
Another possibility is that your network capacity has been exceeded. One way to check this is by running:
sar -n DEV
The output is in kibibytes per second, while your network link is probably rated in some number of gibibits, so you will need to do the conversion to see whether you are exceeding the link's limit.
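As a quick sanity check, the conversion above can be done with a one-line helper. This is a minimal sketch that takes the `sar` throughput column as kibibytes per second, per the interpretation in this post:

```python
def kib_per_s_to_gibit_per_s(kib_s: float) -> float:
    """Convert sar's rxkB/s / txkB/s column (read here as kibibytes
    per second) to gibibits per second for comparison against the
    rated link capacity."""
    return kib_s * 1024 * 8 / 2 ** 30

# A sustained 131072 kB/s on the NIC works out to exactly 1 Gibit/s:
print(kib_per_s_to_gibit_per_s(131072))  # → 1.0
```

Compare the result against your NIC's rating (e.g. 1 or 10 Gbit); sustained values near the rating suggest the link is saturated.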
var statement = new Statement();
statement.SetNamespace(ns);
statement.SetSetName(set);
statement.SetIndexName(index);
statement.SetBinNames(binName);
statement.SetFilters(filter);

var result = new List<T>();
RecordSet rs = _client.Query(null, statement);
try
{
    while (rs.Next())
    {
        var instance = CreateInstanceFromRecord(rs.Record); // simple deserialize
        if (instance != null)
        {
            result.Add(instance);
        }
    }
}
finally
{
    rs.Close(); // always release the record set, even if deserialization throws
}
return result;
The query runs from start to end without problems. But only during this time, simple get/set requests from other applications (whether on the same servers or not) return errors, even against other namespaces. Essentially, we did everything according to the manual.
I think you are experiencing lock-ups when long-running queries are serving slow clients. Up until 3.5.8, we do the network I/O of intermediate result buffers under the object lock, which blocks all concurrent reads and writes if the client is not consuming results at a sufficient rate.
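The effect described above can be illustrated with a generic sketch (plain Python threading, not Aerospike server code): when slow network I/O happens while a shared lock is held, an unrelated, otherwise-instant read stalls behind the flush.

```python
import threading
import time

lock = threading.Lock()  # stands in for the per-object lock

def long_query_flush():
    # Pre-3.5.8 behaviour as described: the intermediate result buffer
    # is written to the network while the lock is held.
    with lock:
        time.sleep(0.5)  # slow client draining the result set

def simple_get():
    # A concurrent single-record read must wait for the flush to finish.
    t0 = time.monotonic()
    with lock:
        pass
    return time.monotonic() - t0

q = threading.Thread(target=long_query_flush)
q.start()
time.sleep(0.1)          # let the flush grab the lock first
waited = simple_get()
q.join()
print(f"get blocked for ~{waited:.2f}s")
```

Moving the network write outside the lock (as 3.5.8 does) means `simple_get` would return immediately instead of waiting out the flush.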
With the advent of Aerospike strong consistency, this same error is returned if the roster is not set, or in some cases where a partition is unavailable. The client realizes it has no node to send the request to, and thus returns this error.
If you see this error with strong consistency enabled, make sure to check your roster and your unavailable partitions.
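For strong-consistency deployments, the roster and partition state can be inspected with `asinfo`. The namespace name `test` below is a placeholder for your own:

```shell
# Show the active roster, pending roster and observed nodes for a namespace
asinfo -v "roster:namespace=test"

# Dump namespace statistics; a non-zero unavailable_partitions means some
# partitions currently have no eligible node to serve them
asinfo -v "namespace/test" | tr ';' '\n' | grep -E 'unavailable_partitions|dead_partitions'
```

If the pending roster differs from the active one, a `roster-set` followed by `recluster` is typically needed before the partitions become available again.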