Connection error on a large batch transaction


Problem Description

The C client returns error code -6 when requesting a batch with a large number of records (1000+). It works fine if the batch contains fewer records (20 or so). The Java client works fine in both scenarios.

Explanation

In general, error code -6 refers to a connection error and can happen under different circumstances. Some common causes of such errors are:

  • A component between the client and the server (a stateful firewall, for example) closing idle connections that the client pool still considers active and usable.
  • A race condition on high-latency links, where the server reaps an idle connection just as the client is about to use it. The server's default idle timeout (proto-fd-idle-ms) is 60 seconds.

A less common situation is described in this article. When issuing a large batch transaction, the client library can receive a WouldBlock event. Such events are normal: the client stops transmitting packets until the next writable event is received. Under some environmental conditions, we have seen the writable event fail to be generated after the WouldBlock. This is a bug outside of Aerospike's client library, and it causes the client to wait. The server then reaps the idle connection (by sending a RST packet) after the configured proto-fd-idle-ms, which defaults to 60000 (1 minute). This unexpected packet causes error code -6 on the client side.

A potential workaround is to avoid receiving the WouldBlock event by changing some kernel-level socket settings. The relevant settings can be inspected as follows:

$ cat /proc/sys/net/core/rmem_default
$ cat /proc/sys/net/core/rmem_max
$ cat /proc/sys/net/core/wmem_default
$ cat /proc/sys/net/core/wmem_max
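
The four checks above can be combined into one small helper. This is only a sketch: the function name check_socket_buffers and its two optional arguments (target size in bytes, directory to read from) are hypothetical and not part of any Aerospike tooling. It compares each value against the 1 MB figure suggested in this article.

```shell
#!/bin/sh
# Hypothetical helper: report which socket buffer settings are below a target.
# $1 = target size in bytes (default 1048576, the 1 MB suggested in this article)
# $2 = directory holding the settings (default /proc/sys/net/core)
check_socket_buffers() {
    target="${1:-1048576}"
    dir="${2:-/proc/sys/net/core}"
    for name in rmem_default rmem_max wmem_default wmem_max; do
        file="$dir/$name"
        if [ ! -r "$file" ]; then
            # The setting may not be exposed (for example inside some containers).
            echo "$name: unreadable"
            continue
        fi
        value=$(cat "$file")
        if [ "$value" -lt "$target" ]; then
            echo "$name: $value (below $target)"
        else
            echo "$name: $value (ok)"
        fi
    done
}

# Check the running kernel against the 1 MB default target.
check_socket_buffers
```

Any line reported as below the target is a candidate for the sysctl changes described next.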

Increasing these socket buffer values can be done according to the following link:

We recommend setting the send and receive buffers (both default and max) to 1 MB using the following commands, run as root on both the client and server Linux hosts.

sysctl -w net.core.rmem_max=1048576
sysctl -w net.core.wmem_max=1048576
sysctl -w net.core.rmem_default=1048576
sysctl -w net.core.wmem_default=1048576
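
Values set with sysctl -w do not survive a reboot. One way to keep them, assuming the distribution reads configuration fragments from /etc/sysctl.d/, is to generate a small configuration file. The sketch below is illustrative only: the function name socket_buffer_conf and the file name 90-socket-buffers.conf are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a sysctl configuration fragment for the four settings.
# $1 = buffer size in bytes (default 1048576, i.e. 1 MB)
socket_buffer_conf() {
    size="${1:-1048576}"
    for key in rmem_max wmem_max rmem_default wmem_default; do
        echo "net.core.$key = $size"
    done
}

# Print the fragment so it can be reviewed before being installed.
socket_buffer_conf

# To persist it (as root), assuming /etc/sysctl.d/ is honored on this host:
#   socket_buffer_conf > /etc/sysctl.d/90-socket-buffers.conf   # file name is an assumption
#   sysctl --system                                              # reload all sysctl config files
```

Printing the fragment first, rather than writing it directly, makes it easy to check the values on each client and server host before applying them.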

Keywords

WOULDBLOCK ERROR CODE -6

Timestamp

07/06/2017