Idle server connections reaping
The proto-fd-idle-ms configuration parameter controls how long a client connection can stay idle before the server reaps it. The client libraries also have a setting for closing idle connections (for example, maxSocketIdle in the Java client library).
Version 4.8 and above:
Aerospike Server version 4.8 introduces a keep-alive mechanism for client sockets:
- [AER-6141] - (KVS) Use keep-alive for client sockets.
In Aerospike Server versions 4.8 and above, the server can use keep-alive for client sockets. You can now set proto-fd-idle-ms to 0 (no reap), which stops the server from ever reaping idle connections. The server then relies on TCP keep-alive (IdleTimeout=60 seconds, ProbeInterval=60 seconds, ProbeCount=2) to detect dead sockets.
In recent clients (for example, Java client 4.4.12 and above), you can also set maxSocketIdle to 0 (CLIENT-1291).
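To make the three keep-alive parameters concrete, here is a minimal sketch of how an application could request the same values on one of its own TCP sockets. It is an illustration only and not the server's implementation. The function name enable_keepalive is hypothetical, and the mapping of IdleTimeout, ProbeInterval and ProbeCount to the Linux socket options TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT is an assumption.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=60, probes=2):
    """Turn on TCP keep-alive for one socket (hypothetical helper).

    The defaults mirror the values quoted in the text:
    IdleTimeout=60 s, ProbeInterval=60 s, ProbeCount=2.
    """
    # Portable switch that enables keep-alive probes on this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket tuning is platform specific; these constants exist on Linux.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

Per-socket options like these override the kernel-wide tcp_keepalive_* defaults for that socket only.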
Let's dive into the details of setting maxSocketIdle in the client policy.
ClientPolicy maxSocketIdle controls the maximum time a socket is allowed to stay in node connection pools.
If maxSocketIdle == 0:
- The cluster tend thread will trim connections down to the configured minimum connections using a hard-coded max idle time of 55 seconds.
- The transaction thread connections will not be checked for expiration (no reaping in the path of a transaction).
If maxSocketIdle > 0:
- The cluster tend thread will trim connections down to the configured minimum connections using maxSocketIdle.
- The transaction thread connections will be checked for expiration using maxSocketIdle. This could, in some cases, lead to connection reset errors if the server closes a connection just as the client attempts to use it. This can happen if the server proto-fd-idle-ms is configured lower than maxSocketIdle, or if the tend thread takes longer than expected to tend all nodes in the cluster.
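The two maxSocketIdle cases above can be modelled with a small sketch. This is a simplified model of the described behaviour, not the client library's code. The names tend_trim and usable_in_transaction, and the representation of the pool as a queue of last-used timestamps, are assumptions made for illustration.

```python
from collections import deque

# Per the text: idle limit used by the tend thread when maxSocketIdle == 0.
HARD_CODED_IDLE_S = 55


def tend_trim(pool, min_conns, max_socket_idle_s, now):
    """Model of the tend thread trimming one node's connection pool.

    pool holds last-used timestamps in seconds, oldest first.
    Returns the number of connections closed.
    """
    limit = max_socket_idle_s if max_socket_idle_s > 0 else HARD_CODED_IDLE_S
    closed = 0
    # Never trim below the configured minimum number of connections.
    while len(pool) > min_conns and now - pool[0] > limit:
        pool.popleft()
        closed += 1
    return closed


def usable_in_transaction(last_used, max_socket_idle_s, now):
    """Model of the expiration check in the transaction path."""
    if max_socket_idle_s == 0:
        return True  # no reaping in the path of a transaction
    return now - last_used <= max_socket_idle_s


pool = deque([0, 10, 50, 58])
closed = tend_trim(pool, min_conns=1, max_socket_idle_s=0, now=60)
print(closed, list(pool))  # 1 [10, 50, 58]
```

In this model only the connection idle for 60 seconds exceeds the 55-second limit, so a single connection is closed.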
If the server proto-fd-idle-ms is set to zero, it is a good practice to also set maxSocketIdle to zero. In general, it is suggested to set proto-fd-idle-ms to zero.
A connection is returned to the pool if the client command succeeds or fails with a server-generated error. If the client hits an error while parsing a response, or if a network error occurs, the connection is closed and not put back into the pool.
Version 4.7 and earlier:
The client-side idle timeout should be configured a few seconds below the server-side one (the defaults are 55 seconds on the client and 60 seconds on the server). This prevents a race in which the client uses a connection that the server has just started closing, which would cause a TCP RST.
It is detrimental for unused, idle connections (without a client counterpart) to accrue on the server, as they would eventually breach the configured proto-fd-max and prevent new connections from being established.
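A quick sanity check for this rule could look like the sketch below. The function name has_safe_margin and the 5-second margin are assumptions chosen to match the default gap between 55 and 60 seconds. The text only says "a few seconds".

```python
def has_safe_margin(client_idle_s, server_idle_ms, margin_s=5):
    """Return True if the client closes idle sockets early enough.

    client_idle_s: client-side idle limit in seconds (e.g. maxSocketIdle).
    server_idle_ms: server-side proto-fd-idle-ms in milliseconds.
    margin_s: assumed safety gap between the two, in seconds.
    """
    return client_idle_s + margin_s <= server_idle_ms / 1000


print(has_safe_margin(55, 60_000))  # True: the defaults quoted above
print(has_safe_margin(60, 60_000))  # False: the client could reuse a socket the server is closing
```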
This section covers a couple of specific situations relevant to how connections are handled after the client side goes down and the impact on Aerospike Server versions 4.7 and earlier:
1. Client machine gets stopped gracefully (normal shutdown) or only the client process is killed on a running client machine.
This results in all connections to that client being closed through the normal TCP termination handshake.
2. Client machine is killed ungracefully (for example, a server or VM crash or a power outage).
In the event of an ungraceful client shutdown there are two possibilities in terms of closing connections:
a. Connections were idle on the server and waiting for the client to send a request.
In Aerospike Server version 4.7 and earlier, these connections will stay alive and be subject to keep-alive kernel policies and proto-fd-idle-ms.
Here are the 3 keep-alive kernel policy settings being referred to:
- tcp_keepalive_time : The number of seconds after the last activity on the TCP connection before keep-alive probes start.
- tcp_keepalive_intvl : After a connection has been idle for tcp_keepalive_time, a keep-alive probe is sent every tcp_keepalive_intvl seconds.
- tcp_keepalive_probes : If tcp_keepalive_probes number of the keep-alive probes remain unanswered, the connection is pronounced dead and gets closed.
b. Connections had server responses pending to the client. A transaction was being processed on the server and its response was about to be sent to the client when the client went offline ungracefully.
This would lead to TCP retransmits, which are subject to the tcp_retries2 kernel policy (the limit on TCP retransmits before a connection times out).
Connections may stay in the ESTABLISHED state until either tcp_retries2 is exhausted or proto-fd-idle-ms elapses.
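For case (a), the three keep-alive settings combine into a worst-case detection time: the idle period plus one probe interval for each unanswered probe. The sketch below works that out. The function name is hypothetical, and the values 7200, 75 and 9 are assumed to be the stock Linux defaults, so check them on your own hosts.

```python
def keepalive_detection_s(keepalive_time, keepalive_intvl, keepalive_probes):
    """Seconds until the kernel declares a silent peer dead.

    Probing starts after keepalive_time seconds of inactivity, then
    keepalive_probes probes go unanswered, keepalive_intvl seconds apart.
    """
    return keepalive_time + keepalive_intvl * keepalive_probes


# Assumed stock Linux defaults: 7200 s idle, 75 s interval, 9 probes.
print(keepalive_detection_s(7200, 75, 9))  # 7875
```

With those defaults, detection takes over two hours, which is why proto-fd-idle-ms is usually what closes idle connections in version 4.7 and earlier.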
Working example (Applies to Version 4.7 and earlier):
Let's go through a working example in which a client VM is killed while it has 106 idle connections (nothing pending from the server side) and 496 connections with server responses pending to the client (for a total of 602 connections).
At this point, 496 connections have outstanding client requests, i.e., these connections will see a server response after the VM has been killed. 106 connections don’t.
The 106 connections remain completely silent after the VM goes away. This isn’t surprising, as it would now be the client’s turn to send a request on these connections.
But the client is gone. So, this is the typical case of an idle connection which is subject to the keep-alive settings.
At some point the server sends a response on each of the 496 connections. However, the client is gone by this time, so it doesn’t ACK the server’s response.
This makes the server’s TCP stack re-transmit the response - after 200 ms, 400 ms, 800 ms, etc. The delay keeps doubling until it reaches 120,000 ms. After that it stays at 120,000 ms.
The number of re-transmissions of an unacknowledged packet is capped by tcp_retries2, and the re-transmission timeout uses exponential backoff. Each retransmission timeout is between TCP_RTO_MIN (200 ms) and TCP_RTO_MAX (120 seconds, hardcoded in Linux).
In our case, tcp_retries2 is set to 15, so it keeps re-transmitting for a total of 200 ms + 400 ms + … + 102,400 ms + 120,000 ms (cap reached) + … + 120,000 ms = ~13.5 minutes. After that it would time out.
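The sum above can be reproduced with a short sketch. It is a simplified model that assumes the first timeout is exactly TCP_RTO_MIN and doubles on every retry. A real kernel derives the initial timeout from the measured round-trip time, so actual totals can differ. The function name retransmit_schedule_ms is hypothetical.

```python
TCP_RTO_MIN_MS = 200      # first retransmission timeout assumed in the text
TCP_RTO_MAX_MS = 120_000  # cap on a single retransmission timeout


def retransmit_schedule_ms(retries, first_ms=TCP_RTO_MIN_MS, cap_ms=TCP_RTO_MAX_MS):
    """Timeouts in ms for each retry: doubling from first_ms, capped at cap_ms."""
    return [min(first_ms * 2 ** i, cap_ms) for i in range(retries)]


schedule = retransmit_schedule_ms(15)  # tcp_retries2 = 15
total_ms = sum(schedule)
print(total_ms)                     # 804600
print(round(total_ms / 60_000, 1))  # 13.4 (minutes)
```

Under these assumptions the ten doubling timeouts add up to 204,600 ms and the five capped ones to 600,000 ms, which is in line with the ~13.5 minutes quoted above.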
The 496 connections that have outstanding client requests (on which the server thus sends a response after the VM has gone away) time out after ~13.5 minutes.
The 106 connections that do not have outstanding client requests (which thus are completely silent / idle after the VM goes away) are subject to the kernel keep-alive settings.
As long as there is outgoing data in a socket’s send buffer, the keep-alive settings don’t apply.
This explains why different connections behave differently. Some have outstanding responses. Those go away first, after the number of tcp retries (governed by tcp_retries2 setting) has been exhausted.
The connections that don’t have outstanding responses are subject to the keep-alive settings.
Whichever threshold is breached first (proto-fd-idle-ms or tcp_retries2) determines when the connections are reaped.
Modifying Kernel settings (Applies to Version 4.7 and earlier):
This can be done dynamically using sysctl:
sudo sysctl -w net.ipv4.tcp_keepalive_time=60
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=10
sudo sysctl -w net.ipv4.tcp_keepalive_probes=9
sudo sysctl -w net.ipv4.tcp_retries2=3
or statically in the respective files under