Idle server connections reaping
The proto-fd-idle-ms configuration parameter controls how long a client connection can stay idle before being reaped by the server. The client libraries also have a configuration for closing idle connections (for example, maxSocketIdle in the Java Client Library). The client side should be configured a few seconds below the server side (the defaults are 55 seconds on the client and 60 seconds on the server). This prevents the race condition where a client attempts to use a connection just as the server has initiated closing it (causing a TCP RST).
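As a sketch, the server-side timeout lives in the service context of aerospike.conf; the value below is the 60-second default, expressed in milliseconds (the client's maxSocketIdle would then be left at, or set a few seconds below, this value):

```
# Hypothetical aerospike.conf fragment - proto-fd-idle-ms takes milliseconds.
service {
    proto-fd-idle-ms 60000    # reap client connections idle for 60 seconds
}
```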
It is detrimental to let unused and idle connections (without a client counterpart) accrue on the server, as they would eventually breach the configured proto-fd-max limit and prevent new connections from being established.
Aerospike Server version 4.8 introduces a keep-alive mechanism to prevent such connections from accruing on the server side:
- [AER-6141] - (KVS) Use keep-alive for client sockets.
This article covers a couple of specific situations relevant to how connections are handled after the client side goes down and the impact on Aerospike Server versions prior to 4.8.
1. Client machine is stopped gracefully (normal shutdown), or only the client process is killed on a running client machine.
This results in all connections from that client being closed as part of a normal TCP termination handshake.
2. Client machine is killed ungracefully (i.e. server or VM crash, power outage, etc.).
In the event of an ungraceful client shutdown there are two possibilities in terms of closing connections:
a. Connections were idle on the server, waiting for the client to send a request.
In Aerospike Server version 4.7 and earlier, these connections will stay alive and be subject to the kernel keep-alive policies and proto-fd-idle-ms.
Here are the 3 keep-alive kernel policy settings being referred to:
- tcp_keepalive_time : keep-alive probes start this many seconds after the last activity on the TCP connection.
- tcp_keepalive_intvl : After a connection has been idle for tcp_keepalive_time, a keep-alive probe is sent every tcp_keepalive_intvl seconds.
- tcp_keepalive_probes : If this many consecutive keep-alive probes remain unanswered, the connection is pronounced dead and gets closed.
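These three settings combine into a worst-case detection time of tcp_keepalive_time + tcp_keepalive_intvl × tcp_keepalive_probes. A minimal sketch, assuming the common Linux defaults of 7200 s, 75 s and 9 probes (check /proc/sys/net/ipv4/ on your system):

```shell
#!/bin/sh
# Worst-case time to declare a silent idle peer dead, using the common
# Linux defaults (these values are assumptions; verify on your system).
time=7200    # tcp_keepalive_time: idle seconds before probing starts
intvl=75     # tcp_keepalive_intvl: seconds between probes
probes=9     # tcp_keepalive_probes: unanswered probes before closing
total=$((time + intvl * probes))
echo "Idle connection declared dead after ${total} seconds"
```

With these defaults an idle connection to a dead client survives for over two hours, which is why the article later recommends tightening these settings.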
b. Connections had server responses pending to the client: a transaction was being processed on the server side and the response was about to be sent when the client went offline ungracefully.
This leads to TCP retransmits, governed by the tcp_retries2 kernel setting (the limit on TCP retransmits before a connection times out).
Connections may stay in the ESTABLISHED state until either tcp_retries2 is exhausted or proto-fd-idle-ms is reached.
Let's go through a working example where a client VM is killed while it had 106 idle connections (nothing pending from the server side) and 496 connections with server responses pending to the client (612 connections in total).
At this point, 496 connections have outstanding client requests, i.e., these connections will see a server response after the VM has been killed. 106 connections don’t.
The 106 connections remain completely silent after the VM goes away. This isn’t surprising, as it would now be the client’s turn to send a request on these connections.
But the client is gone. So, this is the typical case of an idle connection which is subject to the keep-alive settings.
At some point the server sends a response on each of the 496 connections. However, the client is gone by this time, so it doesn’t ACK the server’s response.
This makes the server’s TCP stack re-transmit the response - after 200 ms, 400 ms, 800 ms, etc. The delay keeps doubling until it reaches 120,000 ms. After that it stays at 120,000 ms.
The number of re-transmissions of an un-ACKed packet is capped by tcp_retries2, with an exponential backoff between attempts. Each retransmission timeout is between TCP_RTO_MIN (200 ms) and TCP_RTO_MAX (120 seconds, hard-coded in Linux).
In our case, tcp_retries2 is set to 15, so it keeps re-transmitting for a total of 200 ms + 400 ms + … + 102,400 ms + 120,000 ms (cap reached) + … + 120,000 ms = ~13.5 minutes. After that it would time out.
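The arithmetic above can be sketched as a short script that sums the backoff delays for tcp_retries2=15, doubling from TCP_RTO_MIN and capping at TCP_RTO_MAX:

```shell
#!/bin/sh
# Sum the retransmission delays for tcp_retries2=15: delays double from
# TCP_RTO_MIN (200 ms) up to the TCP_RTO_MAX cap (120000 ms).
delay=200
total=0
retries=15
i=0
while [ $i -lt $retries ]; do
    total=$((total + delay))
    delay=$((delay * 2))
    [ $delay -gt 120000 ] && delay=120000
    i=$((i + 1))
done
echo "Total retransmission time: ${total} ms"
```

This prints 804600 ms, i.e. roughly 13.5 minutes, matching the figure above.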
The 496 connections that have outstanding client requests (on which the server thus sends a response after the VM has gone away) time out after ~13.5 minutes.
The 106 connections that do not have outstanding client requests (which thus are completely silent / idle after the VM goes away) are subject to the kernel keep-alive settings.
As long as there is outgoing data in a socket’s send buffer, the keep-alive settings don’t apply.
This explains why different connections behave differently. Some have outstanding responses. Those go away first, after the number of tcp retries (governed by tcp_retries2 setting) has been exhausted.
The connections that don’t have outstanding responses are subject to the keep-alive settings.
Whichever threshold is breached first (proto-fd-idle-ms or tcp_retries2) determines when the connections are reaped.
Modifying Kernel settings:
This can be done dynamically using sysctl:
sudo sysctl -w net.ipv4.tcp_keepalive_time=60
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=10
sudo sysctl -w net.ipv4.tcp_keepalive_probes=9
sudo sysctl -w net.ipv4.tcp_retries2=3
or persistently in /etc/sysctl.conf (or a file under /etc/sysctl.d/).
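For example, a persistent version of the dynamic commands above might look like this sketch of an /etc/sysctl.conf fragment, applied with sysctl -p:

```
# Persistent equivalents of the sysctl -w commands above.
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_retries2 = 3
```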
proto-fd-idle-ms connection reaping closing tcp tcp_retries2