Connections stuck in CLOSE_WAIT

cody · February 13, 2015, 5:11pm

I’m seeing a situation where, over a short period of time, the connection count ramps up dramatically, and the connections seem to be stuck in CLOSE_WAIT.

Here’s a sample timeline from Graphite of [connections, timestamp]:

[871.0, 1423801310], [871.0, 1423801312], [871.0, 1423801314], [4094.0, 1423801316], [4094.0, 1423801318], [4094.0, 1423801320], [4094.0, 1423801322], [4094.0, 1423801324], [4094.0, 1423801326], [4094.0, 1423801328], [4094.0, 1423801330], [4094.0, 1423801332], [4094.0, 1423801334], [4094.0, 1423801336], [4094.0, 1423801338], [4094.0, 1423801340], [4094.0, 1423801342], [4094.0, 1423801344], [18460.0, 1423801346]
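
To make the steps in that timeline easier to see, here is a minimal Python sketch that lists the points where the connection count changes. The `points` list is a trimmed copy of the samples above, and `find_jumps` is a hypothetical helper name, not something Graphite provides.

```python
# Trimmed copy of the Graphite samples above: [connections, unix_timestamp].
points = [
    [871.0, 1423801310], [871.0, 1423801314],
    [4094.0, 1423801316], [4094.0, 1423801344],
    [18460.0, 1423801346],
]

def find_jumps(samples):
    """Return (timestamp, before, after) wherever the value changes."""
    jumps = []
    for (prev_value, _), (value, timestamp) in zip(samples, samples[1:]):
        if value != prev_value:
            jumps.append((timestamp, prev_value, value))
    return jumps

for timestamp, before, after in find_jumps(points):
    print(f"{timestamp}: {before:.0f} -> {after:.0f}")
```

On this data it reports two steps, at 1423801316 and 1423801346, which are 30 seconds apart.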

The vast majority of those 18,000 connections were stuck in CLOSE_WAIT on the server side:

asd 9962 root 206u IPv4 3552341 0t0 TCP ip-XXX.ec2.XXX:3000->ip-XX.ec2.XXX:57412 (CLOSE_WAIT)
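
A rough way to quantify this is to tally sockets by TCP state. The sketch below assumes lsof-style lines like the one above have been captured as text (for example from something like `lsof -nP -iTCP:3000`, which is an assumption about how the listing was produced). `count_states` is a hypothetical helper, and the second sample line is invented for illustration.

```python
import re
from collections import Counter

# Matches the trailing "(STATE)" that lsof prints for TCP sockets.
STATE_RE = re.compile(r"\((\w+)\)\s*$")

def count_states(lsof_text):
    """Count sockets per TCP state in lsof-style output."""
    counts = Counter()
    for line in lsof_text.splitlines():
        match = STATE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# First line is from the post; the ESTABLISHED line is made up.
sample = """\
asd 9962 root 206u IPv4 3552341 0t0 TCP ip-XXX.ec2.XXX:3000->ip-XX.ec2.XXX:57412 (CLOSE_WAIT)
asd 9962 root 207u IPv4 3552342 0t0 TCP ip-XXX.ec2.XXX:3000->ip-XX.ec2.XXX:57413 (ESTABLISHED)
"""
print(count_states(sample).most_common())
```

Running this periodically against real output would show whether CLOSE_WAIT is growing relative to ESTABLISHED.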

This has intermittently happened several times, and I was only able to resolve it by hup’ing asd.

For what it’s worth, this is with the PHP client library.

I’ve seen a presentation that recommends setting proto-fd-idle-ms to 10 seconds for PHP, but even that wouldn’t have helped in this situation, because the connection limit was hit within a couple of seconds.

rbotzer · February 13, 2015, 5:55pm

Hey Cody,

Thanks for reporting the problem. I’d like to dig into this, and first I want to try and reproduce it consistently.

Can you add information about which release of the PHP client you are using, the OS and version on the client side, and the version of the server?

Are your scripts using persistent connections? Is this a webserver context (for example Nginx + PHP-FPM)? If so, could you share the webserver configuration?

Are you using shared memory for cluster tending? Could you specify what the aerospike.shm.* values are in your php.ini?

Thanks! Ronen

cody · February 13, 2015, 6:58pm

aerospike php client version 3.3.9
Amazon Linux AMI release 2014.09 3.14.27-25.47.amzn1.x86_64
aerospike-server-community-3.4.1-el6

It looks like Aerospike::__construct does a persistent connection by default, which is what we're using.
Yes, Nginx + PHP-FPM
nginx-1.6.2-1.22.amzn1.x86_64

Differences from nginx base config:
worker_rlimit_nofile 200000;
tcp_nopush     on;
keepalive_timeout  0;
open_file_cache max=200000 inactive=20s;
open_file_cache_valid 30s;
open_file_cache_min_uses 2;
open_file_cache_errors on;
gzip on;
fastcgi_buffers 8 16k;
fastcgi_buffer_size 32k;


cat /etc/php.d/aerospike.ini:

extension=aerospike.so
aerospike.udf.lua_system_path=/opt/aerospike/client-php/sys-lua
aerospike.udf.lua_user_path=/opt/aerospike/client-php/usr-lua

rbotzer · February 13, 2015, 7:29pm

Alrighty. I’ll look into it.

Can you please try explicitly passing true as the second parameter to the constructor? If that doesn’t change things, can you try making use of the shared-memory cluster tending?

You would add to your php.ini:

aerospike.shm.use=true

Ronen

rbotzer · February 16, 2015, 9:05pm

One more thing: how many PHP processes do you have at the point where you see 18k CLOSE_WAITs? Can you quote your php-fpm.conf?

cody · February 17, 2015, 3:46pm

The issue seems to happen at low-load times of the day (i.e. the middle of the night), so approximately 150 PHP processes across all frontend machines.

There was a very noticeable load difference after switching to an explicit false for persistent connections, so it seems fairly likely they were on. We’ll try shared-memory cluster tending.

Relevant portions of the php-fpm config are:

pm = dynamic
pm.max_children = 150
pm.max_requests = 500
pm.max_spare_servers = 50
pm.min_spare_servers = 10
pm.start_servers = 30

rbotzer · February 17, 2015, 4:01pm

Thanks for the details. We’re looking into it.

vlad · June 12, 2015, 5:53pm

I have exactly the same issue with the Java client too. The Aerospike server is practically idle (waiting for production deployment), and the Aerospike client webapp is started but not getting any traffic. Server version 3.5.12, Java client 3.1.2.

Please let me know if you need more info or want me to make any configuration changes.

rbotzer · July 15, 2015, 11:50pm

Two separate issues here. If the server side is idle, it may be reaping the connections due to under-utilization, in which case you will see CLOSE_WAITs, but I don’t think they’ll be in the thousands, and if the client isn’t doing much with the server then it’s not really a problem.

It’s a relatively expensive operation (in terms of CPU and time) to initialize the client. It needs to learn the cluster topology after it connects to the seed node, then open TCP connections to each of the newly discovered nodes. In most clients (Java, Go, C#, etc) we do this once, then hold onto the client and send all requests through it.

PHP, Python, and Ruby usually approach web applications in a different way. Since traditionally the problem was memory leaks in the interpreted code, the ‘solution’ was to severely limit the number of requests each process should handle, fork new ones, and kill the ones that maxed out their requests.

I would first suggest raising the max_requests value as much as possible, while monitoring the processes’ memory consumption. The less often the aerospike object gets recycled (and with it the opening and closing of connections), the less this will occur.
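
As a back-of-the-envelope illustration of why a higher max_requests should help, the Python sketch below estimates how often connections get reopened. It assumes a simplified model in which every worker is replaced after exactly max_requests requests and its replacement opens one connection per cluster node. The request rate and node count are made-up numbers, not measurements from this thread, and `reconnects_per_second` is a hypothetical helper.

```python
def reconnects_per_second(requests_per_second, max_requests, nodes):
    """Estimate new TCP connections per second caused by worker recycling.

    Simplifying assumption: each worker is recycled after exactly
    max_requests requests and then opens one connection per node.
    """
    recycles_per_second = requests_per_second / max_requests
    return recycles_per_second * nodes

# Hypothetical load: 1000 requests/s against a 3-node cluster.
for max_requests in (500, 5000):
    rate = reconnects_per_second(1000, max_requests, 3)
    print(f"pm.max_requests={max_requests}: ~{rate:.1f} new connections/s")
```

Under this model, raising pm.max_requests from 500 to 5000 cuts the estimated connection churn tenfold, at the cost of longer-lived worker processes whose memory needs watching.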

We intend to investigate this further, however. I’d rather it worked better with a ‘standard’ FPM configuration.

rbotzer · July 16, 2015, 12:28am

@cody is that PHP non-ZTS? Check if the extension_dir path is something like /usr/local/lib/php/extensions/no-debug-non-zts-20131226.

cody · July 17, 2015, 4:35pm

@rbotzer Yeah, it’s /usr/lib/php/extensions/no-debug-non-zts-20131226/

rbotzer · December 25, 2015, 9:59am

Please try the new client release 3.4.6 and read the Configuration in a Web Server Context section of the overview.
