Java client - tendThread leaves connections in CLOSE_WAIT state


#1

Our application (an RPC proxy into an Aerospike database) holds one long-lived com.aerospike.client.AerospikeClient instance connected to a 10-node Aerospike cluster. Right now (in dev) the application is almost always idle, but the tendThread in the AerospikeClient connects to the cluster every tendInterval (1 sec). After a few minutes the application gets a “too many open files” error and stops responding. I can see an increasing number of ESTABLISHED connections to the AS cluster (one node, “master”), which grows to about 650 and stops there (the number grows every second in increments of 10, the size of the cluster). Once the number of ESTABLISHED connections reaches ~650, CLOSE_WAIT connections start to appear, and their number grows in the same way. By changing the connection parameters tendInterval, maxConnsPerNode and maxSocketIdle I can change the rate at which new connections accumulate, but the result is always the same: after some time the application hangs with “too many open files” and thousands of CLOSE_WAIT connections.
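For anyone trying to reproduce the counts described above, here is a minimal sketch of a helper that tallies connections per TCP state. It is not part of the Aerospike client: the class name `TcpStateCounter` is hypothetical, it assumes the Linux `netstat -tan` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and it assumes the cluster listens on port 3000 unless another port is passed as the first argument.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TcpStateCounter {
    // Counts connections per TCP state for lines whose foreign address ends with ":<remotePort>".
    // Assumed column layout: Proto Recv-Q Send-Q Local-Address Foreign-Address State
    public static Map<String, Integer> countStates(List<String> lines, int remotePort) {
        Map<String, Integer> counts = new TreeMap<>();
        String suffix = ":" + remotePort;
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 6 || !cols[0].startsWith("tcp")) {
                continue; // skip headers and non-TCP lines
            }
            if (cols[4].endsWith(suffix)) {
                counts.merge(cols[5], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Port 3000 is an assumption; pass the real service port as the first argument.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 3000;
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        countStates(lines, port).forEach((state, n) -> System.out.println(state + " " + n));
    }
}
```

Piping `netstat -tan` into it once per second (for example `netstat -tan | java TcpStateCounter 3000`) should make the growth of ESTABLISHED and CLOSE_WAIT connections easy to follow over time.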

I also found that doing a full sequential scan on a set (thus connecting to each node) “heals” the application: all the CLOSE_WAIT connections are gone and the number of ESTABLISHED connections drops to ~10 (and then the process starts over…).

Details: java client: com.aerospike:aerospike-client:jar:4.0.0 (maven dependency)

jvm: java version "1.8.0_51" Java™ SE Runtime Environment (build 1.8.0_51-b16) Java HotSpot™ 64-Bit Server VM (build 25.51-b03, mixed mode)

OS: debian wheezy

aerospike server: Aerospike Community Edition build 3.12.1


#2

I’m experiencing the same issue with the same Java client and Aerospike Community Edition on a CentOS 6.8 server. Can someone please comment on this issue?

In my case, when the AS server is idle, I’ve found that the CLOSE_WAIT connections are building up at the rate of 1 per second.

When I downgraded the client to version 3.3.4, the problem went away.


#3

It sounds like the cluster tend request (1 second interval) is constantly failing. The tend connection for each node is closed/reopened when an error/timeout occurs.

The AerospikeClient log will show tend errors. Can you subscribe to the AerospikeClient log and provide the log results?

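  import java.text.SimpleDateFormat;
  import java.util.Date;

  import com.aerospike.client.Log;

  public class MyConsole implements Log.Callback {
      // SimpleDateFormat.format() is an instance method, so keep a formatter instance.
      // The date pattern is only an example.
      private final SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");

      public MyConsole() {
          Log.setLevel(Log.Level.DEBUG);
          Log.setCallback(this);
      }

      @Override
      public void log(Log.Level level, String message) {
          // Write log messages to the appropriate place.
          String date;
          synchronized (dateFormat) { // SimpleDateFormat is not thread-safe
              date = dateFormat.format(new Date());
          }
          System.out.println(date + ' ' + level + ' ' + message);
      }
  }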

#4

I think I have found the problem. Try modifying Connection.java, isClosed(), and recompile:

	public boolean isClosed() {
		// Old
		// return lastUsed == 0;

		// New
		return socket.isClosed();
	}

If “socket.isClosed()” solves your problem, I will create a new release with a more optimized solution (socket.isClosed() acquires a lock which I would like to avoid).
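To illustrate why the two checks can disagree, here is a minimal, self-contained sketch. It is not the client's Connection class: `TrackedConnection`, `markUsed()` and the other names are hypothetical, and the scenario (a connection whose timestamp was never set) is only an assumption about one way a flag-based check might drift from the real socket state, not a confirmed description of the 4.0.0 bug.

```java
import java.io.Closeable;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ClosedCheckDemo {
    // Hypothetical stand-in for a pooled connection; not com.aerospike.client code.
    static final class TrackedConnection implements Closeable {
        private final Socket socket;
        private volatile long lastUsed; // stays 0 until the connection is marked as used

        TrackedConnection(InetAddress host, int port) throws IOException {
            this.socket = new Socket(host, port);
        }

        void markUsed() {
            lastUsed = System.currentTimeMillis();
        }

        // Flag-based check: cheap, but only as accurate as the bookkeeping behind it.
        boolean isClosedByFlag() {
            return lastUsed == 0;
        }

        // Socket-based check: asks the socket object itself.
        boolean isClosedBySocket() {
            return socket.isClosed();
        }

        @Override
        public void close() throws IOException {
            lastUsed = 0;
            socket.close();
        }
    }

    // Returns {flagSaysClosed, socketSaysClosed} for a freshly opened, never-used connection.
    public static boolean[] checkFreshConnection() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             TrackedConnection conn =
                     new TrackedConnection(server.getInetAddress(), server.getLocalPort())) {
            return new boolean[] { conn.isClosedByFlag(), conn.isClosedBySocket() };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = checkFreshConnection();
        System.out.println("flag says closed: " + r[0] + ", socket says closed: " + r[1]);
    }
}
```

In this sketch the flag reports the connection as closed while the socket is still open. If calling code treated such a connection as already closed and opened a replacement without calling `close()`, the old socket would stay open, which is the kind of leak that could end up as CLOSE_WAIT entries once the server side closes its end.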


#5

Java client 4.0.1 has been released:

http://www.aerospike.com/download/client/java/4.0.1/

The cluster tend connection should now stay open through successive intervals.


#6

I’ve tried the Java client 4.0.1 and yes, it solves the problem!

I’ve also subscribed to the log messages (with version 4.0.0), but nothing was logged.

Thanks for such a quick fix!


#7

Also see: https://stackoverflow.com/questions/44259941/aerospike-invalid-node-exception-when-writing-to-single-node-aerospike