What causes AerospikeException: java.io.EOFException?


#1

#2

Sorry about this inconvenience. I just went ahead and adjusted this settings to allow more links to be posted by new users. I also bumped up your level one up (from new user to basic user).

Thanks for your feedback. Please keep it coming as it will help us improve our forum’s user experience.

Thanks, –meher


#3

Thanks, Meher. What was weird is that I was only trying to paste one link, but it was complaing that I was trying to post 2.


#4

Hi

The java .io.EOFException thrown by a Stream like BufferedInputStream when the stream is empty (EOF).

It is an unusual situation for you to be catching this exception if it is thrown from Aerospike, there may be an issue to be reported. Can you share the version of the Aerospike Java Client and a stacktrace?

Regards Peter


#5

Thanks helipilot50. Here you go.

com.aerospike.client.AerospikeException: java.io.EOFException
    at com.aerospike.client.async.SelectorManager.processKey(SelectorManager.java:184) [aerospike-client-3.0.24.jar:?]
    at com.aerospike.client.async.SelectorManager.runCommands(SelectorManager.java:108) [aerospike-client-3.0.24.jar:?]
    at com.aerospike.client.async.SelectorManager.run(SelectorManager.java:69) [aerospike-client-3.0.24.jar:?]
Caused by: java.io.EOFException
    at com.aerospike.client.async.AsyncConnection.read(AsyncConnection.java:127) ~[aerospike-client-3.0.24.jar:?]
    at com.aerospike.client.async.AsyncSingleCommand.read(AsyncSingleCommand.java:48) ~[aerospike-client-3.0.24.jar:?]
    at com.aerospike.client.async.SelectorManager.processKey(SelectorManager.java:164) ~[aerospike-client-3.0.24.jar:?]
    ... 2 more

#6

I was testing on a new cluster when this happened. One thing I just noticed is that the eviction policy was much smaller (30d in the config, which I’m guessing might actually translate to 30 seconds?) than I wanted it to be (0 - never evict). Could it be that the records were being evicted while they were being read and this was the result? I’ve changed the eviction policy to 0 and will see if this happens again.


#7

Can you check that you haven’t called client.close() before you do the async read. If you haven’t, I’ll report this as a defect.


#8

Yes, I can confirm that I am not calling close() anywhere in my code. I have only one client running in the jvm. These errors happened sporadically. I also got successful reads after the exception (w/out opening or closing the client).


#9

I have reported this issue as a defect with Aerospike


#10

EOFException is thrown when the socket connection is no longer valid. It usually happens because the server (not client) has closed the connection. Here is the relevant read code in the client.

 /**
  * Read till byteBuffer limit reached or received would-block.
  */
public boolean read(ByteBuffer byteBuffer) throws IOException {
    while (byteBuffer.hasRemaining()) {
        int len = socketChannel.read(byteBuffer);

        if (len == 0) {         
            // Got would-block.
            return false;
        }

        if (len < 0) {
            // Server has shutdown socket.
            throw new EOFException();
        }
    }
    return true;
}

EOFException with async client of nioEventLoop
#11

Hi,

I’m also seeing exceptions like this, but unusually I’m only seeing it in a specific environment.

We are running clusters of Aerospike in AWS and GCE, with a process that runs daily, writing around 15 million rows. We’re using the latest version of the Java client (3.0.35) and Aerospike CE (3.5.3), and async writes without issuing any client.close() calls.

On GCE, when first writing to the cluster a few writes succeed, then we see up to 15 minutes of EOFExceptions before suddenly all writes start succeeding without any influence from us or any log messages in Aerospike or the client. Even with retries, we still fail to write a significant number of rows.

We’re using single-bin, and no TTL.


#12

This issue is definitely on the server side and looks to be happening somewhere in the network layer. I’ve just run through another scenario with loading data and observed that all of the fabric connections to the other AS nodes in the cluster were building up an increasing set of data to send:

netstat -anp | grep 3001
tcp        0      0 0.0.0.0:3001            0.0.0.0:*               LISTEN      12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.117.250:48091    ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.102.76:54723     ESTABLISHED 12927/asd       
tcp        0  17999 10.240.108.242:37802    10.240.117.250:3001     ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.40.168:34012     ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.40.168:33932     ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.95.202:47079     ESTABLISHED 12927/asd       
tcp        0  18151 10.240.108.242:37804    10.240.117.250:3001     ESTABLISHED 12927/asd       
tcp        0  17012 10.240.108.242:46599    10.240.40.168:3001      ESTABLISHED 12927/asd       
tcp        0  17961 10.240.108.242:52978    10.240.102.76:3001      ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.117.250:48095    ESTABLISHED 12927/asd       
tcp        0  16800 10.240.108.242:46592    10.240.40.168:3001      ESTABLISHED 12927/asd       
tcp        0  18076 10.240.108.242:52967    10.240.102.76:3001      ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3000     10.240.135.136:43001    TIME_WAIT   -               
tcp        0  18002 10.240.108.242:37812    10.240.117.250:3001     ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.95.202:47056     ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.95.202:47068     ESTABLISHED 12927/asd       
tcp        0  16952 10.240.108.242:46603    10.240.40.168:3001      ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.117.250:48080    ESTABLISHED 12927/asd       
tcp        0  17029 10.240.108.242:46612    10.240.40.168:3001      ESTABLISHED 12927/asd       
tcp        0  16641 10.240.108.242:48412    10.240.95.202:3001      ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.95.202:47057     ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.95.202:47077     ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.102.76:54716     ESTABLISHED 12927/asd       
tcp        0  18003 10.240.108.242:37811    10.240.117.250:3001     ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.40.168:33999     ESTABLISHED 12927/asd       
tcp        0  18211 10.240.108.242:52972    10.240.102.76:3001      ESTABLISHED 12927/asd       
tcp        0  17136 10.240.108.242:46598    10.240.40.168:3001      ESTABLISHED 12927/asd       
tcp        0  17922 10.240.108.242:52987    10.240.102.76:3001      ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.117.250:48081    ESTABLISHED 12927/asd       
tcp        0  16844 10.240.108.242:48428    10.240.95.202:3001      ESTABLISHED 12927/asd       
tcp        0  18242 10.240.108.242:37823    10.240.117.250:3001     ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.102.76:54732     ESTABLISHED 12927/asd       
tcp        0  16761 10.240.108.242:48413    10.240.95.202:3001      ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.102.76:54707     ESTABLISHED 12927/asd       
tcp        0  18009 10.240.108.242:52979    10.240.102.76:3001      ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.117.250:48072    ESTABLISHED 12927/asd       
tcp        0  17848 10.240.108.242:52983    10.240.102.76:3001      ESTABLISHED 12927/asd       
tcp        0      0 10.240.108.242:3001     10.240.40.168:33926     ESTABLISHED 12927/asd       
tcp        0  16705 10.240.108.242:48437    10.240.95.202:3001      ESTABLISHED 12927/asd       

I observed the send buffers continually increased on all nodes while exceptions were being thrown in the client trying to write to the cluster. After some arbitrary point in time new connections are established and errors stop.

My suspicion is that there is no communication over fabric channels unless there’s read/write activity in the cluster. In an idle state nothing is sending keepalives or any data over 3001 to other nodes. In GCE there are processes which cull idle tcp connections, but they operate outside of the virtual machine. It may be that Aerospike has no idea that the connections have been severed at the point where it starts sending data to other nodes.


#13

To add on to Brian’s previous reply, aerospike server will automatically shutdown connections which have been idle for more than 1 minute (configuration parameter = proto-fd-idle-ms).

So if the client has not had active requests over the cached connections, this will be a very likely situation where EOFException can happen.


#14

Hi @helipilot50 Any update on this defect. I am facing the same issue…

com.aerospike.client.AerospikeException: java.io.EOFException
        at com.aerospike.client.async.SelectorManager.processKey(SelectorManager.java:188)
        at com.aerospike.client.async.SelectorManager.runCommands(SelectorManager.java:107)
        at com.aerospike.client.async.SelectorManager.run(SelectorManager.java:68)
Caused by: java.io.EOFException
        at com.aerospike.client.async.AsyncConnection.read(AsyncConnection.java:146)
        at com.aerospike.client.async.AsyncMultiCommand.read(AsyncMultiCommand.java:95)
        at com.aerospike.client.async.SelectorManager.processKey(SelectorManager.java:165) 

Thanks in advance


#15

I ended up just handling it in my application.