Sorry about this inconvenience. I just went ahead and adjusted this settings to allow more links to be posted by new users. I also bumped up your level one up (from new user to basic user).
Thanks for your feedback. Please keep it coming as it will help us improve our forum’s user experience.
Thanks, –meher
Thanks, Meher. What was weird is that I was only trying to paste one link, but it was complaing that I was trying to post 2.
Hi
The java .io.EOFException thrown by a Stream like BufferedInputStream when the stream is empty (EOF).
It is an unusual situation for you to be catching this exception if it is thrown from Aerospike, there may be an issue to be reported. Can you share the version of the Aerospike Java Client and a stacktrace?
Regards Peter
Thanks helipilot50. Here you go.
com.aerospike.client.AerospikeException: java.io.EOFException
at com.aerospike.client.async.SelectorManager.processKey(SelectorManager.java:184) [aerospike-client-3.0.24.jar:?]
at com.aerospike.client.async.SelectorManager.runCommands(SelectorManager.java:108) [aerospike-client-3.0.24.jar:?]
at com.aerospike.client.async.SelectorManager.run(SelectorManager.java:69) [aerospike-client-3.0.24.jar:?]
Caused by: java.io.EOFException
at com.aerospike.client.async.AsyncConnection.read(AsyncConnection.java:127) ~[aerospike-client-3.0.24.jar:?]
at com.aerospike.client.async.AsyncSingleCommand.read(AsyncSingleCommand.java:48) ~[aerospike-client-3.0.24.jar:?]
at com.aerospike.client.async.SelectorManager.processKey(SelectorManager.java:164) ~[aerospike-client-3.0.24.jar:?]
... 2 more
I was testing on a new cluster when this happened. One thing I just noticed is that the eviction policy was much smaller (30d in the config, which I’m guessing might actually translate to 30 seconds?) than I wanted it to be (0 - never evict). Could it be that the records were being evicted while they were being read and this was the result? I’ve changed the eviction policy to 0 and will see if this happens again.
Can you check that you haven’t called client.close() before you do the async read. If you haven’t, I’ll report this as a defect.
Yes, I can confirm that I am not calling close() anywhere in my code. I have only one client running in the jvm. These errors happened sporadically. I also got successful reads after the exception (w/out opening or closing the client).
I have reported this issue as a defect with Aerospike
EOFException is thrown when the socket connection is no longer valid. It usually happens because the server (not client) has closed the connection. Here is the relevant read code in the client.
/**
* Read till byteBuffer limit reached or received would-block.
*/
public boolean read(ByteBuffer byteBuffer) throws IOException {
while (byteBuffer.hasRemaining()) {
int len = socketChannel.read(byteBuffer);
if (len == 0) {
// Got would-block.
return false;
}
if (len < 0) {
// Server has shutdown socket.
throw new EOFException();
}
}
return true;
}
Hi,
I’m also seeing exceptions like this, but unusually I’m only seeing it in a specific environment.
We are running clusters of Aerospike in AWS and GCE, with a process that runs daily, writing around 15 million rows. We’re using the latest version of the Java client (3.0.35) and Aerospike CE (3.5.3), and async writes without issuing any client.close() calls.
On GCE, when first writing to the cluster a few writes succeed, then we see up to 15 minutes of EOFExceptions before suddenly all writes start succeeding without any influence from us or any log messages in Aerospike or the client. Even with retries, we still fail to write a significant number of rows.
We’re using single-bin, and no TTL.
This issue is definitely on the server side and looks to be happening somewhere in the network layer. I’ve just run through another scenario with loading data and observed that all of the fabric connections to the other AS nodes in the cluster were building up an increasing set of data to send:
netstat -anp | grep 3001
tcp 0 0 0.0.0.0:3001 0.0.0.0:* LISTEN 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.117.250:48091 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.102.76:54723 ESTABLISHED 12927/asd
tcp 0 17999 10.240.108.242:37802 10.240.117.250:3001 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.40.168:34012 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.40.168:33932 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.95.202:47079 ESTABLISHED 12927/asd
tcp 0 18151 10.240.108.242:37804 10.240.117.250:3001 ESTABLISHED 12927/asd
tcp 0 17012 10.240.108.242:46599 10.240.40.168:3001 ESTABLISHED 12927/asd
tcp 0 17961 10.240.108.242:52978 10.240.102.76:3001 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.117.250:48095 ESTABLISHED 12927/asd
tcp 0 16800 10.240.108.242:46592 10.240.40.168:3001 ESTABLISHED 12927/asd
tcp 0 18076 10.240.108.242:52967 10.240.102.76:3001 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3000 10.240.135.136:43001 TIME_WAIT -
tcp 0 18002 10.240.108.242:37812 10.240.117.250:3001 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.95.202:47056 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.95.202:47068 ESTABLISHED 12927/asd
tcp 0 16952 10.240.108.242:46603 10.240.40.168:3001 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.117.250:48080 ESTABLISHED 12927/asd
tcp 0 17029 10.240.108.242:46612 10.240.40.168:3001 ESTABLISHED 12927/asd
tcp 0 16641 10.240.108.242:48412 10.240.95.202:3001 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.95.202:47057 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.95.202:47077 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.102.76:54716 ESTABLISHED 12927/asd
tcp 0 18003 10.240.108.242:37811 10.240.117.250:3001 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.40.168:33999 ESTABLISHED 12927/asd
tcp 0 18211 10.240.108.242:52972 10.240.102.76:3001 ESTABLISHED 12927/asd
tcp 0 17136 10.240.108.242:46598 10.240.40.168:3001 ESTABLISHED 12927/asd
tcp 0 17922 10.240.108.242:52987 10.240.102.76:3001 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.117.250:48081 ESTABLISHED 12927/asd
tcp 0 16844 10.240.108.242:48428 10.240.95.202:3001 ESTABLISHED 12927/asd
tcp 0 18242 10.240.108.242:37823 10.240.117.250:3001 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.102.76:54732 ESTABLISHED 12927/asd
tcp 0 16761 10.240.108.242:48413 10.240.95.202:3001 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.102.76:54707 ESTABLISHED 12927/asd
tcp 0 18009 10.240.108.242:52979 10.240.102.76:3001 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.117.250:48072 ESTABLISHED 12927/asd
tcp 0 17848 10.240.108.242:52983 10.240.102.76:3001 ESTABLISHED 12927/asd
tcp 0 0 10.240.108.242:3001 10.240.40.168:33926 ESTABLISHED 12927/asd
tcp 0 16705 10.240.108.242:48437 10.240.95.202:3001 ESTABLISHED 12927/asd
I observed the send buffers continually increased on all nodes while exceptions were being thrown in the client trying to write to the cluster. After some arbitrary point in time new connections are established and errors stop.
My suspicion is that there is no communication over fabric channels unless there’s read/write activity in the cluster. In an idle state nothing is sending keepalives or any data over 3001 to other nodes. In GCE there are processes which cull idle tcp connections, but they operate outside of the virtual machine. It may be that Aerospike has no idea that the connections have been severed at the point where it starts sending data to other nodes.
To add on to Brian’s previous reply, aerospike server will automatically shutdown connections which have been idle for more than 1 minute (configuration parameter = proto-fd-idle-ms).
So if the client has not had active requests over the cached connections, this will be a very likely situation where EOFException can happen.
Hi @helipilot50 Any update on this defect. I am facing the same issue…
com.aerospike.client.AerospikeException: java.io.EOFException
at com.aerospike.client.async.SelectorManager.processKey(SelectorManager.java:188)
at com.aerospike.client.async.SelectorManager.runCommands(SelectorManager.java:107)
at com.aerospike.client.async.SelectorManager.run(SelectorManager.java:68)
Caused by: java.io.EOFException
at com.aerospike.client.async.AsyncConnection.read(AsyncConnection.java:146)
at com.aerospike.client.async.AsyncMultiCommand.read(AsyncMultiCommand.java:95)
at com.aerospike.client.async.SelectorManager.processKey(SelectorManager.java:165)
Thanks in advance
I ended up just handling it in my application.
How did you achieve it?
send the request again, skip that particular request, sleep for a period and try again… obviously it depends on the application.