Exception on node restart


#1

Hi, I am using Aerospike 3.15.0.1 with the Java client 4.0.8. Whenever I restart one node of a 3-node cluster, I get the exceptions below and the request is not routed to another node. I have configured the IPs of all 3 nodes on the client side.

com.aerospike.client.AerospikeException: java.io.EOFException
Caused by: java.io.EOFException
com.aerospike.client.AerospikeException: Error Code 11: Server not available

My namespace configuration is

namespace dm {
        replication-factor 2
        memory-size 5G
        default-ttl 1d # 1 day; use 0 to never expire/evict.
        storage-engine memory
}

Please help.


#2

What is your client-side code for opening connections?


#3

Gupta, I appreciate your quick reply. Below is my code snippet.

clientPolicy.maxConnsPerNode = 100;
clientPolicy.maxSocketIdle = 60;   // seconds
clientPolicy.timeout = 5000;       // milliseconds

client = new AerospikeClient(clientPolicy, hosts);
writePolicy = new WritePolicy();
try {
    // First attempt: create the record with a 300-second TTL.
    writePolicy.expiration = 300;
    writePolicy.recordExistsAction = RecordExistsAction.CREATE_ONLY;
    bin1 = new Bin(null, k);
    client.put(writePolicy, key, bin1);
} catch (AerospikeException e) {
    if (e.getResultCode() == ResultCode.KEY_EXISTS_ERROR) {
        // Record already exists: increment it and keep its current TTL (-2).
        writePolicy.expiration = -2;
        writePolicy.recordExistsAction = RecordExistsAction.UPDATE_ONLY;
        bin1 = new Bin(null, 1);
        Record record = client.operate(writePolicy, key, Operation.add(bin1), Operation.get());
    }
}

#4

I would recommend setting maxSocketIdle to 55 seconds instead of 60. The server-side proto-fd-idle-max is 60000 ms, or 60 seconds. What is your “hosts” variable? Can you partially mask the IP addresses and share it, e.g. 192.x.y.z?


#5

Gupta,

Below are the hosts

String aeroSpikeHost = "202.xxx.xxx.100,202.xxx.xxx.101,202.xxx.xxx.102"; // Arriving as parameter to this function
String aeroSpikePort = "3000,3000,3000"; // Arriving as parameter to this function
String hostsStr[] = aeroSpikeHost.split(",");
String portStr[] = aeroSpikePort.split(",");
Host[] hosts = new Host[hostsStr.length];
for (int i = 0; i < hosts.length; i++) {
    hosts[i] = new Host(hostsStr[i], Integer.parseInt(portStr[i]));
}
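
The loop above assumes both strings hold the same number of comma-separated entries; if they do not, it either throws an ArrayIndexOutOfBoundsException or silently ignores the extra ports. A minimal standard-library sketch that validates the two lists first might look like the following. `HostListParser` and `Endpoint` are hypothetical names used only for illustration and are not part of the Aerospike client.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Aerospike client API.
final class HostListParser {

    // Simple value holder for one host name and port pair.
    static final class Endpoint {
        final String name;
        final int port;

        Endpoint(String name, int port) {
            this.name = name;
            this.port = port;
        }
    }

    // Pairs the i-th host with the i-th port, rejecting mismatched or invalid input.
    static List<Endpoint> parse(String hostsCsv, String portsCsv) {
        String[] names = hostsCsv.split(",");
        String[] ports = portsCsv.split(",");
        if (names.length != ports.length) {
            throw new IllegalArgumentException(
                "host count " + names.length + " != port count " + ports.length);
        }
        List<Endpoint> endpoints = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            String name = names[i].trim();
            // parseInt throws NumberFormatException (an IllegalArgumentException) on bad input.
            int port = Integer.parseInt(ports[i].trim());
            if (name.isEmpty() || port < 1 || port > 65535) {
                throw new IllegalArgumentException("invalid entry at index " + i);
            }
            endpoints.add(new Endpoint(name, port));
        }
        return endpoints;
    }
}
```

Each resulting pair would then be converted with `new Host(endpoint.name, endpoint.port)`, as in the loop above.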

#6

Are you still getting the issue with maxSocketIdle at 55?


#7

Gupta, thanks for your quick reply. We are rolling out a release tonight with this change. I will reply once it has been tested.


#8

Gupta,

Below is my service configuration. Can I keep proto-fd-max at 15000?

service {
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    proto-fd-max 15000
    proto-fd-idle-max 60000
    transaction-pending-limit 0
    batch-max-requests 50000
}

#9

Gupta,

Aerospike throws an exception and does not start, saying that the parameter proto-fd-idle-max is invalid. I set proto-fd-idle-ms 60000 instead, but I am still getting com.aerospike.client.AerospikeException: Error Code 11: Server not available

Please suggest.


#10

proto-fd-idle-ms is what I should have said; I mistyped. That is 60000, and maxSocketIdle should be 55, i.e. the client should close the socket (connection reaping) before the server closes it. Setting the client's socket idle time 5 seconds lower than the server-side idle time ensures that.

Error Code 11 means the server is not accepting requests. It can occur when a single node is restarted quickly and is rejoining an existing cluster.
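
The relationship between the two idle settings can be written down as a small calculation. The sketch below uses only standard-library Java; `IdleSettings` and `clientIdleSeconds` are hypothetical names, and the 5-second margin is simply the figure suggested in this thread.

```java
// Hypothetical helper: derives a client-side idle limit (in seconds) from the
// server-side idle limit (in milliseconds), leaving a safety margin so that
// the client reaps an idle socket before the server closes it.
final class IdleSettings {

    static int clientIdleSeconds(int serverIdleMs, int marginSeconds) {
        int serverIdleSeconds = serverIdleMs / 1000;
        int clientIdle = serverIdleSeconds - marginSeconds;
        if (marginSeconds <= 0 || clientIdle <= 0) {
            throw new IllegalArgumentException(
                "margin must be positive and smaller than the server idle time");
        }
        return clientIdle;
    }
}
```

With proto-fd-idle-ms at 60000 and a 5-second margin, `IdleSettings.clientIdleSeconds(60000, 5)` gives 55, the value to assign to clientPolicy.maxSocketIdle.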


#11

Gupta, this still occurs when restarting a node in the 3-node cluster, even after setting the suggested properties. I am using the Community Edition, so I don’t think I have the fast restart option. Please help.


#12

Once your client is connected to the cluster, one node going down should not affect your connections to the other nodes, so I am not sure what is going on. Perhaps you can explain the sequence of events leading up to the server error in more detail, and someone else may chime in. With CE you don’t have access to Aerospike support, who could otherwise troubleshoot using info collected from your cluster.
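
While the root cause is being investigated, one common way to ride out a brief window in which a restarting node rejects requests is to retry the call a few times with a short pause. The following is only a generic standard-library sketch of that idea, not something confirmed in this thread: `RetrySketch` and `callWithRetry` are hypothetical names, and it retries on any exception, whereas a real application would retry only on transient errors. Retrying is also only safe for operations that can be repeated; the `Operation.add` increment shown earlier is not idempotent.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the Aerospike client API.
final class RetrySketch {

    // Runs the action up to maxAttempts times, sleeping sleepMs between failed
    // attempts, and rethrows the last failure if every attempt fails.
    static <T> T callWithRetry(Callable<T> action, int maxAttempts, long sleepMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMs); // pause before the next attempt
                }
            }
        }
        throw last;
    }
}
```

A read could be wrapped as `RetrySketch.callWithRetry(() -> client.get(null, key), 3, 100L)`, where the attempt count and pause are illustrative values.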


#13

btw - what do you see for

$ grep CLUSTER-SIZE /var/log/aerospike/aerospike.log

Do you see CLUSTER-SIZE = 3?