Possible reasons for error code -8

Hi all

I have a 3-node Aerospike cluster used by a Java application (Aerospike client 4.2.3), and for the past week we have suddenly been getting socket timeouts on the client side.

We checked the ClientPolicy we set and even tried raising socketTimeout to 10 seconds, but the timeouts still happen. The error code we actually get is -8, which according to ResultCode.java means “Server Not accepting Requests”.

Please help: what could be the possible reasons, and what can I try to resolve it?

Exception trace
Caused by: com.aerospike.client.AerospikeException$Connection: Error -8 from BB9AEFD965C880A 10.xxx.xx.xx 3000: java.net.SocketTimeoutException: connect timed out
at com.aerospike.client.cluster.Connection.<init>(Connection.java:94)
at com.aerospike.client.cluster.Node.getConnection(Node.java:594)
at com.aerospike.client.command.SyncCommand.execute(SyncCommand.java:68)
at com.aerospike.client.AerospikeClient.get(AerospikeClient.java:807)
at com.mmt.flights.cache.aerospike.AerospikeManager.getCommonUtil(AerospikeManager.java:203)
... 34 more
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.aerospike.client.cluster.Connection.<init>(Connection.java:79)
... 38 more

Note: This had not happened for quite a while. It started suddenly, without any code or cluster change. The one thing that has changed is traffic, which has increased a bit.

Could you share the code causing this exception?

The exception does not occur regularly. The same code runs fine for some time, then suddenly raises the exception, and after 4-5 failures it starts working smoothly again.

The exception occurred while putting a packet into Aerospike, at this line:

        cache.multiCREATEorUPDATE(key, bins, expiry);

Below are the config and the code that throws the exception.

######## Config #########

namespace: ns_128
    set: supply-book-state
    hostAndPort: m_flt_aspk-rms_1.mmt.mmt:3000,m_flt_aspk-rms_2.mmt.mmt:3000,m_flt_aspk-rms_3.mmt.mmt:3000,m_flt_aspk-rms_4.mmt.mmt:3000,m_flt_aspk-rms_5.mmt.mmt:3000
    readPolicySocketTimeout: 3000
    readPolicyTotalTimeout: 10000
    readPolicySleepBetweenRetries: 10
    writePolicySocketTimeout: 3000
    writePolicyTotalTimeout: 10000
    writePolicySleepBetweenRetries: 10
    expiryInSeconds: 7200
    fallbackEnabled: false
    writePolicyMaxRetries: 2

######## Code #########

@Component
public class AerospikeBookStatusManager {

    private Cache cache;

    private static final String SUPPLIER_BIN = "supplier";

    @Autowired
    private AerospikeProps aerospikeProps;

    @PostConstruct
    public void init() {
        try {
            this.cache = CacheManager.getInstance(createAerospikeConfig(), CacheType.AEROSPIKE);
        }
        catch (Exception e){
            MMTLogger.error("CACHE-ERROR", "Aerospike init failed", AerospikeBookStatusManager.class.getName(),
                    e);
            throw new CacheException("Aerospike Initialization failed", e);
        }
    }

    private CacheConfig createAerospikeConfig() {
        AerospikeConfig aerospikeConfig = new AerospikeConfig();
        aerospikeConfig.setHostAndPort(aerospikeProps.getHostAndPort());
        aerospikeConfig.setNamespace(aerospikeProps.getNamespace());
        aerospikeConfig.setSet(aerospikeProps.getSet());
        aerospikeConfig.setReadPolicySleepBetweenRetries(aerospikeProps.getReadPolicySleepBetweenRetries());
        aerospikeConfig.setReadPolicySocketTimeout(aerospikeProps.getReadPolicySocketTimeout());
        aerospikeConfig.setReadPolicyTotalTimeout(aerospikeProps.getReadPolicyTotalTimeout());
        aerospikeConfig.setWritePolicySleepBetweenRetries(aerospikeProps.getWritePolicySleepBetweenRetries());
        aerospikeConfig.setWritePolicySocketTimeout(aerospikeProps.getWritePolicySocketTimeout());
        aerospikeConfig.setWritePolicyTotalTimeout(aerospikeProps.getWritePolicyTotalTimeout());
        aerospikeConfig.setFallbackEnabled(aerospikeProps.getFallbackEnabled());
        aerospikeConfig.setWritePolicyMaxRetries(aerospikeProps.getWritePolicyMaxRetries());
        aerospikeConfig.setEnableCompression(true);
        aerospikeConfig.setType(CompressionType.LZ4);
        return aerospikeConfig;
    }

    public void put(String key, String value, String binName, int expiry) throws CacheException {
        Map<String, byte[]> bins = new HashMap<>();
        bins.put(binName, value.getBytes());
        bins.put(SUPPLIER_BIN, CommonConstants.SERVICE_NAME.getBytes());
        cache.multiCREATEorUPDATE(key, bins, expiry);
    }
}

To add to the above: we are not only seeing errors on create calls; UPDATE and GET calls are timing out as well.

Could you also run this command and provide the output from each server running Aerospike DB.

cat /proc/$(pidof asd)/limits
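When comparing that output across servers, one line often checked for connection problems is "Max open files", since every client connection uses a file descriptor. That this is the line the request above is after is an assumption on my part. A minimal Java sketch that pulls the soft and hard values out of the limits text (the class and method names are made up for illustration, Linux only):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class AsdLimits {

    private static final String LABEL = "Max open files";

    /** Returns {soft, hard} from the "Max open files" row, or null if the row is missing. */
    public static String[] maxOpenFiles(List<String> limitsLines) {
        for (String line : limitsLines) {
            if (line.startsWith(LABEL)) {
                // Remaining columns: soft limit, hard limit, units
                String[] cols = line.substring(LABEL.length()).trim().split("\\s+");
                return new String[] { cols[0], cols[1] };
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Pass the asd pid as the first argument; "self" inspects this JVM instead.
        String pid = args.length > 0 ? args[0] : "self";
        String[] limits = maxOpenFiles(Files.readAllLines(Paths.get("/proc", pid, "limits")));
        if (limits == null) {
            System.out.println("Max open files row not found");
        } else {
            System.out.println("soft=" + limits[0] + " hard=" + limits[1]);
        }
    }
}
```

The values can also be the word "unlimited", which is why the sketch returns strings rather than numbers.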

Sorry, missed this part of the exception:

Caused by: java.net.SocketTimeoutException: connect timed out

In this particular case, the error was caused by the connection attempt timing out while trying to reach the server. There are many possible reasons for timeouts: latency between client and server, an overloaded client or server, network problems, etc.

The client already sends a timeout along with the command to the server. Also, setting a timeout directly on the server would be global, while the client timeout can be changed for each transaction.

A SocketTimeoutException from a Java socket means that a connect or read took longer than the configured timeout, so the request expired before the other side responded. It typically occurs in one of these situations:

  • The server is slow and the default timeout is too short, so set a timeout value that suits your workload.
  • The server is working fine but the configured timeout is too short, so increase the timeout value.

Solution: A Java developer can preset the timeout for both client and server socket operations.

From Client side:

Socket socket = new Socket();
SocketAddress socketAddress = new InetSocketAddress(host, port);
socket.connect(socketAddress, 12000); // connect timeout in milliseconds (12 s)

From Server side:

ServerSocket serverSocket = new ServerSocket(port);
serverSocket.setSoTimeout(12000); // accept() waits at most 12000 ms
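Note that Java reports two different timeouts with the same exception class: "connect timed out" (the one in the trace above) comes from the connect call, while a read timeout comes from SO_TIMEOUT on an established socket. Here is a small sketch using only java.net that tells them apart. The class name, method name and the silent local listener are made up for the demo, and it has nothing Aerospike-specific in it:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutKinds {

    /** Connects, then waits for one byte, and reports which phase timed out. */
    public static String probe(String host, int port, int connectTimeoutMs, int readTimeoutMs) {
        try (Socket socket = new Socket()) {
            try {
                socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            } catch (SocketTimeoutException e) {
                return "connect timed out"; // TCP handshake did not finish in time
            }
            socket.setSoTimeout(readTimeoutMs); // applies to blocking reads only
            try {
                int b = socket.getInputStream().read();
                return b < 0 ? "closed by peer" : "ok";
            } catch (SocketTimeoutException e) {
                return "read timed out"; // connected, but no data arrived in time
            }
        } catch (IOException e) {
            return "io error: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // A local listener that never sends anything: the connect succeeds
        // (the OS completes the handshake), so only the read can time out.
        try (ServerSocket silent = new ServerSocket(0)) {
            System.out.println(probe("127.0.0.1", silent.getLocalPort(), 1000, 200));
        }
    }
}
```

Raising a read timeout does not help when the failure is in the connect phase, which may be why increasing socketTimeout made no difference here.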

Aerospike actually has its own policy that lets the application developer configure the different timeouts. The following article should help understand some of those details: Understanding Timeout and Retry policies.
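To get a feel for how the four settings in the posted config interact (socketTimeout, totalTimeout, maxRetries, sleepBetweenRetries), here is a plain-Java sketch of the general idea. It is not the Aerospike client's implementation. The class, interface and method names are hypothetical, and it assumes each attempt is bounded by the socket timeout and the whole operation by the total timeout:

```java
import java.net.SocketTimeoutException;

public class RetryBudget {

    /** One attempt at the operation, limited to the given timeout. */
    public interface Attempt<T> {
        T call(int timeoutMs) throws Exception;
    }

    public static <T> T execute(Attempt<T> attempt, int socketTimeoutMs, int totalTimeoutMs,
                                int maxRetries, int sleepBetweenRetriesMs) throws Exception {
        long deadline = System.nanoTime() + totalTimeoutMs * 1_000_000L;
        for (int tries = 0; tries <= maxRetries; tries++) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                break; // total budget used up
            }
            try {
                // Never give an attempt more time than is left in the budget.
                return attempt.call((int) Math.min(socketTimeoutMs, remainingMs));
            } catch (SocketTimeoutException e) {
                // Timed out: retry if attempts and budget remain.
            }
            if (tries < maxRetries) {
                Thread.sleep(sleepBetweenRetriesMs);
            }
        }
        throw new SocketTimeoutException("no success within retries / total timeout");
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Values from the posted write config: 3000, 10000, 2 retries, 10 ms sleep.
        String result = execute(timeoutMs -> {
            if (++calls[0] < 3) {
                throw new SocketTimeoutException("simulated timeout");
            }
            return "ok after " + calls[0] + " attempts";
        }, 3000, 10000, 2, 10);
        System.out.println(result);
    }
}
```

Under those assumptions, the posted write settings allow up to 3 attempts of 3000 ms each plus 2 sleeps of 10 ms, so a write that keeps timing out would take roughly 9020 ms before failing, which is inside the 10000 ms total.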