Possible reasons for error code -8

Hi all

I have a 3-node Aerospike cluster that is used by a Java application (Aerospike Java client 4.2.3). For the past week we have been getting socket timeouts on the client side.

We have checked our ClientPolicy settings and even tried setting socketTimeout to 10 seconds, but the timeouts are still happening. The error code we actually get is -8, which according to ResultCode.java means “Server not accepting requests”.

Please help: what could be the possible reasons, and what can I try to resolve it?

Exception trace
Caused by: com.aerospike.client.AerospikeException$Connection: Error -8 from BB9AEFD965C880A 10.xxx.xx.xx 3000: java.net.SocketTimeoutException: connect timed out
at com.aerospike.client.cluster.Connection.<init>(Connection.java:94)
at com.aerospike.client.cluster.Node.getConnection(Node.java:594)
at com.aerospike.client.command.SyncCommand.execute(SyncCommand.java:68)
at com.aerospike.client.AerospikeClient.get(AerospikeClient.java:807)
at com.mmt.flights.cache.aerospike.AerospikeManager.getCommonUtil(AerospikeManager.java:203)
... 34 more
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.aerospike.client.cluster.Connection.<init>(Connection.java:79)
... 38 more

Note: This had not been happening for quite a while. It started suddenly, without any code change or cluster change. The one thing that has changed is traffic, which has increased a bit.

Could you share the code causing this exception?

The exception does not occur regularly. The same code runs fine for some time, then suddenly raises the exception, and after 4-5 failures it starts working smoothly again.

The exception occurred while putting a packet into Aerospike from this line:

        cache.multiCREATEorUPDATE(key, bins, expiry);

Below are the config and the code from which the exception is coming.

######## Config #########

namespace: ns_128
    set: supply-book-state
    hostAndPort: m_flt_aspk-rms_1.mmt.mmt:3000,m_flt_aspk-rms_2.mmt.mmt:3000,m_flt_aspk-rms_3.mmt.mmt:3000,m_flt_aspk-rms_4.mmt.mmt:3000,m_flt_aspk-rms_5.mmt.mmt:3000
    readPolicySocketTimeout: 3000
    readPolicyTotalTimeout: 10000
    readPolicySleepBetweenRetries: 10
    writePolicySocketTimeout: 3000
    writePolicyTotalTimeout: 10000
    writePolicySleepBetweenRetries: 10
    expiryInSeconds: 7200
    fallbackEnabled: false
    writePolicyMaxRetries: 2

######## Code #########

@Component
public class AerospikeBookStatusManager {

    private Cache cache;

    private static final String SUPPLIER_BIN = "supplier";

    @Autowired
    private AerospikeProps aerospikeProps;

    @PostConstruct
    public void init() {
        try {
            this.cache = CacheManager.getInstance(createAerospikeConfig(), CacheType.AEROSPIKE);
        }
        catch (Exception e){
            MMTLogger.error("CACHE-ERROR", "Aerospike init failed", AerospikeBookStatusManager.class.getName(),
                    e);
            throw new CacheException("Aerospike Initialization failed", e);
        }
    }

    private CacheConfig createAerospikeConfig() {
        AerospikeConfig aerospikeConfig = new AerospikeConfig();
        aerospikeConfig.setHostAndPort(aerospikeProps.getHostAndPort());
        aerospikeConfig.setNamespace(aerospikeProps.getNamespace());
        aerospikeConfig.setSet(aerospikeProps.getSet());
        aerospikeConfig.setReadPolicySleepBetweenRetries(aerospikeProps.getReadPolicySleepBetweenRetries());
        aerospikeConfig.setReadPolicySocketTimeout(aerospikeProps.getReadPolicySocketTimeout());
        aerospikeConfig.setReadPolicyTotalTimeout(aerospikeProps.getReadPolicyTotalTimeout());
        aerospikeConfig.setWritePolicySleepBetweenRetries(aerospikeProps.getWritePolicySleepBetweenRetries());
        aerospikeConfig.setWritePolicySocketTimeout(aerospikeProps.getWritePolicySocketTimeout());
        aerospikeConfig.setWritePolicyTotalTimeout(aerospikeProps.getWritePolicyTotalTimeout());
        aerospikeConfig.setFallbackEnabled(aerospikeProps.getFallbackEnabled());
        aerospikeConfig.setWritePolicyMaxRetries(aerospikeProps.getWritePolicyMaxRetries());
        aerospikeConfig.setEnableCompression(true);
        aerospikeConfig.setType(CompressionType.LZ4);
        return aerospikeConfig;
    }

    public void put(String key, String value, String binName, int expiry) throws CacheException {
        Map<String, byte[]> bins = new HashMap<>();
        bins.put(binName, value.getBytes());
        bins.put(SUPPLIER_BIN, CommonConstants.SERVICE_NAME.getBytes());
        cache.multiCREATEorUPDATE(key, bins, expiry);
    }
}

And just to add to the above: we are seeing timeouts not only on create calls but also on UPDATE and GET.

Could you also run this command on each server running Aerospike and provide the output?

cat /proc/$(pidof asd)/limits

Sorry, missed this part of the exception:

Caused by: java.net.SocketTimeoutException: connect timed out

In this particular case, the error was caused by the connection attempt timing out while the client was trying to reach the server. There are many possible reasons for such timeouts: latency between client and server, an overloaded client or server, network problems, and so on.
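
One way to narrow this down is to measure the raw TCP connect time from the application host to each node, independent of the Aerospike client. Below is a minimal sketch that uses only the JDK. The class name ConnectProbe, the method connectMillis, the 3000 ms timeout, and the default seed 127.0.0.1:3000 are placeholders chosen for illustration; none of them come from the Aerospike client. If connects are slow or fail here too, the cause is more likely the network or the server than the client policy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the Aerospike client.
public class ConnectProbe {

    /**
     * Opens a plain TCP connection and returns the connect time in milliseconds,
     * or -1 if the connection failed or timed out.
     */
    static long connectMillis(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            // Same JDK call that appears at the bottom of the stack trace.
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (IOException e) {
            // Covers SocketTimeoutException, ConnectException, UnknownHostException.
            return -1L;
        }
    }

    public static void main(String[] args) {
        // Each argument is expected in host:port form; the default is a placeholder.
        String[] seeds = args.length > 0 ? args : new String[] {"127.0.0.1:3000"};
        for (String seed : seeds) {
            int colon = seed.lastIndexOf(':');
            String host = seed.substring(0, colon);
            int port = Integer.parseInt(seed.substring(colon + 1));
            long ms = connectMillis(host, port, 3000);
            System.out.println(seed + " -> "
                    + (ms < 0 ? "connect failed or timed out" : ms + " ms"));
        }
    }
}
```

Run it from the application host, passing each node's host:port as an argument, ideally in a loop during a period when the errors occur, and compare the results across nodes.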
