Getting Aerospike timeout when posting 1M records

I am getting AerospikeException$Timeout: Client timeout: timeout=2000 iterations=1 failedNodes=0 failedConns=0 when trying to post 1M records into Aerospike.

Following is the stack trace:

    com.aerospike.client.AerospikeException$Timeout: Client timeout: timeout=2000 iterations=1 failedNodes=0 failedConns=0
            at com.aerospike.client.command.SyncCommand.execute(SyncCommand.java:131)
            at com.aerospike.client.AerospikeClient.put(AerospikeClient.java:295)
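
The writes are plain synchronous puts in a loop. A simplified sketch of the load (the set and bin names are placeholders, not my exact code):

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.Bin;
    import com.aerospike.client.Key;
    import com.aerospike.client.policy.WritePolicy;

    public class BulkLoad {
        public static void main(String[] args) {
            AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
            WritePolicy policy = new WritePolicy();
            // 2000 ms, the value that shows up in the exception.
            // (Newer 4.x clients split this into socketTimeout/totalTimeout.)
            policy.timeout = 2000;
            try {
                for (int i = 0; i < 1_000_000; i++) {
                    Key key = new Key("dfm", "records", i); // set name is a placeholder
                    client.put(policy, key, new Bin("payload", "up to ~800 bytes"));
                }
            } finally {
                client.close();
            }
        }
    }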

and this is my Aerospike configuration file:

# Aerospike database configuration file.

service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 15000
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
            context any info
    }
}

network {
    service {
            address any
            port 3000
    }

    heartbeat {
            mode multicast
            address 239.1.99.222
            port 9918

                        interval 150
            timeout 10
    }

    fabric {
            port 3001
    }

    info {
            port 3003
    }

}

namespace dfm {
    replication-factor 2
    memory-size 2G
    default-ttl 30d # 30 days, use 0 to never expire/evict.

    # storage-engine memory

    storage-engine device {
            file /opt/aerospike/data/dfm.dat
            filesize 3G
            data-in-memory true # Store data in memory in addition to file.
    }
}

There is no other replica; only one Aerospike node is running. I have checked the Aerospike logs as well and there are no warnings. I can post the logs if needed.

Hi

I think you have run out of space and the database has stopped writes, hence the timeout.

  • Are there messages in the log like stop writes = true? See this post. (A quick way to check is sketched after this list.)
  • How big is an individual record?
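
To check programmatically, here is a rough sketch with the Java client; it assumes a client version whose Info.request takes an InfoPolicy (passing null uses the defaults). On the server itself you can get the same data with asinfo -v 'namespace/dfm'.

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.Info;
    import com.aerospike.client.cluster.Node;

    public class StopWritesCheck {
        public static void main(String[] args) {
            AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
            try {
                for (Node node : client.getNodes()) {
                    // Namespace statistics come back as one semicolon-separated
                    // string; look for stop_writes=true in it.
                    String stats = Info.request(null, node, "namespace/dfm");
                    System.out.println(node.getName() + ": " + stats);
                }
            } finally {
                client.close();
            }
        }
    }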

Peter

  1. There is no message in the logs saying stop writes = true.
  2. The link to the post is broken. Could you please correct it?
  3. Each individual record is only a few bytes; the maximum length would be around 800.
  4. We get this exception occasionally, not every time, but we were writing a large number of records each time. I have added records successfully in the past.

Hi,

Without seeing your exact hardware and software setup, I can only hypothesize about your problem based on the symptoms you have outlined. My guess is that the cluster is undersized for your peak throughput workload.

On a single-node cluster, all requests pass through the same network layer; this could be the bottleneck. Have you tried distributing the network IRQs across all CPU cores?

Are multiple “clients” writing to the single server concurrently? With a single-node cluster there is no partitioning across nodes and therefore limited parallelism. Can you start another server and have it join the cluster?
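
If only a single client thread is doing the puts, you can also get more parallelism out of one process by sharing the client (it is thread-safe) across a pool of writers. A minimal sketch; the thread count, set, and bin names are illustrative:

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.Bin;
    import com.aerospike.client.Key;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ParallelLoader {
        public static void main(String[] args) throws InterruptedException {
            // One AerospikeClient instance is thread-safe; share it, do not create one per thread.
            AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
            ExecutorService pool = Executors.newFixedThreadPool(8); // thread count is illustrative

            int total = 1_000_000;
            int threads = 8;
            int chunk = total / threads;

            for (int t = 0; t < threads; t++) {
                int start = t * chunk;
                pool.submit(() -> {
                    for (int i = start; i < start + chunk; i++) {
                        Key key = new Key("dfm", "demo", i);
                        client.put(null, key, new Bin("payload", "value-" + i)); // null = default WritePolicy
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            client.close();
        }
    }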

You should expand the storage-engine filesize to 10G. A rule of thumb for this is 5x the memory-size.
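
That is, in your dfm namespace stanza only the filesize line changes:

    storage-engine device {
            file /opt/aerospike/data/dfm.dat
            filesize 10G
            data-in-memory true
    }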

Are your writes to the same 1 million records, or to an additional 1 million records?

Have you run the benchmark tool that comes with the Java client to simulate an equivalent load?
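
It lives in the benchmarks directory of the Java client repo. An invocation that roughly matches your workload might look like the following; the exact flag values are illustrative, so check the benchmarks README for your client version:

    ./run_benchmarks -h 127.0.0.1 -p 3000 -n dfm -k 1000000 -o S:800 -w I -z 16

Here -k is the number of keys, -o S:800 writes 800-byte string values (your stated maximum record size), -w I is an insert-only workload, and -z sets the number of client threads.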

Peter