Multiple performance problems

Hello,

We are trying to use Aerospike in our production environment. We set up 18 Aerospike nodes, turned on all the writes we usually have, and turned on part of our read requests.

Here are some graphs related to this experiment:

The first graph shows the point where we increased the read load. We kept it at that level for more than 3 hours. During the test the cluster (18 nodes) was under both write load (~34,000 ops/sec; all values on the graphs are per minute) and read load (~13,500 ops/sec, but our final goal is 416,000 req/sec).
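For reference, roughly per node (a back-of-envelope sketch in Python, assuming the load is spread evenly across the 18 nodes):

# Back-of-envelope per-node load, assuming an even spread across the cluster.
NODES = 18

write_ops = 34_000        # cluster-wide writes per second
read_ops_now = 13_500     # cluster-wide reads per second during the test
read_ops_goal = 416_000   # cluster-wide read target

print(f"writes/node:       {write_ops / NODES:,.0f} ops/s")     # ~1,889
print(f"reads/node (test): {read_ops_now / NODES:,.0f} ops/s")  # ~750
print(f"reads/node (goal): {read_ops_goal / NODES:,.0f} ops/s") # ~23,111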

What we see:

  • Three times the Aerospike cluster stopped responding, with 100% timeouts
  • Two nodes show constantly high CPU load (even at a low read rate). We tried re-creating them from scratch, just copying the disk data over, but the behavior stayed the same. We see GC always running on these nodes and eating 90% of CPU (most of it in iowait)
  • While all nodes were under the same load, we see a significant difference in CPU usage, from 25% to 95% depending on the node

Our setup is on AWS. We use m3.2xlarge instances with the shadow device functionality.

Any help would be appreciated.

A few quick questions:

  • Can you share the conf file? (We need to see the namespace config.)
  • What load are you running: simple read/write, UDF, query, LDT? (To get an idea of what the system might be doing.)
  • How large are the records generally, and how many keys are there?
  • Can you share a log snippet from the window where this happened? (To see whether any background maintenance task shows up.)
  • What does system health look like at the time of the timeouts (iostat / top / iftop et al.)? (Amazon!) See the capture sketch below.
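If it helps, here is a minimal sketch of how those snapshots could be collected around the timeout window (assumes Python 3.7+, sysstat for iostat, and procps for top; the output path and interval are just examples):

#!/usr/bin/env python3
# Minimal sketch: periodically capture iostat/top output so system health can
# be correlated with the timeout windows afterwards.
import subprocess
import time
from datetime import datetime

OUTPUT = "/var/tmp/aerospike-health.log"  # example path, change as needed
INTERVAL_SEC = 60                         # example interval

def snapshot() -> str:
    # Extended device stats: 3 reports at 1-second intervals.
    iostat = subprocess.run(["iostat", "-x", "1", "3"],
                            capture_output=True, text=True).stdout
    # One batch-mode iteration of top.
    top = subprocess.run(["top", "-b", "-n", "1"],
                         capture_output=True, text=True).stdout
    return f"===== {datetime.now().isoformat()} =====\n{iostat}\n{top}\n"

if __name__ == "__main__":
    while True:
        with open(OUTPUT, "a") as f:
            f.write(snapshot())
        time.sleep(INTERVAL_SEC)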

– R

Config file:

service {
    user aerospike
    group disk
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 8
    transaction-queues 8
    transaction-threads-per-queue 4
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address any
        port 3000
    }

    heartbeat {
       mode mesh
       port 3002

       mesh-seed-address-port 10.2.8.174 3002   # aerospike-aws-ca-1
       mesh-seed-address-port 10.2.8.91  3002   # aerospike-aws-ca-2
       mesh-seed-address-port 10.2.8.153 3002   # aerospike-aws-ca-3
       mesh-seed-address-port 10.2.8.134 3002   # aerospike-aws-ca-4
       mesh-seed-address-port 10.2.8.244 3002   # aerospike-aws-ca-5
       mesh-seed-address-port 10.2.8.18  3002   # aerospike-aws-ca-6
       mesh-seed-address-port 10.2.8.8   3002   # aerospike-aws-ca-7
       mesh-seed-address-port 10.2.8.5   3002   # aerospike-aws-ca-8
       mesh-seed-address-port 10.2.8.9   3002   # aerospike-aws-ca-9
       mesh-seed-address-port 10.2.8.248 3002   # aerospike-aws-ca-10
       mesh-seed-address-port 10.2.8.109 3002   # aerospike-aws-ca-11
       mesh-seed-address-port 10.2.8.162 3002   # aerospike-aws-ca-12
       mesh-seed-address-port 10.2.8.52  3002   # aerospike-aws-ca-13
       mesh-seed-address-port 10.2.8.25  3002   # aerospike-aws-ca-14
       mesh-seed-address-port 10.2.8.69  3002   # aerospike-aws-ca-15
       mesh-seed-address-port 10.2.8.27  3002   # aerospike-aws-ca-16
       mesh-seed-address-port 10.2.8.200 3002   # aerospike-aws-ca-17
       mesh-seed-address-port 10.2.8.210 3002   # aerospike-aws-ca-18

       interval 150
       timeout 10
    }

    fabric {
        port 3001
    }

    info {
        port 3003
    }
}

namespace !hidden1! {
    replication-factor 1
    memory-size 24G
    default-ttl 0 # never expire/evict.

    high-water-memory-pct 85
    high-water-disk-pct 85
    stop-writes-pct 85

    storage-engine device {
        device /dev/sdb /dev/xvdf
        device /dev/sdc /dev/xvdg
        write-block-size 1M
        defrag-lwm-pct 85
    }

    set !hidden2! {
    }
    set !hidden3! {
    }
    set !hidden4! {
    }
    set !hidden5! {
    }
    set !hidden6! {
    }
    set !hidden7! {
    }
    set !hidden8! {
    }
    set !hidden9! {
    }
    set !hidden10! {
    }
    set !hidden11! {
    }
    set !hidden12! {
    }
}

We do only simple read and write (create/update) operations. The number of queries can be found on the graphs or in the first message.

The main set contains 1.8 billion objects, 300 bytes in size on average, and most read/write operations are related to it.

Not sure what exact data you want from the logs, but it may be helpful to know that we start getting 100% timeouts each time the aerospike process is killed by the OOM killer. We have 30GB of memory on the boxes and, as you can see from the config, we set memory-size to 24G.
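A rough memory budget may explain the OOM kills. Aerospike keeps the primary index in RAM at about 64 bytes per record, so a back-of-envelope sketch (assuming the 1.8 billion records are spread evenly across the 18 nodes with replication-factor 1):

# Rough per-node memory estimate (back-of-envelope, not measured).
RECORDS = 1_800_000_000          # objects in the main set
NODES = 18
REPLICATION_FACTOR = 1
INDEX_BYTES_PER_RECORD = 64      # Aerospike primary index cost per record

records_per_node = RECORDS * REPLICATION_FACTOR / NODES
index_gib_per_node = records_per_node * INDEX_BYTES_PER_RECORD / 2**30

print(f"records per node:       {records_per_node:,.0f}")      # ~100,000,000
print(f"primary index per node: {index_gib_per_node:.1f} GiB") # ~6.0 GiB

# With memory-size set to 24G and only 30 GB of RAM on an m3.2xlarge, that
# leaves little headroom for write buffers, defrag, and the OS itself, which
# is consistent with the OOM killer firing under load.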

If you can list exactly which metrics you want, I can prepare graphs for you.

Now we are thinking about trying c3.* instance types. A first attempt shows they work better than the m3.* instances.

This (defrag-lwm-pct 85) is too high; we typically recommend 50%. This setting has a non-linear effect on write amplification, which can be plotted as 1/(1 - n/100) for n = 0 to 100.
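To make that concrete, a quick sketch of the defrag write amplification 1/(1 - n/100) at a few settings (the 85 used here versus the recommended 50):

# Defrag write amplification as a function of defrag-lwm-pct: 1 / (1 - n/100).
def write_amplification(defrag_lwm_pct: float) -> float:
    return 1.0 / (1.0 - defrag_lwm_pct / 100.0)

for pct in (50, 75, 80, 85):
    print(f"defrag-lwm-pct {pct}: ~{write_amplification(pct):.1f}x writes")

# defrag-lwm-pct 50: ~2.0x writes
# defrag-lwm-pct 75: ~4.0x writes
# defrag-lwm-pct 80: ~5.0x writes
# defrag-lwm-pct 85: ~6.7x writes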

Hi,

Just to mention how defrag-lwm-pct affects write performance: on a 7-node cluster at ~50K+ writes/sec, changing defrag-lwm-pct from 75 to 80 gave a 3x increase in write latency.

Regards, Alexander
