Aerospike: increased write latency on I3-series EC2 instances with NVMe disks


#1

We have an 8-node cluster running in DRAM+DISK mode on I2.xlarge instances.

It was giving good performance, with 99% of calls completing within 1 ms.

Yesterday we replaced 1 node with an I3.xlarge, and CPU wait time was even lower on the new box.

Today we replaced two more boxes with I3.xlarge, and since then we have seen a degradation in write performance on all nodes (old and new). 50% of writes now take longer than 1 ms, landing in buckets up to 4-8 ms.


#2
  1. What version of Aerospike are you running?
  2. Please share your configuration.

#3

We are using a patch build provided on top of version 3.9.1 (3.9.1-158-g1e8db6e).

Config:

service {
        user root
        group root
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
        pidfile /var/run/aerospike/asd.pid

        ## SET TO NUMBER OF CORES ##
        service-threads 4
        transaction-queues 4
        scan-threads 4
        ###########################

        ## DONT CHANGE ##
        transaction-threads-per-queue 3
        proto-fd-idle-ms 600000
        proto-fd-max 100000
        batch-max-requests 10000
        migrate-threads 2
        replication-fire-and-forget true
        ##########################
}

logging {
        file /var/log/aerospike/aerospike.log {
                context any info
        }
}

network {
        service {
                address any
                port 3000
        }

        heartbeat {
                mode mesh
                port 3002

                mesh-seed-address-port 10.0.23.154 3002
                mesh-seed-address-port 10.0.23.95 3002
                mesh-seed-address-port 10.0.23.94 3002
                mesh-seed-address-port 10.0.23.89 3002
                mesh-seed-address-port 10.0.23.190 3002
                mesh-seed-address-port 10.0.23.164 3002
                mesh-seed-address-port 10.0.23.144 3002
                mesh-seed-address-port 10.0.23.219 3002

                interval 150
                timeout 20
        }

        fabric {
                port 3001
        }

        info {
                port 3003
        }
}

namespace userdata {
        replication-factor 2
        #### CHANGE FOR INSTANCE ###
        memory-size 25G
        ############################
        default-ttl 0 # 0 = never expire/evict.
        storage-engine device {
                ## COLD START AND NO SHADOW DEVICE ##
                cold-start-empty true
                device /dev/nvme0n1
                #####################################
                ### 1MB FOR INSTANCE STORE ###
                write-block-size 1024K
                #############################
                max-write-cache 1024M
        }
#       storage-engine memory
}
namespace user_config_data {
        replication-factor 2
        memory-size 1G
        default-ttl 0
        storage-engine device {
                cold-start-empty true
                file /dev/aerospike/user_config_data.dat
                filesize 1G
                write-block-size 1024K
        }

#4

There seems to be a missing closing brace at the end of your config. Is this a typo?


#5

The replication-fire-and-forget configuration was deprecated in 3.9.0. Are all nodes (new and old) running 3.9.1?

Assuming the older nodes were using fire-and-forget, this could explain the increase in observed write latency. The closest configuration to fire-and-forget is write-commit-level-override master, which would need to be added to each of your namespaces.
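
As a rough sketch, assuming the namespace names from the config posted above, the override would look something like this in aerospike.conf. Only the added line is shown; the rest of each namespace stanza stays as it is.

```
namespace userdata {
        # ... existing settings unchanged ...
        write-commit-level-override master # ack the client after the master write only
}

namespace user_config_data {
        # ... existing settings unchanged ...
        write-commit-level-override master
}
```

With this set, the server stops waiting for the replica write before responding, whatever commit level the client asks for. Treat it as a starting point and check it against the configuration reference for your exact build.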


#6

All nodes have been on 3.9.1 for the last 4 months. I guess the latency increase on the other nodes could be caused by slow replication on the new nodes, but that would only apply if a client write also waits for replication, which is not set in our client app.


#7

The default behaviour is that the server waits for replication before acking the client. To clarify, are you saying that your clients specify write-commit-level master for each transaction?


#8

Also, there are several microbenchmarks that can help pin down the exact source of the latency:

See http://www.aerospike.com/docs/operations/monitor/latency.
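
As a hedged example, assuming the per-namespace benchmark switches introduced with the 3.9 histograms, enabling the write and storage microbenchmarks for the userdata namespace might look like this (only the added lines are shown):

```
namespace userdata {
        # ... existing settings unchanged ...
        enable-benchmarks-write true    # extra histograms for the stages of the write path
        enable-benchmarks-storage true  # extra histograms for device reads and writes
}
```

The additional histograms appear in the log alongside the regular write histogram, which should show whether the time is going to the local device or to waiting on the replica write. They add some overhead, so it is probably best to turn them off again once the source has been found.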


#9

Performance returned to normal after we replaced all the I3.xlarge nodes with I2.xlarge nodes again. So this degradation has something to do with the I3 box or with how the I3 box is configured. Do we need any kind of pre-warming of the I3 disks? Or could it be that the new disks need the latest Linux kernel? The current version is:

[ec2-user@ip-10-0-23-94 backup]$ uname -a
Linux ip-10-0-23-94 4.4.14-24.50.amzn1.x86_64 #1 SMP Fri Jun 24 19:56:04 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


#10

We recently published benchmarks for these instance types: