How to troubleshoot/fix "Device overload"?


#1

In my development environment, which uses EC2 ephemeral disk + EBS in a software RAID1, I often get com.aerospike.client.AerospikeException: Error Code 18: Device overload. AFAIK, it is a server-side status, not specific to the Java client. What exactly does it mean, and how do I troubleshoot a “device overload”? The server looks normal to me.

Btw, if the hardware is not fast enough, I would expect an increase in latency rather than an exception.

Reference: http://hatim.eu/2014/05/25/leveraging-ssd-ephemeral-disks-in-ec2-part-2/


#2

Aerospike does not synchronously flush swb (streaming write buffers) to disk. These buffers are flushed when full (or based on some other tuning parameters, but let’s keep those aside for now). Write transactions do get committed to memory on the master and replica(s) before returning to the client, though.

When a storage device is not keeping up, Aerospike uses a cache (configured through max-write-cache) and will try to keep up until that cache is full, at which point it throws those device overload errors. That is why you may not see much direct latency impact beforehand.

You can dynamically increase this cache from the default (64M) to a higher multiple of the write-block-size (which is by default 128KB for SSD devices).

For example:

asinfo -v 'set-config:context=namespace;id=test;max-write-cache=128M'

This will increase the number of swb in cache from 512 to 1024 (assuming 128KB block size).
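The asinfo change above is dynamic and does not survive a restart. To make it persistent, the same parameters can be set in the namespace’s storage stanza in aerospike.conf. A rough sketch, with an illustrative device path and other namespace settings omitted:

```
namespace test {
    ...
    storage-engine device {
        device /dev/sdb          # illustrative device path
        write-block-size 128K    # default for SSD devices
        max-write-cache 128M     # raised from the 64M default
    }
}
```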

Some links with a tiny bit of info on those config parameters:

http://www.aerospike.com/docs/reference/configuration#max-write-cache
http://www.aerospike.com/docs/reference/configuration/#write-block-size


#3

One more thing: you can check the w-q stat in the logs to see how many of those cached swb are in use:

device /dev/sdc: used 296160983424, contig-free 110637M (885103 wblocks), swb-free 16, w-q 0 w-tot 12659541 (43.3/s), defrag-q 0 defrag-tot 11936852 (39.1/s)
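These per-device lines can be scraped while a load is running. As a rough illustration (not official tooling — the regex simply follows the shape of the sample line above, not a formal log grammar), a small Java sketch that pulls a numeric field such as w-q or swb-free out of such a line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy parser for the per-device log line shown above. Field names like
// "w-q" and "swb-free" come from the sample line, not a documented format.
public class DeviceLogParser {
    private static final Pattern FIELD = Pattern.compile("(swb-free|w-q|defrag-q)\\s+(\\d+)");

    // Returns the integer value of the named field, or -1 if it is absent.
    public static int field(String logLine, String name) {
        Matcher m = FIELD.matcher(logLine);
        while (m.find()) {
            if (m.group(1).equals(name)) {
                return Integer.parseInt(m.group(2));
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String line = "device /dev/sdc: used 296160983424, contig-free 110637M (885103 wblocks), "
                    + "swb-free 16, w-q 0 w-tot 12659541 (43.3/s), defrag-q 0 defrag-tot 11936852 (39.1/s)";
        System.out.println("w-q = " + field(line, "w-q"));
        System.out.println("swb-free = " + field(line, "swb-free"));
    }
}
```

A rising w-q (or a shrinking swb-free) while a bulk load runs is the early sign that the device is falling behind, before the overload errors start.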

Details at: http://www.aerospike.com/docs/reference/serverlogmessages/


#4

Thanks for your info. I’ve been testing for the past two days, doing bulk data loading programmatically. My original instance was not optimized for IO and performed really badly; I switched to another instance, which improved things a lot, but I can still easily overload the device when I start more threads writing data concurrently.

It seems max-write-cache is useful for handling burst traffic, but as long as the number of write requests sustainably exceeds the “hardware” throughput, w-q will reach the max over time:

WARNING (drv_ssd): (drv_ssd.c::4260) {NAMESPACE} write fail: queue too deep: q 513, max 512
WARNING (drv_ssd): (drv_ssd.c::4260) {NAMESPACE} write fail: queue too deep: q 10242, max 10240

So far, for a single AS instance, I’m able to sustain a few hours of disk writes at 26-28MB/s per iotop (in CloudWatch it is slightly higher than 60MB/s), or 500 TPS, on an EC2 m3.xlarge EBS-optimized instance with a 40GB/1200 IOPS EBS volume (without any software RAID). The instance in theory supports 62.5MB/s of disk writes. My Java program just reads from an old AS instance and loads the data into this new instance; when it uses more than one thread to write, it basically always overloads the device.

I suppose any database has a limit. One difference compared to a traditional RDBMS (or, actually, some other NoSQL stores) is that Aerospike doesn’t use an on-disk journal/write-ahead log to accept write requests, so the app design should detect an overload, or perhaps wait progressively longer between retries, when a “device overload” error is encountered. I wonder what the best practice is to avoid “device overload” while trying to optimize bulk requests.
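As a sketch of that progressive-wait idea (assumptions: the Callable wrapper, the class and method names, and the message check are all illustrative — with the real Java client you would inspect the AerospikeException result code, 18 for device overload, rather than matching strings):

```java
import java.util.concurrent.Callable;

// Client-side pacing for bulk loads: on a device-overload error, back off
// exponentially before retrying, so the server's write cache can drain.
public class OverloadBackoff {

    // Exponential backoff with a cap: 100ms, 200ms, 400ms, ... up to 5s.
    static long delayMillis(int attempt) {
        long base = 100L << Math.min(attempt, 10);
        return Math.min(base, 5000L);
    }

    // Retries the write on overload up to maxRetries times; rethrows anything else.
    static <T> T writeWithBackoff(Callable<T> write, int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return write.call();
            } catch (Exception e) {
                // Illustrative check only; with the Java client you would test
                // the exception's result code for device overload (18).
                boolean overload = e.getMessage() != null
                        && e.getMessage().toLowerCase().contains("overload");
                if (!overload || attempt >= maxRetries) throw e;
                Thread.sleep(delayMillis(attempt));
            }
        }
    }
}
```

Each overloaded attempt then waits roughly twice as long as the previous one, which gives w-q a chance to drain instead of hammering a full write cache.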

w-q is clear, but swb-free is a little confusing to me. When it is about to overload, w-q + swb-free <= max-write-cache / write-block-size. But when it doesn’t reach peak load, swb-free seems not to have been allocated and remains at a low level. I somehow expected swb-free to be at a high level even when idle.

thx.


#5

hi,

We are working on a two-pronged approach to this problem in the cloud, specifically AWS.

  1. Add a specific stat for the write queue, which will help in better monitoring and in pacing writes according to the device’s capability.

  2. A new solution using ephemeral and EBS devices, which should perform better than RAID. We expect to release it within a few weeks; right now it’s in the final phases of docs preparation. Will let you know once it’s ready.

I will post more on this topic in a while, covering our AWS benchmarking experience with EBS and ephemerals and how best to work around this.


#6

Did anything happen with this?

I’m evaluating Aerospike for use, and we’re pre-populating it with data under a write-heavy load. All works well at first, with a steady write throughput of about 3000 TPS (AMC) and about 70MB/s (iotop). However, when we reach about 300K replicated objects in the namespace, it starts pushing back with overload errors and writes plummet to about 6MB/s and 200 TPS.

Tried a bunch of different EC2 types, and they all respond in the same way. We are using an EBS SSD raw device.

Any advice on how to overcome this?


#7

Hi,

Yes, we now support using the attached ephemeral SSD as cache device and EBS as backing device using bcache.

http://www.aerospike.com/docs/operations/plan/ssd/bcache/

Let me know if you need more info/help on this.


#8

Thanks!

We tried this and it’s definitely more stable. There’s no noticeable spike and crash anymore, however the write throughput seems to peak at around 1000 TPS per node. We’ve followed all the recommendations laid out in documentation (we can’t use a VPC yet due to other factors but network performance doesn’t seem to be a major issue).

We’re significantly short of the 10K TPS write performance and are not sure what to do about it. We’re using i2.4xlarge instances. Is there anything to do beyond what’s recommended in the article?


#9

Are you using HVM-based images or PV images? I would suggest using HVM images.

There’s no noticeable spike and crash anymore, however the write throughput seems to peak at around 1000 TPS per node

What is the object size you are using? What workload are you using for testing?

We have not tested an EC2 Classic setup for performance. Is it possible for you to share a collectinfo dump of your server node while a load is running? It would help us debug your environment and set up a reproduction environment as required.

I would suggest using the latest aerospike-admin from collectinfo2 branch over here…

You would then run it with:

sudo python asadm.py -e collectinfo

The data collected is visible in the log, so you could review it before sending to us.

https://github.com/aerospike/aerospike-admin/blob/collectinfo2/lib/controller.py#L628-L672

You could send it to me on my username in this forum @ aerospike . com.

We will look at putting up a better mechanism for collecting such logs from our users here.


#10

Hi,

I am facing the issue AEROSPIKE_ERR_DEVICE_OVERLOAD. I have only one node in the cluster and am using the C client benchmark.
In the configuration, I am using a raw SSD device, with write-block-size set to 128K (the default) and max-write-cache set to 128M. To run the C client benchmark, the following configuration is used: ./target/benchmark -h 127.0.0.1 -p 3000 -n test -k 1000000 -b 1 -o S:4096 -w I -z 8. But it gives the device overload error, and checking the Aerospike logs shows messages like:

WARNING (drv_ssd): (drv_ssd.c::4260) {NAMESPACE} write fail: queue too deep: q 513, max 512

Is there any configuration parameter I am missing? I wanted to verify some performance numbers with Aerospike; is there any other way to do it?


#11

It means that your drive is not keeping up with your load. Different SSDs have different capacities, and you’ve found yours. Read the capacity planning section, especially the parts about drives. If you can’t find your drive in the published list, you’ll need to run your own ACT test on it.


#12

2 posts were split to a new topic: Aerospike not starting with this config - no error in log


#13

A post was merged into an existing topic: Aerospike not starting with this config - no error in log