Documentation on recommended setup for SSD Storage Engine on Amazon EC2?


#1

Hi,

I saw the 1M TPS on Amazon EC2 blog post on HighScalability. Great stuff! I was wondering if there was a similar write-up for best configuration for EC2 with the SSD storage engine.

I’m actively evaluating Aerospike and am doing robustness and performance testing on Amazon. I set up a cluster of 12 c3.4xlarge instances in a VPC with mesh networking, and capped out at 100K TPS with 30-byte keys and 1,000-byte binsets. I realized that I had saturated the network. I now see that there are key configurations needed for a fair evaluation: enhanced networking, receive packet steering, etc.
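For anyone following along, receive packet steering is enabled by writing a CPU mask to each NIC receive queue. A minimal system-tuning sketch, assuming an interface named `eth0` and a mask of `f` (CPUs 0–3) — both are assumptions that depend on your instance type:

```
# Sketch: spread inbound packet processing across CPUs 0-3 via
# Receive Packet Steering. Adjust interface name and mask for your instance.
for q in /sys/class/net/eth0/queues/rx-*; do
    echo f > "$q/rps_cpus"
done
```

This setting does not persist across reboots, so it typically goes in an init script.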

I’m giving it a second go. I want to test on a cluster of at least 10 nodes with a replication factor of 3 using the SSD storage engine. Any key recommendations? If using SSD, should I default to a c-series instance with enhanced networking? Is there a point per node where having more cores just doesn’t matter? How about EBS — will that be fine, or should I go with an instance-store volume? Anything else I should be considering?

Thanks! Alyssa NodePrime


#2

Alyssa,

The direct-attach SSDs in AWS are ephemeral in nature: data persists only for the lifetime of the instance, and is gone if the instance is stopped and restarted. Newer and larger instance types (4xlarge and above) are generally quite reliable, but they cannot be trusted for situations like an availability zone going down.

So there are two models of persistence:

  1. Replicate across availability zones with direct-attach SSD
  2. Keep data on EBS SSD

Both have their own pros and cons. Across availability zones, every byte transferred has a cost: replication traffic — and, in case of failure, application traffic fetching data — may cross zones, which can get expensive. And though the behavior of direct-attach SSD is quite good, EBS SSD has IOPS restrictions and much poorer latency characteristics. If reads were to go directly to EBS, that would break a millisecond SLA, which may not be acceptable.

To answer your question: if you want to use direct-attach SSD, i2 instances are the best in our experience. Their drives are much better quality and have better latency characteristics. You may want to pick i2.2xlarge or i2.4xlarge for your experiments, depending on the throughput you are looking for. As indicated above, if you are using direct-attach SSD, cross-zone replication should be considered. Aerospike supports rack-aware replication (http://www.aerospike.com/docs/operations/configure/network/rack-aware/): each zone can be treated as a rack and the cluster set up accordingly.
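As a sketch of that zone-as-rack setup: in current server versions rack membership is declared per namespace with a `rack-id`. The namespace name, device path, and rack numbering below are illustrative assumptions — check the rack-aware docs linked above for the syntax of the version you run:

```
# Sketch: each availability zone gets its own rack-id, e.g.
# 1 = us-east-1a, 2 = us-east-1b, 3 = us-east-1c.
namespace test {
    replication-factor 3
    rack-id 1                  # set per node, according to its zone
    storage-engine device {
        device /dev/xvdb       # direct-attach (instance store) SSD
    }
}
```

With replication factor 3 across three zones, each replica then lands in a different availability zone.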

EBS is a good option if your data is not huge and you can run with a data-in-memory setup (all data in memory), the reason being that EBS may not be able to serve reads directly from the device. It purely depends on your data size and throughput requirements.

Here is an independent vendor analysis comparing Amazon SSD performance with bare metal: http://www.internap.com/resources/nosql-db-bare-metal-vs-public-cloud/. We will also be publishing more detail once we have finished our in-house benchmarks.

There is a very good blog post by Hatim (http://hatim.eu/) on the various ways ephemeral SSDs can be used; you can check that out for options.

– Raj


#3

Thanks Raj!

This is extremely helpful. Our use case is focused on write performance — specifically the ingestion of time-series data. We honestly don’t have a good understanding of what our read load would be.

There are two configurations we would want to offer: cloud and on-premises. Cloud would most likely use EC2 — we haven’t figured that out yet. For on-premises, we’d like to provide an easy installation path via VM, though I’m sure our largest customers, who require the highest performance, would be open to us configuring bare metal for them.

Based on these configuration requirements, it sounds like using the HDD storage engine would be best. What kind of read and write performance hit would we suffer by using that engine instead of SSD?

Thanks, Alyssa


#4

Aerospike with persistence can be configured to operate in two modes:

  • Data in memory
  • Data not in memory

Either the entire dataset stays in memory or it does not. If you are running with a data-not-in-memory setup, every read request becomes a random I/O on the drive. SSD can take that; HDD cannot. We do not recommend HDD with data not in memory.
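For illustration, the two modes differ only in the namespace’s storage stanza. A sketch — the namespace names, device path, file path, and size are illustrative assumptions, and exact syntax varies by server version:

```
# Data NOT in memory: reads go to the device, so it should be SSD.
namespace on_ssd {
    storage-engine device {
        device /dev/xvdb
    }
}

# Data in memory, persisted to storage: reads are served from RAM,
# so slower media (EBS, HDD) only sees the asynchronous write path.
namespace in_mem {
    storage-engine device {
        file /opt/aerospike/data/in_mem.dat
        filesize 16G
        data-in-memory true
    }
}
```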

As far as writes go, Aerospike does buffered writes: when a write is performed, it is committed to an in-memory buffer on both the master and the replica and flushed asynchronously to storage. So there is only one device I/O per buffer, not one per write.
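To see why that matters for a write-heavy workload, here is a toy Python sketch (not Aerospike code — the class and numbers are purely illustrative) of how buffering turns many small writes into few device I/Os:

```python
# Illustrative only: writes accumulate in an in-memory buffer and are
# flushed once per full buffer, so N small writes cost roughly
# N / buffer_size device I/Os instead of N.
class BufferedWriter:
    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffer = []
        self.flushes = 0          # stands in for device I/Os

    def write(self, record):
        self.buffer.append(record)            # commit to the in-memory buffer
        if len(self.buffer) >= self.buffer_size:
            self.flush()                      # buffer full: one device I/O

    def flush(self):
        if self.buffer:
            self.flushes += 1                 # one I/O for the whole buffer
            self.buffer.clear()

w = BufferedWriter(buffer_size=128)
for i in range(1000):
    w.write(i)
w.flush()                 # flush the partial final buffer
print(w.flushes)          # 8 device I/Os for 1000 writes (7 full + 1 partial)
```

The same trade-off applies here: a larger buffer means fewer I/Os, at the cost of more unflushed data in memory at any moment.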

If your concern is the write workload, you could try the HDD storage engine with a data-in-memory setup, which is a viable option depending on your throughput requirement. Please note that if you are using a filesystem, ext4 performs better than ext3.
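For completeness, preparing an HDD that way is just an ext4 filesystem mounted where the namespace’s storage file will live. A sketch — the device name and mount point are assumptions for your setup:

```
# Sketch: format the HDD as ext4 (preferred over ext3) and mount it
# where Aerospike's file-backed storage will be placed.
mkfs.ext4 /dev/xvdc
mkdir -p /opt/aerospike/data
mount /dev/xvdc /opt/aerospike/data
```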

– Raj