Does data persist on an instance store when rebooting an Amazon EC2 instance?


#1

Hello,

I am new to Aerospike and experimenting with it.

I am running Aerospike on EC2. I followed the guide at http://www.aerospike.com/docs/deploy_guides/aws/install/ and picked an i2.4xlarge. I am using its 4 x 800 GB SSDs as ephemeral instance stores (NOT EBS-backed) because I assume reads and writes will be faster.

After some writes and reads, I rebooted the machine, and to my surprise the data was still there.

However, I was under the impression that data on the ephemeral disks would be lost when the instance was stopped or rebooted.

Is that expected behavior?

Is the AMI referenced at http://www.aerospike.com/docs/deploy_guides/aws/install/ an instance store-backed AMI or an Amazon EBS-backed AMI?
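One way to check this yourself is to look up the AMI's root device type through the EC2 DescribeImages API. The sketch below is a hypothetical helper (`root_device_types` is not part of any library) that reads the `RootDeviceType` field (`ebs` or `instance-store`) from a DescribeImages-style response; the boto3 call in the comments is shown only for context, and the AMI ID there is a placeholder. Note that the root device type describes only the root volume, not the separate local SSDs that come with the i2 instance type.

```python
from typing import Any, Dict, List


def root_device_types(describe_images_response: Dict[str, Any]) -> Dict[str, str]:
    """Map each AMI ID to its root device type ("ebs" or "instance-store").

    Expects a dict shaped like an EC2 DescribeImages response, e.g. the
    return value of boto3's ec2_client.describe_images(ImageIds=[...]).
    """
    images: List[Dict[str, Any]] = describe_images_response.get("Images", [])
    return {img["ImageId"]: img["RootDeviceType"] for img in images}


# Example (requires boto3 and AWS credentials; AMI ID is a placeholder):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-east-1")
# resp = ec2.describe_images(ImageIds=["ami-xxxxxxxx"])
# print(root_device_types(resp))
```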

Thank you for your help


#2

You are not guaranteed to get the same instance back. You are probably more likely to keep your data if the instance was only briefly down, but I wouldn't depend on this. If you were to stop the instance for a week, the data would probably be gone.


#3

Ha. Good to know. Thank you so much.

May I ask a few additional questions?

  1. Can you please confirm whether the AMI at http://www.aerospike.com/docs/deploy_guides/aws/install/ is EBS-backed?

  2. I am debating between the following use cases:

case a. I would not use EBS at all and would use only the ephemeral instance stores. I would spawn multiple machines with a replication factor greater than 1 to protect against data loss when a machine goes down.

case b. Use bcache (https://www.aerospike.com/docs/operations/plan/ssd/bcache/). However, it seems there is still a pending bug.

case c. Replace the instance stores with EBS volumes. Data would be persisted, so I wouldn't need a replication factor greater than 1 if I can afford some downtime. However, I am concerned about read latency.

Could you share some thoughts on case a vs. case b vs. case c?

Thank you


#4

I think AMIs are EBS-agnostic.

You do run the risk of losing data if multiple nodes fail. This risk also exists on bare metal, but on bare metal there are more situations in which the data can still be recovered, whereas it would be lost on ephemeral storage.

Yes, bcache has been problematic for us. Aerospike 3.5.10, which I believe is expected to be out next week, will include a new feature for defining a shadow disk. Basically, writes will go to both the primary disk and the shadow disk, while reads will only go to the primary. So stay tuned :smiley:.
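To make the described read/write split concrete, here is a toy in-memory model of a primary device mirrored to a shadow device. It is only an illustration of the semantics above (writes to both, reads from the primary), not Aerospike's implementation; the class and method names, including the `restore_from_shadow` step, are hypothetical.

```python
from typing import Dict


class ShadowedDevice:
    """Toy model: a primary device mirrored to a shadow device.

    Writes go to both the primary (e.g. a fast ephemeral SSD) and the
    shadow (e.g. a durable network volume); reads are served only by
    the primary. Each device is modeled as a dict of block -> bytes.
    """

    def __init__(self) -> None:
        self.primary: Dict[int, bytes] = {}
        self.shadow: Dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        # Every write lands on both devices.
        self.primary[block] = data
        self.shadow[block] = data

    def read(self, block: int) -> bytes:
        # Reads never touch the (slower) shadow device.
        return self.primary[block]

    def lose_primary(self) -> None:
        # Simulates the ephemeral disk being wiped (e.g. instance stop).
        self.primary.clear()

    def restore_from_shadow(self) -> None:
        # Hypothetical recovery step: rebuild the primary from the durable copy.
        self.primary = dict(self.shadow)
```

The point of the design is that read latency stays that of the local disk, while the shadow copy provides durability if the ephemeral disk is wiped.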

That depends on how often EBS volumes fail. It also means that if a node were to fail, your cluster would be missing data until you are able to replace it with another node that has the appropriate EBS volume attached.


#5

Thank you so much again.

Regarding case c, do you have an idea of how much slower it would be to read from EBS volumes as opposed to ephemeral instance stores?


#6

Instance store volumes persist data across a reboot of the instance.

It loses data under these conditions:

  1. Terminating instance
  2. Stopping instance
  3. Hardware/Drive failure

More info here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html

The hardware doesn't change unless the instance is stopped or terminated. A reboot is a recycle at the VM level, so the instance stays on the same underlying hardware.
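The rules in this answer can be encoded as a tiny lookup, which makes it easy to reason about a sequence of lifecycle events. This is a minimal sketch based only on the list above; the `Event` enum and `instance_store_survives` function are hypothetical names, not an AWS API.

```python
from enum import Enum
from typing import List


class Event(Enum):
    REBOOT = "reboot"
    STOP = "stop"
    TERMINATE = "terminate"
    HARDWARE_FAILURE = "hardware_failure"


# Per the list above: instance store data is lost on stop, terminate,
# or hardware/drive failure, but survives a reboot.
_LOSES_DATA = {Event.STOP, Event.TERMINATE, Event.HARDWARE_FAILURE}


def instance_store_survives(events: List[Event]) -> bool:
    """Return True if instance store data survives every event in the list."""
    return not any(e in _LOSES_DATA for e in events)
```

For example, `instance_store_survives([Event.REBOOT, Event.REBOOT])` is `True`, while any sequence containing `Event.STOP` is `False`.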


#7

Depending on your load, EBS can have severe performance issues.

If you want to use EBS for durability, your instances should be the largest available, with the fastest network performance, EBS-optimized enabled, and Provisioned IOPS volumes to guarantee baseline performance. You can also stripe several volumes with RAID 0, or list several devices under a single namespace, to distribute the load further.
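RAID 0 striping spreads load by placing consecutive fixed-size chunks round-robin across the volumes. The sketch below shows that address mapping; the function name and the 64 KiB stripe size in the example are illustrative assumptions, not values recommended in this thread.

```python
from typing import Tuple


def raid0_locate(offset: int, stripe_size: int, num_devices: int) -> Tuple[int, int]:
    """Map a logical byte offset on a RAID 0 array to (device index, device offset).

    Consecutive stripe_size chunks are placed round-robin across devices,
    which is what spreads sequential I/O over all volumes.
    """
    if stripe_size <= 0 or num_devices <= 0:
        raise ValueError("stripe_size and num_devices must be positive")
    stripe_index, within = divmod(offset, stripe_size)
    device = stripe_index % num_devices
    # Each device holds every num_devices-th stripe, packed contiguously.
    device_offset = (stripe_index // num_devices) * stripe_size + within
    return device, device_offset


# Hypothetical 64 KiB stripe over 4 EBS volumes:
# raid0_locate(0, 65536, 4)          -> (0, 0)
# raid0_locate(65536, 65536, 4)      -> (1, 0)
# raid0_locate(262154, 65536, 4)     -> (0, 65546)
```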

In almost every situation, if you're using high-end instances, local instance storage will outperform EBS by a wide margin. The most common approach with distributed databases is to rely on replication across nodes to protect against losing the data of a single instance.

More database nodes will give you high availability and better performance as you scale with more clients and transactions. You can always increase the replication factor to deal with multiple node failures. Multiple nodes are also the only way to protect in-memory data, if you have any.
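A rough way to compare replication factors is to estimate how likely it is that at least RF nodes are down at the same time, since with data partitions spread across the whole cluster, losing RF nodes at once can take out every copy of some partition. The sketch below assumes each node is independently down with probability `p_down`; that is a simplifying assumption (failures on EC2 can be correlated, e.g. within one Availability Zone), and the function name is hypothetical.

```python
from math import comb


def prob_at_least_k_down(n_nodes: int, k: int, p_down: float) -> float:
    """P(at least k of n_nodes are down at once).

    Assumes each node is independently down with probability p_down
    (binomial model). Use k = replication factor to estimate the chance
    that every replica of some data is unavailable simultaneously.
    """
    return sum(
        comb(n_nodes, i) * p_down**i * (1 - p_down) ** (n_nodes - i)
        for i in range(k, n_nodes + 1)
    )


# Compare replication factors 1, 2 and 3 on a hypothetical 6-node cluster:
# for rf in (1, 2, 3):
#     print(rf, prob_at_least_k_down(6, rf, 0.01))
```

Each extra replica multiplies the exposure down by roughly another factor of `p_down`, which is why raising the replication factor is the usual answer to multiple node failures.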


#8

kporter, manigandham, thank you so much for the very useful feedback!


#9

@mlabour and @manigandham,

We did not release 3.5.10 as @kporter alluded to, but we did release server version 3.5.12 on May 28, which adds device shadowing: data on ephemeral storage can also be persisted to network-attached devices (KVS- AER-3557). The full release notes for Aerospike Server Community Edition 3.5.12 are here.

Enjoy!


#10

Hi @Mnemaudsyne,

Thank you for the notification. I am having some issues compiling 3.5.12.

I posted here:

Even though git submodule update --init fails, I still went ahead and ran make:

make -C /home/ec2-user/aerospike-server/modules/common CF=/home/ec2-user/aerospike-server/cf EXT_CFLAGS="-DMEM_COUNT -DENHANCED_ALLOC"
make[1]: Entering directory `/home/ec2-user/aerospike-server/modules/common'
make[1]: *** No targets specified and no makefile found.  Stop.
make[1]: Leaving directory `/home/ec2-user/aerospike-server/modules/common'
make: *** [all] Error 2
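The message "No targets specified and no makefile found" in modules/common suggests that this submodule directory was never populated, which would be consistent with git submodule update --init failing: an uninitialized submodule is left as an empty directory. The hypothetical helper below lists module directories that contain no makefile, assuming every directory under modules/ is expected to build with make.

```python
import os
from typing import List

# GNU make looks for these names, in this order, when no -f is given.
MAKEFILE_NAMES = ("GNUmakefile", "makefile", "Makefile")


def dirs_missing_makefile(modules_root: str) -> List[str]:
    """List subdirectories of modules_root that contain no makefile.

    An uninitialized git submodule is an empty directory, so running
    make inside it fails with "no makefile found".
    """
    missing = []
    for name in sorted(os.listdir(modules_root)):
        path = os.path.join(modules_root, name)
        if os.path.isdir(path) and not any(
            os.path.isfile(os.path.join(path, f)) for f in MAKEFILE_NAMES
        ):
            missing.append(name)
    return missing


# Example, using the path from the build output above:
# print(dirs_missing_makefile("/home/ec2-user/aerospike-server/modules"))
```

If common shows up in the result, the submodule checkout needs to succeed before make can build it.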

Thank you!


#11

@mlabour,

Thanks! We will follow up with you about this issue in the new thread you opened.

Regards,

Maud