What Do I Do When an EBS Shadow Device Fails?

FAQ - What Do I Do When an EBS Shadow Device Fails?

Detail

It is a common configuration for nodes in AWS to store data in memory or in attached ephemeral storage, with an EBS drive as a shadow device for data persistence between instances. However, EBS drives can and do fail. What can be done? Are EBS snapshots recommended for protection against this?

Answer

An EBS drive failure is similar to a disk failure on a physical machine. The cluster should not lose any data from a single EBS failure, assuming a replication factor of at least 2. When a new node with empty storage is added to the cluster, it will repopulate with data through migration.

The use of snapshots is optional. When an up-to-date EBS snapshot is attached back to an instance, there should be no data loss. This may or may not be faster than repopulating the data through migration, depending on your instance setup. Aerospike will perform duplicate resolution to resolve conflicts. The use of snapshots may help reduce data loss in case of multiple EBS volume failures.

If you are concerned about multiple storage device failures, you can also increase the replication factor, depending on your requirements.
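As a rough illustration of why a higher replication factor helps, the Python sketch below computes how many simultaneous storage losses a record can survive. It assumes each copy of a record lives on a different node and that no migration completes between failures. The function name is hypothetical and not part of any Aerospike API.

```python
def max_tolerated_failures(replication_factor: int) -> int:
    """Return how many nodes can lose their storage at once
    while at least one copy of every record still survives.

    Simplifying assumption: each of the `replication_factor`
    copies is stored on a distinct node.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor - 1


if __name__ == "__main__":
    for rf in (1, 2, 3):
        print(f"replication factor {rf}: "
              f"tolerates {max_tolerated_failures(rf)} simultaneous failure(s)")
```

A higher replication factor also multiplies the storage used, so the right value depends on how much loss risk you are willing to accept.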

Whether to zeroize your instance’s primary (ephemeral) storage when the EBS shadow device fails depends on your requirements.

When you attach an EBS volume snapshot to a new node, it will likely perform a cold start (assuming the instance was terminated and its ephemeral disk destroyed). If Aerospike is able to start successfully, you should be good to go. On the other hand, if the primary (ephemeral) volume is corrupted, then you will not be able to start and will need to zeroize it.
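To make "zeroize" concrete, here is a minimal Python sketch that overwrites every byte of a file or block device with zeros. It is an illustration only: the function name is hypothetical, it assumes Aerospike is stopped on the node, and in practice this is commonly done with standard tools such as dd. The demo at the bottom runs against a temporary file rather than a real device.

```python
import os
import tempfile


def zeroize(path: str, chunk_size: int = 1024 * 1024) -> int:
    """Overwrite every byte of `path` with zeros; return the byte count.

    Assumption: nothing else (for example a running Aerospike
    server) has the device open while this runs.
    """
    zeros = bytes(chunk_size)
    with open(path, "r+b") as target:
        # Seeking to the end yields the size for regular files
        # and for Linux block devices.
        size = target.seek(0, os.SEEK_END)
        target.seek(0)
        remaining = size
        while remaining > 0:
            step = min(chunk_size, remaining)
            target.write(zeros[:step])
            remaining -= step
        target.flush()
        os.fsync(target.fileno())  # push the zeros to the device
    return size


if __name__ == "__main__":
    # Demonstrate on a scratch file, never on a live device.
    with tempfile.NamedTemporaryFile(delete=False) as scratch:
        scratch.write(b"\xff" * 4096)
        scratch_path = scratch.name
    print("bytes zeroized:", zeroize(scratch_path))
    os.remove(scratch_path)
```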

Remember that cold starting is slow, so it may be faster to zeroize the disk and let it fill by migration, but the only way to determine this for a particular cluster is to test it. For more details, see our KB article "Which is faster to complete migration - restarting an empty node vs restarting a node with data".

If you are not using durable deletes and are concerned about the possibility of deleted records being “resurrected”, then you will need to zeroize the primary storage and attach an empty EBS volume.
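The guidance above can be summarized as a small decision helper. The Python sketch below is one possible reading of this article, not an official procedure. The class, field, and function names are hypothetical, and the choice between a snapshot and an empty volume is ultimately a speed trade-off that should be tested on your own cluster.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureContext:
    durable_deletes: bool       # deletes in the namespace are durable
    resurrection_matters: bool  # deleted records reappearing is unacceptable
    snapshot_available: bool    # an up-to-date EBS snapshot exists
    primary_corrupted: bool     # the ephemeral primary device is unusable


def recovery_steps(ctx: FailureContext) -> List[str]:
    """Return an ordered list of recovery steps for one failed node."""
    # Without durable deletes, old data on disk can bring deleted
    # records back, so start from completely empty storage.
    if not ctx.durable_deletes and ctx.resurrection_matters:
        return [
            "zeroize primary storage",
            "attach empty EBS volume",
            "start node and let migration repopulate it",
        ]

    steps: List[str] = []
    if ctx.primary_corrupted:
        # Aerospike cannot start on a corrupted primary device.
        steps.append("zeroize primary storage")
    if ctx.snapshot_available:
        steps.append("attach EBS volume from snapshot")
        steps.append("start node (expect a cold start)")
    else:
        steps.append("attach empty EBS volume")
        steps.append("start node and let migration repopulate it")
    return steps


if __name__ == "__main__":
    ctx = FailureContext(durable_deletes=True, resurrection_matters=False,
                         snapshot_available=True, primary_corrupted=False)
    for number, step in enumerate(recovery_steps(ctx), start=1):
        print(f"{number}. {step}")
```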

Keywords

EBS AWS FAILURE SNAPSHOT COLD START DURABLE DELETE

Timestamp

April 2020
