FAQ - Shadow Device on AWS

Synopsis

This knowledge-base article covers some of the frequently asked questions specific to shadow device configuration in Aerospike for AWS deployments, but the answers can easily be generalized to other environments.

1. When do I use a shadow device?

Some AWS EC2 instance types have direct-attached SSDs called Instance Store Volumes, also known as ephemeral drives/volumes. These can be significantly faster than EBS volumes (as EBS volumes are network attached). Nevertheless, AWS recommends not relying on instance store volumes for valuable, long-term data, as these volumes are purged when the instance stops. To take advantage of the fast direct-attached instance store SSDs while being protected against the unexpected shutdown of an instance, Aerospike allows the configuration of shadow devices, to which all writes are also propagated. This is configured by specifying an additional device name in the storage-engine stanza of the namespace configuration.

Refer to the shadow device configuration page for more details.

2. How do I configure a shadow device?

You need one shadow device per primary device; on each device line, the primary device is listed first, followed by its shadow:

storage-engine device {
	device /dev/sdb /dev/sdf	# primary (instance store) followed by its shadow (EBS)
	device /dev/sdc /dev/sdg	# primary (instance store) followed by its shadow (EBS)
}

3. What happens when the EC2 instance with a shadow device (EBS) fails or terminates?

Unlike the data stored on a local instance store (ephemeral/attached storage), which persists only as long as that instance is alive, data stored on an Amazon EBS volume can persist independently of the life of the instance. Therefore, Aerospike recommends using the local instance store only for temporary data. For data requiring a higher level of durability, Aerospike recommends using Amazon EBS volumes. If using an Amazon EBS volume as a root partition, set the Delete on termination flag to “No” in order for the Amazon EBS volume to persist outside the life of the instance.
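For illustration, the flag can be inspected and changed with the AWS CLI; the instance ID and device name below are hypothetical placeholders:

# Check the current block device mappings, including DeleteOnTermination
# (i-0abc123 and /dev/sda1 are placeholders)
aws ec2 describe-instance-attribute \
    --instance-id i-0abc123 --attribute blockDeviceMapping

# Keep the root EBS volume after the instance terminates
aws ec2 modify-instance-attribute --instance-id i-0abc123 \
    --block-device-mappings \
    '[{"DeviceName": "/dev/sda1", "Ebs": {"DeleteOnTermination": false}}]'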

4. If we are using a shadow device and the instance has stopped and started, how is the data restored from the shadow device?

When an instance is stopped and started, a new instance is actually spawned with empty ephemeral storage. If configured with a shadow device, Aerospike will restore the data from the shadow device. This causes the node to go through a cold start, which rebuilds the primary index and repopulates the attached (ephemeral) device from the shadow device. This will likely be slower than a regular cold start, given the extra reads and writes necessary to populate the ephemeral devices from the shadow device.

5. If we are using an in-memory namespace (data-in-memory true) with a file-backed device pointing to EBS, how do I configure a shadow device?

Such a configuration would not be necessary in this case, as the file would already be persisted on an EBS volume, which survives instance shutdowns and restarts. In general, though, for data-in-memory use cases with persistence, it may be advantageous to use a direct-attached ephemeral device alongside an EBS shadow volume. This saves on the IOPS cost incurred by the large block reads necessary for the defragmentation process: the large block reads are performed against the direct-attached volume (rather than against the EBS volume over the network), and defragmented blocks are written to both the ephemeral and EBS volumes. A configuration sketch follows.
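As a minimal sketch, assuming hypothetical device names and sizing, such a namespace could look as follows:

namespace test {
	memory-size 8G

	storage-engine device {
		device /dev/sdb /dev/sdf	# ephemeral primary, EBS shadow (placeholder names)
		data-in-memory true		# reads served from memory; devices provide persistence
	}
}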

6. Would using EBS only as a storage device introduce latencies?

Depending on the nature of the workload, using EBS as the primary storage without data-in-memory can potentially impact read and write latencies, as reads and writes would have to travel over the network to the EBS volume.

7. Would using a shadow device affect read / write latencies?

In general, using a shadow device should not have a noticeable impact on read and write latencies, as the shadow device is not in the direct path of read and write transactions. Having said that, if the shadow device is not able to keep up with the large blocks being written to it, breaching the configured max-write-cache, client writes for the namespace would be rejected.
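If the shadow device only falls behind during short bursts, raising max-write-cache from its 64 MB default gives the shadow write queue more headroom; the value below is just an example:

storage-engine device {
	device /dev/sdb /dev/sdf
	max-write-cache 128M	# default is 64M; a larger cache absorbs short write bursts
}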

8. What would be the instance recovery steps?

Refer to the instance failure page of the AWS deployment guide.

If the ephemeral device is damaged (missing header information, for example) and there is a valid shadow device, the server will load data from the EBS shadow device into the ephemeral disk and into memory (primary index, secondary index, data-in-memory).

9. Would reads fail or succeed when only the ephemeral device fails and the instance comes back empty but with an EBS volume as a shadow device?

The instance will restart, repopulate the data from the shadow device, and then serve read transactions as usual once the node has rejoined the cluster.

10. Would writes fail or succeed when only the ephemeral device fails and the instance comes back empty but with an EBS volume as a shadow device?

This is similar to the previous point. The instance will repopulate its ephemeral device upon restart and serve write transactions as usual once the node has rejoined the cluster.

11. How do I disable usage of a shadow device?

To disable a shadow device, remove it from the configuration and restart the Aerospike service in a rolling manner across the cluster.
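Concretely, each device line would simply drop its second (shadow) entry, for example:

storage-engine device {
	device /dev/sdb	# shadow /dev/sdf removed; only the primary device remains
	device /dev/sdc	# shadow /dev/sdg removed
}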

12. Is writing to EBS (shadow device) asynchronous or synchronous?

Writes to an EBS shadow device are done the same way as for the primary configured device. The writes are done asynchronously by default (a special mode will be introduced allowing Aerospike to be configured to synchronously flush data to the device).

13. Do we have individual streaming write buffers for the shadow device and the instance store?

The streaming write buffers are separate for each local disk, but there are two queues, one each for the local device and the shadow device. Say, for example, we have device sdb and its shadow sdf. When the streaming write buffer for sdb fills up, it is put on sdb's queue and waits for a thread to write its contents to the disk. When the write to sdb is finished, the same write buffer is put on another queue, the one for the shadow disk sdf. Meanwhile, a new write buffer is already flushing its content to sdb.

Thus, the same buffer is put on two queues (one for the primary device and one for the shadow device), one after the other.

By default, we wait 1 second after a Streaming Write Buffer (SWB) has been allocated before flushing the write buffer to disk. This can be tuned using the flush-max-ms configuration parameter. In addition to this, if the disk is not keeping up with the write load, there may be multiple SWBs pending to be written. The amount of data that can be pending is configurable through the [max-write-cache](https://www.aerospike.com/docs/reference/configuration#max-write-cache) configuration option, which by default is 64 MB.
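Both knobs live in the storage-engine device stanza; the values below are the defaults, written out explicitly for illustration:

storage-engine device {
	device /dev/sdb /dev/sdf
	flush-max-ms 1000	# flush a partially full SWB after 1 second (default)
	max-write-cache 64M	# cap on pending write buffers per device (default)
}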

14. How do I monitor the write-buffer and queues for the two devices?

The following log line will help confirm whether the primary device (and the shadow device) is keeping up:

Nov 02 2017 07:39:08 GMT: INFO (drv_ssd): (drv_ssd.c:2095) {test} /dev/sdb4: used-bytes 382081920 free-wblocks 1514 write-q 0 write (0,0.0) defrag-q 0 defrag-read (1,0.0) defrag-write (0,0.0) shadow-write-q 0

For details on the individual fields, refer to the server log reference manual.

To troubleshoot issues with a device not keeping up, refer to the following knowledge base article.
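As a quick check, the per-device log lines can be filtered from the server log; the log path below assumes a default package install and is only an example:

# Watch write-q and shadow-write-q for a given device
grep "/dev/sdb" /var/log/aerospike/aerospike.log | tail -n 5

Consistently non-zero and growing write-q or shadow-write-q values indicate that the corresponding device is falling behind.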

15. Is there any change in RAM requirement for shadow device configuration?

There is no special RAM requirement for a shadow device configuration.

Keywords

ephemeral aws shadow

Timestamp

01/18/2018