FAQ - Shadow Device


FAQ - Shadow Device on AWS

Synopsis

This knowledge-base article covers some of the frequently asked questions specific to shadow device configuration in Aerospike for AWS deployments, but the answers can easily be generalized to other environments.

1. When do I use a shadow device?

Some AWS EC2 instance types have direct-attached SSDs called Instance Store Volumes, also known as ephemeral drives or volumes. These can be significantly faster than EBS volumes (which are network attached). However, AWS recommends not relying on instance store volumes for valuable, long-term data, as these volumes are purged when the instance stops. To take advantage of the fast direct-attached instance store SSDs while being protected against the unexpected shutdown of an instance, Aerospike allows the configuration of shadow devices, to which all writes are also propagated. This is configured by specifying an additional device name in the storage-engine stanza of the namespace configuration.

Refer to the shadow device configuration page for more details.

2. How do I configure a shadow device?

You need one shadow device per primary device; each device line lists the primary device followed by its shadow device:

storage-engine device {
	device /dev/sdb /dev/sdf
	device /dev/sdc /dev/sdg
}

Note that it is very important, when using a shadow device, that the node ID of the node it is attached to does not change. This is because the position of that node in the succession list for a given partition is in part a function of the node ID, which in turn determines which nodes manage which partitions, for both master and replica purposes. Changing the node ID of a single node alters the partition distribution calculations within the cluster and causes superfluous migrations, rather than data simply being restored from the shadow device.

For bare metal hardware this is not usually a problem, as the node ID is a function of the NIC MAC address, which will not change even if the IP address changes (as it might when using DHCP). It is still a best practice to fix the node ID explicitly, as NICs can be replaced.

It is an issue with virtual machines, particularly in the public cloud, as the node ID will very likely change following a stop or termination because the virtual MAC address changes. Data on the shadow device may then no longer be owned by the same node, which will cause extra migrations across the cluster. Again, it is possible to fix the MAC address on virtual machines by fixing the virtual network adapter, but it is safer to fix the Aerospike node ID.

The node ID is configured under the service stanza in the configuration file:

service {
	<...>
	node-id a1
	<...>
}

3. What happens when the EC2 instance with shadow device (EBS) fails/terminates?

Unlike the data stored on a local instance store (ephemeral/attached storage), which persists only as long as that instance is alive, data stored on an Amazon EBS volume can persist independently of the life of the instance. Therefore, Aerospike recommends using the local instance store only for temporary data. For data requiring a higher level of durability, Aerospike recommends using Amazon EBS volumes. If using an Amazon EBS volume as a root partition, the Delete on termination flag should be set to “No” in order for the Amazon EBS volume to persist outside the life of the instance.

4. If we are using a shadow device and the instance has been stopped and started, how is the data from the shadow device restored?

In the case of an instance shutdown and restart, a new instance is actually spawned with empty ephemeral storage. If configured with a shadow device, Aerospike will restore the data from the shadow device. This causes the node to go through a cold start, which rebuilds the primary index and repopulates the attached (ephemeral) device from the shadow device. This will likely be slower than a regular cold start, given the extra reads and writes necessary to populate the ephemeral devices from the shadow device.

5. If we are using an in-memory namespace (data-in-memory true) with a file-backed device pointing to EBS, how do I configure a shadow device?

Such a configuration would not be necessary in that case, as the file would already be persisted on an EBS volume, which survives instance shutdowns and restarts. In general, though, for data-in-memory use cases with persistence, it may be advantageous to use a direct-attached ephemeral device alongside a shadow EBS volume. This saves on the IOPS cost incurred by the large block reads necessary for the defragmentation process: the large block reads are performed against the direct-attached volume (rather than against the EBS volume over the network), while defragmented blocks are written to both the ephemeral and the EBS volumes.
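
As an illustrative sketch only (device paths, memory size and namespace name are examples, not recommendations), such a namespace could combine data-in-memory with an ephemeral primary device and an EBS shadow device:

namespace test {
	memory-size 8G
	replication-factor 2
	storage-engine device {
		device /dev/sdb /dev/sdf	# ephemeral (primary) and EBS (shadow)
		data-in-memory true
	}
}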

6. Would using EBS only as a storage device introduce latencies?

Depending on the nature of the workload, using EBS as the primary storage without data-in-memory can potentially impact read and write latencies, as reads and writes would have to travel over the network to the EBS volume.

7. Would using a shadow device affect read / write latencies?

In general, using a shadow device should not have a noticeable impact on read and write latencies, as the shadow device is not in the direct path of read and write transactions. Having said that, if the shadow device is not able to keep up with the large blocks being written to it, breaching the configured max-write-cache (see the Configuration Reference), the node would reject client writes for the namespace.

8. What would be the instance recovery steps?

Refer to the instance failure page of the AWS deployment guide.

If the ephemeral device is damaged (missing header information, for example) and there is a valid shadow device, the server will load data from the EBS shadow device into the ephemeral disk and into memory (primary index, secondary index, data-in-memory).

9. Would reads fail or succeed when only the ephemeral device fails and the instance comes back empty but with an EBS volume as a shadow device?

The instance will restart and populate the data from the shadow device, and will then serve read transactions as usual once the node has rejoined the cluster.

10. Would writes fail or succeed when only the ephemeral device fails and the instance comes back empty but with an EBS volume as a shadow device?

This is similar to the previous point. The instance will repopulate its ephemeral device upon restart and serve write transactions as usual once the node has rejoined the cluster.

11. How do I disable usage of a shadow device?

To disable a shadow device, remove it from the configuration and restart the Aerospike service in a rolling manner across the cluster.
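
For example, starting from the configuration shown in question 2, the shadow devices would simply be dropped from each device line before the rolling restart:

storage-engine device {
	device /dev/sdb
	device /dev/sdc
}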

12. Is writing to EBS (shadow device) asynchronous or synchronous?

The writes are done asynchronously by default, but will be synchronous if commit-to-device is set to true. Note that this will increase latency as a trade-off for greater write reliability.
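
As a minimal sketch (reusing the device names from question 2 for illustration, and noting that commit-to-device applies to namespaces with strong-consistency enabled), the setting lives in the storage-engine stanza:

namespace test {
	memory-size 4G
	replication-factor 2
	strong-consistency true
	storage-engine device {
		device /dev/sdb /dev/sdf	# primary and shadow
		commit-to-device true	# acknowledge writes only once flushed to the device(s)
	}
}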

13. Do we have individual streaming write buffers for the shadow device and the instance store?

The streaming write buffers are individual for each local disk, but there are two write queues per device pair: one for the local (primary) device and one for the shadow device. Say, for example, we have device sdb and its shadow sdf. When the streaming write buffer for sdb fills up, it is put into the write queue and waits for a thread to write the buffer content to disk. When the write to disk sdb is finished, the same write buffer is put into another queue, the one for the shadow disk sdf, while a new write buffer may meanwhile be flushing its content to sdb.

Thus, the same buffers are put into two queues (one each for the primary and the shadow device), one after the other.

By default, we wait 1 second after a Streaming Write Buffer (SWB) has been allocated before flushing the write buffer to disk. This can be tuned using the flush-max-ms configuration parameter. In addition, if the disk is not keeping up with the write load, there may be multiple SWBs pending to be written. The amount of data that can be pending is configurable through the max-write-cache configuration option (see the Configuration Reference), which by default is 64 MB.
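
For illustration (the values shown are the defaults mentioned above, not tuning recommendations), both parameters are set in the storage-engine stanza:

storage-engine device {
	device /dev/sdb /dev/sdf
	flush-max-ms 1000	# flush a partially filled write buffer after 1 second (default)
	max-write-cache 64M	# maximum pending write-buffer data per device (default 64 MB)
}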

14. How do I monitor the write-buffer and queues for the two devices?

The following log line helps confirm whether the primary device (and the shadow device) is keeping up:

Nov 02 2017 07:39:08 GMT: INFO (drv_ssd): (drv_ssd.c:2095) {test} /dev/sdb4: used-bytes 382081920 free-wblocks 1514 write-q 0 write (0,0.0) defrag-q 0 defrag-read (1,0.0) defrag-write (0,0.0) shadow-write-q 0

For details on the individual fields, refer to the server log reference manual.

To troubleshoot 'device not keeping up' issues, refer to the relevant knowledge base article.

15. Is there any change in RAM requirement for shadow device configuration?

There is no special RAM requirement as such for shadow device configuration.

16. Do I need to overprovision an SSD primary device when there’s an EBS shadow device?

A locally attached SSD will be handling reads (both from the application and from defragmentation) as well as writes, whereas a shadow device only handles writes (mirrored from the primary device). It is also the primary device that drives the perceived performance on the application side. Therefore, over-provisioning the primary device would still be beneficial, independently of the use of a shadow device.

Notes

Keywords

ephemeral aws shadow commit-to-device over-provision

Timestamp

February 2020