How to Add, Replace, & Remove disks

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

Synopsis

Storage devices can be replaced or added using a fast-start (or a cold-start for non-hot-swappable devices).

Replacing a Storage Device

Hot-Swappable Devices

Storage devices inevitably fail. When this occurs, Aerospike is able to recover quickly. If nodes are equipped with hot-swappable drives, simply replace the faulty device, zeroize the new device, and update Aerospike’s configuration to use the new device in place of the old one. Finally, stop and fast-start the server (or cool-start it if data is in memory). If multiple nodes require device replacements, perform the same procedure on each of them in turn, waiting for migrations to complete before continuing to the next node so that the new devices are repopulated.
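Because zeroizing destroys whatever is on the target device, it may help to confirm that the replacement device is not mounted before wiping it. The following is a minimal sketch and not part of the official procedure. The function name is_mounted is hypothetical, and the sketch assumes a Linux-style mounts table such as /proc/mounts, with the device path in the first column.

```shell
# Hypothetical helper: succeed (exit 0) if DEVICE is listed as a mounted
# source in MOUNTS_FILE, fail (exit 1) otherwise.
# Assumption: MOUNTS_FILE uses the /proc/mounts layout (device in column 1).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "$2"
}
```

For example, `is_mounted /dev/DEVICE /proc/mounts && echo "device is mounted, not wiping"` could be run before the zeroize step. It only inspects the mounts table and never touches the device itself.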

WARNING for older versions: For server versions prior to 4.2, it is critical the user maintains the order of the devices in the configuration file. If the second device failed then the replacement device must be the second device, preceded by the original first device and followed by the original devices that followed thereafter. For versions 4.2 and above, the above requirement does not apply. Device configuration order is independent across fast restarts.

Non-Hot-Swappable Devices

If the devices aren’t hot-swappable, it is necessary to stop the server to add the devices, in which case a cold restart is necessary unless the namespace has its index-type set to pmem or flash. The same procedure as for Hot-Swappable Devices applies, except that maintaining the device order is only a recommendation for versions prior to 4.2.

Expanding Storage Capacity

When expanding storage by replacing lower-capacity disks with higher-capacity ones, refer to the “Replacing a Storage Device” section above.

Hot-Swappable Devices

To expand storage, simply add the new devices to the node and zeroize them. Then, in Aerospike’s configuration file, append the devices to the end of the namespace’s storage devices. Finally, stop and start the Aerospike server. If expanding multiple nodes, perform the same procedure on each node, waiting for the service to become ready between nodes.

Note: When adding a new device to a namespace, the data will not automatically rebalance across all the storage devices. As records are added or updated, or as their blocks are defragmented, they are redistributed across the new set of storage devices (per their digest hash). Depending on the workload, this could lead to low device_available_pct situations on the previously used storage devices. Over time, as all records get updated, the distribution should become well balanced. It may be necessary, though, to force such a rebalance by updating all the records (for example using the touch command) or by temporarily forcing more aggressive defragmentation (increasing defrag-lwm-pct). This requires close monitoring to avoid inadvertently impacting performance.
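As an illustration of the defragmentation option mentioned in the note, defrag-lwm-pct can typically be changed at runtime through the asinfo set-config command. The sketch below only prints the command rather than running it, so it can be reviewed first. The function name defrag_lwm_cmd is hypothetical, the value 60 in the usage line is purely illustrative, and the set-config syntax shown is the one assumed for server versions of this article's era.

```shell
# Hypothetical helper: print (do not run) the asinfo command that would set
# defrag-lwm-pct for a namespace. Rejects non-numeric percentages.
defrag_lwm_cmd() {
    ns=$1
    pct=$2
    case $pct in
        ''|*[!0-9]*) echo "percentage must be a whole number" >&2; return 1 ;;
    esac
    printf "asinfo -v 'set-config:context=namespace;id=%s;defrag-lwm-pct=%s'\n" "$ns" "$pct"
}
```

Running `defrag_lwm_cmd test 60` prints the command for the namespace test. The same call with the original value would restore the setting once the devices are balanced.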

WARNING for older versions: For server versions prior to 4.2, it is critical that the user appends the new devices to the end of the namespace storage devices (i.e. the new devices must be the last ones in the list). This is required in order to take advantage of fast restart (for configurations permitting it): the index stores device IDs that are based on device order, so fast-restarting a namespace with a different device order would cause data misreads. Cold restarts are not impacted. For versions 4.2 and more recent, the above requirement does not apply. Device configuration order is independent across fast restarts.
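For versions prior to 4.2, the append-only rule can be checked mechanically before restarting. The following is a rough sketch under stated assumptions: the helper names list_devices and devices_appended_only are hypothetical, each device is assumed to be declared on its own "device <path>" line, and the configuration file is assumed to contain a single namespace (with several namespaces the lists would be mixed together).

```shell
# Print the device paths declared in an aerospike.conf-style file, in order.
# Assumption: one "device <path>" directive per line.
list_devices() {
    awk '$1 == "device" { print $2 }' "$1"
}

# Succeed only if the device list of OLD_CONF is an exact prefix of the
# device list of NEW_CONF, i.e. new devices were appended at the end.
devices_appended_only() {
    old=$(list_devices "$1")
    new=$(list_devices "$2")
    old_count=$(printf '%s\n' "$old" | grep -c .)
    [ "$(printf '%s\n' "$new" | head -n "$old_count")" = "$old" ]
}
```

A call such as `devices_appended_only aerospike.conf.bak aerospike.conf || echo "device order changed"` compares a saved copy of the configuration with the edited one. The backup file name is only an example.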

Non-Hot-Swappable Devices

If the devices aren’t hot-swappable then you will have to stop the server to add the devices, in which case a cold-start will be performed unless the index is stored in flash or in pmem (index-type set to pmem or flash). The same procedure as for Hot-Swappable Devices applies, except that it is no longer important that the devices be appended to the end.

Removing Storage Capacity

Devices can be removed from the configuration file. The Aerospike server will fast restart if the configuration allows it. Migrations will then have to repopulate the data that was held on the removed device(s), and the capacity of the remaining device(s) should be closely monitored.

WARNING for older versions: For server versions prior to 4.2, when removing a device on a configuration supporting fast-restart, only removal of the last device(s) defined for the namespace is supported. For versions 4.2 and more recent, the above requirement does not apply. Device configuration order is independent across fast restarts.

Procedure

The steps listed below can be followed whether or not the devices are hot-swappable. If the host itself is restarted, a cold-start is necessary unless the index is configured to be stored in flash or pmem. If working on multiple nodes, perform the same procedure on each node, waiting for the service to become ready and, if data was removed as part of the procedure (replacing or removing a device), waiting for migrations to complete between nodes. The Upgrade Hardware documentation page has some general details as well.

Note: In Aerospike versions 4.3.1 and above, the quiesce command can be used to perform a smooth master handoff when doing rolling restarts so that operational changes are transparent to application users.

For each node in the cluster (one node at a time):

a. It is a good practice to take a backup of the node (optional if replication factor > 1).

b. Quiesce the node and issue a recluster command.

c. Wait for incoming traffic to the node to stop as described on the ‘quiescing a node’ documentation page. The cluster-stable command can be used to check migration status. It is also advisable to wait for migrations to complete (so that all data has the expected number of copies before the node is taken out and one or more devices are removed or replaced). Migrations can also be checked directly through asadm as follows:

Admin> info namespace

d. Stop Aerospike:

$ sudo service aerospike stop

e. Add/Replace/Remove the storage device(s).

f. When the machine is ready, zeroize the new device(s):

$ sudo dd if=/dev/zero of=/dev/DEVICE bs=1M

OR if blkdiscard is available:

$ sudo blkdiscard /dev/DEVICE

g. Update the configuration file (e.g. aerospike.conf).

h. If replacing or removing a device and running against namespaces which are not strong-consistency enabled, set read-consistency-level-override by modifying the configuration file (aerospike.conf). This will prevent unnecessary not found errors during migrations when the data is repopulated.

  • Example for adding 2 new devices:

Before:

        namespace test {
            replication-factor 2
            memory-size 4G
            default-ttl 30d

            storage-engine device {
                device /dev/sdb1
                device /dev/sdb2
                write-block-size 128K
            }
        }

After:

        namespace test {
            replication-factor 2
            memory-size 4G
            default-ttl 30d
            read-consistency-level-override all # To avoid client_delete_not_found & client_read_not_found errors

            storage-engine device {
                device /dev/sdb1
                device /dev/sdb2
                device /dev/sdb3 # New device
                device /dev/sdb4 # New device
                write-block-size 128K
            }
        }

i. Start Aerospike:

$ sudo service aerospike start

j. Verify Aerospike service is ready and, if devices were removed or replaced, wait for migrations to complete prior to proceeding to the next node. The cluster-stable command can be used to make sure the cluster is back to the expected cluster size and migrations are done.
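The commands from the steps above can be gathered into a per-node checklist. The sketch below only prints the commands for review and does not execute anything. The function name print_node_plan is hypothetical, the quiesce and recluster lines assume server 4.3.1 or later, and the cluster-stable line assumes that migrations are taken into account by default when only size is given. The waiting steps and the configuration file edit remain manual and are not represented.

```shell
# Hypothetical helper: print (do not run) the commands for one node.
# $1 = device to zeroize, $2 = expected cluster size after the restart.
print_node_plan() {
    device=$1
    cluster_size=$2
    cat <<EOF
asinfo -v 'quiesce:'
asinfo -v 'recluster:'
sudo service aerospike stop
sudo blkdiscard $device
sudo service aerospike start
asinfo -v 'cluster-stable:size=$cluster_size'
EOF
}
```

For instance, `print_node_plan /dev/sdb3 3` lists the six commands for a node in a three-node cluster. Printing instead of executing is a deliberate choice here, since the zeroize step is destructive and the waits between steps depend on the state of the cluster.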

For Strong Consistency enabled Namespaces

The procedure remains the same to add new devices. Once the server is started after adding the new devices, the namespace will come up with an ‘e’ flag even when the namespace is fast started. The ‘e’ flag is typically used to indicate that a node is not to be trusted because its devices have been tampered with or because it came down ungracefully (cold restart), which is of course not the case here, other than new devices being added.

Aug 29 2020 10:46:53 GMT: INFO (drv_ssd): (drv_ssd_ee.c:1399) {test} setting partition version 'e' flags 

In this case it is required to wait for migrations to complete before proceeding on to the next node. This is because the ‘e’ flag is only removed once the migrations complete. If another node is taken down before the ‘e’ flag is removed, since a node with the ‘e’ flag doesn’t count towards availability, some partitions would become unavailable. A future server release may address this specific behavior.
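To see which namespaces came up with the ‘e’ flag after a restart, the server log can be searched for the message shown above. This is a minimal sketch: the function name namespaces_with_e_flag is hypothetical, the log file path is whatever the node is configured to use, and the pattern assumes the message format matches the sample line in this article.

```shell
# Hypothetical helper: list the namespaces for which the log reports that the
# partition version 'e' flags were set. $1 = path to the Aerospike log file.
namespaces_with_e_flag() {
    sed -n "s/.*{\([^}]*\)} setting partition version 'e' flags.*/\1/p" "$1" | sort -u
}
```

An empty result only means the message was not found in that file. It does not by itself confirm that migrations have completed.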

Note: For Aerospike versions 3.2.9 and below, it is necessary to zeroize all the devices when adding, removing or replacing even a single device.

Notes

  • Although it contains no records, be careful with the SMD directory when bringing the node back into the cluster, as the propagating SMD information could lead to unwanted truncates or other problems.
  • Migrations can be checked in asadm by viewing the info output for Namespace Information and waiting for Pending Migrates (tx%,rx%) to read (0,0).
  • A separate article outlines a method to execute a touch on all records to re-distribute data.
  • For Aerospike versions 3.10.1 to 3.13, many client_delete_not_found and client_read_not_found errors may occur when replacing disks. Setting read-consistency-level-override to all in the namespace context addresses this.
    • This is due to the change introduced by: [AER-5273] - (KVS) Read duplicate resolution should only be done when repeatable read is turned on.
    • But as of version 3.13 (post new cluster protocol), a node wouldn’t claim master ownership of any partition immediately upon re-joining a cluster and would have to wait to have a full copy (through migrations), which would avoid the need to duplicate resolve.
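As a scripted alternative to reading the Pending Migrates columns in asadm, the namespace statistics can be parsed directly. The sketch below assumes the statistics are a single line of semicolon-separated key=value pairs (as returned by a command like asinfo -v 'namespace/test') and that they include migrate_tx_partitions_remaining and migrate_rx_partitions_remaining. The function name migrations_done is hypothetical.

```shell
# Hypothetical helper: read "key=value;key=value;..." on stdin and succeed
# only if both migrate_*_partitions_remaining statistics are present and 0.
migrations_done() {
    tr ';' '\n' | awk -F= '
        $1 == "migrate_tx_partitions_remaining" ||
        $1 == "migrate_rx_partitions_remaining" { seen++; total += $2 }
        END { exit !(seen == 2 && total == 0) }'
}
```

It could be used as `asinfo -v 'namespace/test' | migrations_done && echo "migrations complete"`. The check fails when either statistic is missing, so an unexpected output format is not mistaken for completed migrations.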

Keywords

ADD REMOVE REPLACE STORAGE DEVICE UPGRADE HOT SWAPPABLE FLAG STRONG CONSISTENCY

Timestamp

July 2020