Aerospike has produced a tool, ASMT (Aerospike Shared Memory Tool), that allows the Aerospike primary index to be backed up to disk. This document describes its functionality and usage.
Aerospike developed ASMT to speed up graceful cold restarts when a restart of the underlying Linux instance is unavoidable. Each Aerospike namespace has a number of shared memory segments. When the Aerospike node shuts down, these segments remain in shared memory. If the instance itself remains running and the asd process shuts down cleanly, the Aerospike node fast restarts (also referred to as a warm restart). The advantage of a warm restart is primarily speed. The alternative is a cold start, in which the index is rebuilt by reading data back from disk. The key disadvantage of a cold start is the time it takes. Other issues can also manifest, such as cold start refragmentation, resurrection of deleted records in a non-durable delete environment or, in a worst case scenario, cold start eviction. For these reasons it is always better to warm start when possible. There are circumstances where a warm start is not possible, for example when the server itself is restarted and shared memory is lost, such as during OS/kernel patching.
ASMT can work in either backup or restore mode. In backup mode, once Aerospike has been shut down completely, ASMT writes a copy of the shared memory segments to disk. In a clean shutdown, Aerospike tags the shared memory to let subsequent startups know that the memory is safe to use. Without that tag, ASMT will not back up shared memory segments, as they may still be in use. Likewise, in restore mode, the data files previously written by ASMT are read and written back into shared memory, which allows the Aerospike node to warm start.
No, ASMT is extremely conservative. Before copying any files to the filesystem, it calculates the space required by the shared memory (base, tree and arenas) and pre-allocates that space. If there is a failure midway through the copying process, ASMT removes all of the in-progress files so that an inadvertent partial restore is not possible. ASMT also fails if it finds pre-existing files with the same names as those it plans to use. ASMT does not remove any directories it creates if stopped during operation.
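The conservative pattern described above (refuse to overwrite, reserve space first, remove partial output on failure) can be illustrated with a minimal Python sketch. This is not ASMT's implementation; the function name `conservative_write` and its arguments are hypothetical, and `os.posix_fallocate` is only available on Unix-like systems.

```python
import os


def conservative_write(path: str, payload: bytes) -> None:
    """Write payload to path without overwriting, cleaning up on failure.

    Hypothetical illustration of the pattern only, not ASMT's own code.
    """
    # Mode "xb" raises FileExistsError if the file already exists,
    # so a pre-existing file is never touched.
    with open(path, "xb") as out:
        try:
            if payload and hasattr(os, "posix_fallocate"):
                # Reserve the full size up front so a lack of space
                # is detected before any data is copied.
                os.posix_fallocate(out.fileno(), 0, len(payload))
            out.write(payload)
            out.flush()
            os.fsync(out.fileno())
        except BaseException:
            # Remove the in-progress file so that no partial copy
            # can be picked up by a later restore.
            os.unlink(path)
            raise
```

A second call with the same path raises `FileExistsError` and leaves the first file unchanged, which mirrors the behaviour described for pre-existing files.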
Yes, ASMT can do a CRC check as a final step when writing files if the -c option is used. This adds some overhead to the overall process unless the compression option is also used, in which case there is no additional overhead.
This is configurable. By default, all namespaces are backed up. The -n option is used to select specific namespaces.
Yes, ASMT’s -i option backs up from or restores to the shared memory of a specific instance number. This flag takes an integer for the instance number.
Yes, the -z option selects compression. The compression level is not configurable. We have observed up to 30% compression in some of our tests. When using compression, adding the -c option for a CRC check does not add any overhead, as the check is included as part of the compression.
Yes, the -a flag switches on analyse mode which runs like a backup or restore but does not read or write any files.
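For scripted use, the options described above (-n, -i, -z, -c and -a) can be assembled programmatically. The following is a minimal Python sketch: the helper name `asmt_options` is hypothetical, and the comma-separated form of the -n value is an assumption that should be checked against the tool's own usage output. The mode selection and backup location are not covered above, so they are left to the caller.

```python
from typing import List, Optional, Sequence


def asmt_options(
    namespaces: Optional[Sequence[str]] = None,
    instance: Optional[int] = None,
    compress: bool = False,
    crc: bool = False,
    analyse: bool = False,
) -> List[str]:
    """Build the optional part of an ASMT argument list (hypothetical helper)."""
    opts: List[str] = []
    if namespaces:
        # Assumption: -n accepts a comma-separated list of namespace names.
        opts += ["-n", ",".join(namespaces)]
    if instance is not None:
        # -i takes an integer instance number.
        opts += ["-i", str(instance)]
    if compress:
        opts.append("-z")  # compress the backup files
    if crc:
        opts.append("-c")  # CRC check as a final step
    if analyse:
        opts.append("-a")  # analyse mode: no files are read or written
    return opts
```

Returning a list rather than a single string keeps each argument separate, so the result can be passed to `subprocess.run` without shell quoting concerns.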
Yes, ASMT restores an image of the primary index as it was prior to the server restart. If records were deleted in the interim, they will return.
ASMT returns 0 for success and 1 for error.
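Because the exit status is the only signal stated here, a calling script can branch on it directly. The sketch below is a minimal Python example; `run_and_check` is a hypothetical helper, and the command is supplied by the caller, so no particular ASMT command line is assumed.

```python
import subprocess
from typing import Sequence


def run_and_check(command: Sequence[str]) -> bool:
    """Run a command and report success based on its exit status.

    Returns True for exit status 0 and False for any other status.
    """
    result = subprocess.run(command)
    return result.returncode == 0
```

A wrapper script would typically stop and alert when this returns `False` rather than continue to start the Aerospike node.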
ASMT operates in a very conservative fashion. The -v verbose option provides more information about execution and even the shared memory segments it finds, including:
- Segment type – Tree, Arena
The startup will fail. Likewise, ASMT will not start if the Aerospike node is running.
Yes, but this can be via sudo. Aerospike can run as non-root, but ownership of the shared memory segments must then be changed manually after the restore has completed.
Yes, but checks must be implemented to ensure that ASMT and the Aerospike node do not try to start while the other is running. As described above, one or the other will fail to start in this scenario. Potentially more important is to have controls in place to ensure that, on startup, the Aerospike node only restores a current, valid ASMT backup. If the backup of the previous index failed after shutdown, a scripted ASMT-assisted Aerospike warm start may restore a stale backup file, leading to inconsistent data.
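One possible control against restoring a stale backup is to compare the backup files' modification times with the time of the most recent clean shutdown. The following Python sketch assumes the automation records that shutdown time itself; the function name `backup_is_current` and both parameters are hypothetical, and the check is deliberately strict in treating an empty or missing directory as not current.

```python
import os


def backup_is_current(backup_dir: str, shutdown_time: float) -> bool:
    """Return True only if backup_dir holds files and none predates shutdown_time.

    shutdown_time is a Unix timestamp recorded by the calling automation
    (an assumption of this sketch, not something ASMT provides).
    """
    try:
        entries = [e for e in os.scandir(backup_dir) if e.is_file()]
    except FileNotFoundError:
        # No backup directory at all: nothing safe to restore.
        return False
    if not entries:
        # An empty directory means the backup never completed.
        return False
    # Any file older than the last shutdown belongs to a previous backup.
    return all(e.stat().st_mtime >= shutdown_time for e in entries)
```

If this returns `False`, the safer path is to skip the restore and let the node cold start rather than load an index that may not match the data on disk.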
Not really. The configuration and devices would have to be an exact match for the system to work correctly. The purpose of ASMT is not to clone or move servers; it is to speed up cold starts when an instance or host restart is unavoidable.
Yes, as long as the storage attached to the cloud instance persists across a restart. With AWS, for example, if EBS is the primary storage then ASMT can be used as normal. If ephemeral drives are used for primary storage, ASMT will work in the event of an instance restart (where the same ephemeral drives are present when the server comes back up) but not if the instance has been stopped and started again (where the instance and ephemeral drives may be different).
- Documentation on when an Aerospike node will warm restart.
- Knowledge base article on speeding up cold start.
ASMT COLD START WARM START SPEED UP SHARED MEMORY