Aerospike provides durability using multiple techniques:
- Persistence by storing data in flash/SSD on every node and performing direct reads from flash
- Replication within a cluster by storing copies of the data on multiple nodes
- Replication across geographically separated clusters
Durability is achieved by SSD based storage and replication. The replication to replica (or prole) nodes is done synchronously. I.e., the client application will be informed only after the record is successfully written on the replica nodes. So, even if one node failed for any reason, we will have one more copy in another node thereby ensuring durability of the committed records. This is similar to the data + log file approach of traditional RDBMS where, If a data file is lost, the data can be recovered by replaying the log file. However, in case of Aerospike when one copy of the data is lost on a node, the data need not be recovered. Instead, the latest copy of the lost data is instantly available in one or more replica nodes in the same cluster as well as in nodes residing in remote clusters.
On top of this, Aerospike’s cross data center (XDR) support can be used to asynchronously replicate data to a geographically separated cluster providing an additional layer of durability. This will ensure that all of the data in the Aerospike database survives on a remote cluster even if an entire cluster fails and data is unrecoverable.
Resilience to simultaneous hardware failures
In the presence of node failures, applications written using Aerospike clients are able to seamlessly retrieve one of the copies of the data from the cluster with no special effort. This is because, in an Aerospike cluster, the virtual partitioning and distribution of data within the cluster is completely invisible to the application. Therefore, when application libraries make calls using the simple Client API to the Aerospike cluster, any node can take requests for any piece of data.
If a cluster node receives a request for a piece of data it does not have locally, it satisfies the request by generating a proxy request fetch the data from the real owner using the internal cluster interconnect and subsequently replying to the client directly. The Aerospike client-server protocol also implements caching of latest known locations of client requested data in the client library itself thus minimizing the number of network hops required to respond to a client request.
During the period immediately after a cluster node has been added or removed, the Aerospike cluster automatically transfers data between the nodes to rebalance and achieve data availability. During this time, Aerospike’s internal "proxy" transaction tracking allows high-consistency to be achieved by applying reads and writes to the cluster nodes which have the data, even if the data is in motion.
Offsite data storage and cross data center portability
Aerospike provides online backup and restore, which, as the name indicates, can be applied while the cluster is in operation. Even though data replication will solve most real-world data center availability issues, an essential tool of any database administrator is the ability to run backup and restore. An Aerospike cluster has the ability to iterate all data within a namespace (similar to a map/reduce). The backup and restore tools are typically run on a maintenance machine with a large amount of inexpensive, standard rotational disk.
Aerospike backup and restore tools are made available with full source. The file format in use is optimized for high speed but uses an ASCII format, allowing an operator to validate the data inserted into the cluster, and use standard scripts to move data from one data store to another. The backup tool splits the backup into multiple files. This allows restores to occur in parallel from multiple machines in the case of needing a very rapid response to a catastrophic failure event.