How does Aerospike reclaim space?


#1

I have used databases in the past that reclaim space using a process called “compaction.” This process takes tremendous resources and sometimes results in instability. How does Aerospike handle reclamation of space?


#2

Aerospike was designed from the beginning as an enterprise-class database that would stay up 24 x 7. Rather than writing large files and compacting them all at once, Aerospike runs a process that reclaims space continuously throughout the day. This leads to much greater predictability and reliability. Here is how this works on an SSD:

Storage is split into two areas: index (stored in RAM) and data (stored on SSD).

When data is written to a node, an index entry is made in RAM and the data is streamed to the SSD in blocks. These writes are intended to take full advantage of how SSDs write.

At some point, a record may be deleted or updated. When that happens, its index entry is either deleted or updated to point to a different location on the SSD, and the old copy of the data is left behind as space that can be reclaimed. This may seem similar to compaction, but it is not.

The major difference is that because Aerospike does not use a filesystem, we do not use “compaction.” Rather, there is a separate process that goes through the SSD and reclaims the space in what we refer to as “defragmentation.” This process runs constantly throughout the day (every 120 seconds by default). This means that instead of a single big event to reclaim space, you get many very small ones.
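To make the write, update, and defragmentation steps above more concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not Aerospike's actual implementation: the `ToyStore` class, the `BLOCK_SIZE` of 4 records, and the 0.5 threshold are all hypothetical and chosen just for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

BLOCK_SIZE = 4  # records per block; hypothetical, kept tiny for illustration


@dataclass
class ToyStore:
    """Toy model: an in-RAM index pointing into fixed-size blocks on 'SSD'."""
    index: dict = field(default_factory=dict)    # key -> (block_id, slot)
    blocks: dict = field(default_factory=dict)   # block_id -> [(key, value), ...]
    next_block_id: int = 0
    current: Optional[int] = None                # block currently being filled

    def _open_block(self):
        self.current = self.next_block_id
        self.blocks[self.current] = []
        self.next_block_id += 1

    def put(self, key, value):
        # Writes are always appended. An update only repoints the index entry,
        # leaving the old copy behind as dead space.
        if self.current is None or len(self.blocks[self.current]) >= BLOCK_SIZE:
            self._open_block()
        block = self.blocks[self.current]
        block.append((key, value))
        self.index[key] = (self.current, len(block) - 1)

    def get(self, key):
        block_id, slot = self.index[key]
        return self.blocks[block_id][slot][1]

    def delete(self, key):
        # Only the index entry is removed; the data stays until defragmentation.
        self.index.pop(key, None)

    def live_records(self, block_id):
        # A record is live only if the index still points at this exact slot.
        return [(k, v) for slot, (k, v) in enumerate(self.blocks[block_id])
                if self.index.get(k) == (block_id, slot)]

    def defrag_pass(self, threshold=0.5):
        """Reclaim full blocks whose live fraction is below `threshold`."""
        reclaimed = 0
        for block_id in list(self.blocks):
            if block_id == self.current:
                continue  # never touch the block still being written
            live = self.live_records(block_id)
            if len(live) / BLOCK_SIZE < threshold:
                del self.blocks[block_id]   # the whole block becomes free space
                for k, v in live:
                    self.put(k, v)          # survivors are rewritten elsewhere
                reclaimed += 1
        return reclaimed
```

Calling `defrag_pass()` on a timer gives the "many very small events" behaviour: each pass only moves the few surviving records out of mostly-empty blocks, so no single pass has to rewrite the whole data set.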

In addition to the above, there is another process that goes through the index and expires data that has aged beyond its time-to-live (configurable). You can alter the time-to-live value in the configuration file or override it in your client code. This process effectively deletes the index entry so that the space on SSD can be reclaimed by the defragmentation process.
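The expiration step can be sketched the same way. This is a simplified illustration under stated assumptions, not Aerospike's code: `DEFAULT_TTL`, `put`, and `expire_pass` are hypothetical names, the index is a plain dictionary, and time is passed in as a number of seconds so the example stays deterministic.

```python
DEFAULT_TTL = 3600  # hypothetical default time-to-live, in seconds


def put(index, key, location, now, ttl=None):
    """Record where the data lives and when it expires.

    A per-write `ttl` overrides the default, mirroring the idea of
    overriding the configured value from client code.
    """
    effective_ttl = DEFAULT_TTL if ttl is None else ttl
    index[key] = {"location": location, "expires_at": now + effective_ttl}


def expire_pass(index, now):
    """Drop index entries that have aged past their time-to-live.

    Only the index entry is removed here; the data itself would be
    reclaimed later by the defragmentation process.
    """
    expired = [key for key, entry in index.items() if entry["expires_at"] <= now]
    for key in expired:
        del index[key]
    return expired
```

For example, a record written with `ttl=60` is removed by the first `expire_pass` that runs more than 60 seconds later, while a record written without a `ttl` stays until the default has elapsed.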