How is data deleted?

deletion
durable-deletion

#1

Aerospike separates a record into two parts: index and value. The index is always stored in DRAM; the value can be stored on either SSD or DRAM (with or without disk for persistence). When a record is deleted, the reference to it is removed from the index. The actual data is not removed from the disk. Another process will find that the data on disk is not being used and reclaim the space.

Note that it is possible for a deleted object to reappear. For this to happen, the following must occur:

  • The node must be configured to load data from disk. This means that in the file “/etc/aerospike/aerospike.conf” the variable “cold-start-empty” is set to false for the namespace.
  • Data has been deleted (i.e. the index entry has been removed) but not yet removed from disk.
  • The node has failed: the process “asd” has stopped, either because of a machine failure or because the process was killed.

In this case when the node starts, it will read the data from disk and rebuild the index. Because the data has not been removed from disk, the node will think it is still active and build a new index entry for it. So the deleted object will return. If you know you will be taking down a node, you can prevent deleted data from returning by using the fast restart feature. This will hold the index in memory even when the database process has gone down.
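The sequence above can be sketched with a toy model. This is only an illustration of the mechanism described in this post, not Aerospike's actual implementation: the class `ToyNode` and its methods (`put`, `delete`, `reclaim`, `cold_restart`) are hypothetical names invented for the example, and the "disk" is just a Python list.

```python
class ToyNode:
    """Toy model (hypothetical): an in-memory index plus an append-only 'disk'."""

    def __init__(self):
        self.disk = []    # persisted (key, value) blocks; survives a restart
        self.index = {}   # key -> position in self.disk; lost on a cold restart

    def put(self, key, value):
        self.disk.append((key, value))
        self.index[key] = len(self.disk) - 1

    def get(self, key):
        pos = self.index.get(key)
        return None if pos is None else self.disk[pos][1]

    def delete(self, key):
        # Only the index reference is dropped; the block stays on disk.
        self.index.pop(key, None)

    def reclaim(self):
        # Background space reclamation: keep only blocks the index still references.
        live = sorted(self.index.items(), key=lambda kv: kv[1])
        self.disk = [self.disk[pos] for _, pos in live]
        self.index = {key: i for i, (key, _) in enumerate(live)}

    def cold_restart(self):
        # The index is rebuilt purely from whatever is still on disk.
        self.index = {}
        for pos, (key, _) in enumerate(self.disk):
            self.index[key] = pos  # a later block for the same key wins


node = ToyNode()
node.put("user:1", "alice")
node.delete("user:1")
before = node.get("user:1")   # deleted: not visible through the index
node.cold_restart()           # crash before reclaim, then reload from disk
after = node.get("user:1")    # the record is back
print(before, after)          # prints: None alice
```

If `reclaim()` runs between the delete and the restart, the block is gone from the toy disk and the record does not come back, which matches the condition that the data must still be on disk for the object to reappear.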


#2

I’m using Aerospike 3.3.8 in-memory with disk persistence, and I ran into the “deleted object reappears” issue:

  • Delete a record from a table.
  • Wait 10 seconds, then restart the Aerospike server by issuing “service aerospike restart”.
  • After the Aerospike server has finished starting, query that table: the deleted record has reappeared!

Is this because Aerospike uses a large buffer to flush/snapshot the in-memory data to disk for persistence, and the flush had not yet occurred during those 10 seconds?

It seems Aerospike does not track data changes in log files, which would allow changes made since the last snapshot/checkpoint to be replayed when recovering from a crash. Other DB systems use log files plus snapshots to secure data, for example:

  • TimesTen: Todo Log + Checkpoint
  • Cassandra: Commit Log + SSTable

#3
When a record is deleted, the reference to it is removed from the index. The actual data is not removed from the disk. Another process will find that the data on disk is not being used and reclaim the space.

How long after a record is deleted does the reclaim take place? I deleted a record and waited from minutes to hours, but the timestamp of the persistence data file on disk has still not changed. Are there any configuration parameters that can be tuned so that Aerospike handles a small number of changed records and reclaims the space sooner?


#4

Aerospike 3.10.0 introduces durable deletes for Aerospike Enterprise. Learn more about how they work here: www.aerospike.com/docs/guide/durable_deletes.html.


#5

Does Aerospike 4.1.0.1 (Community version) have ‘durable deletes’? Or is it still only in the Enterprise edition?


#6

It remains an Enterprise feature.


#7

So there can be scenarios (in the community version) where previously deleted data can come back?


#8

As before.