Why is truncating a set in Aerospike implemented as a full scan?

deletion
index

#1

At this link https://www.aerospike.com/launchpad/deleting_sets_and_data.html, Aerospike describes two methods for deleting a set. The first is to scan the whole set and delete the records one by one. The second is an asinfo command, which has since been updated to 'asinfo -v "truncate:namespace=namespace_name;set=set_name;lut=time"' (see https://www.aerospike.com/docs/operations/manage/sets#deleting-a-set-in-a-namespace). However, I have read the Aerospike code on GitHub, and I found that the truncate-set command is also implemented as a full scan. Why not truncate an entire set at once? When secondary indexes are built on the set, deleting it is slow.
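For reference, the info command string quoted above can be assembled programmatically. This is a minimal Python sketch; the function name `build_truncate_command` is hypothetical, and treating `set` and `lut` as optional fields is an assumption (later posts in this thread discuss truncating a whole namespace without a set). The time format expected for `lut` is whatever the linked docs specify; the sketch just passes the value through.

```python
from typing import Optional


def build_truncate_command(namespace: str, set_name: Optional[str] = None,
                           lut: Optional[int] = None) -> str:
    """Build the text of an asinfo 'truncate' command.

    Form: truncate:namespace=<ns>;set=<set>;lut=<time>
    Omitting set_name or lut simply leaves that field out (assumed optional).
    """
    parts = [f"namespace={namespace}"]
    if set_name is not None:
        parts.append(f"set={set_name}")
    if lut is not None:
        parts.append(f"lut={lut}")
    return "truncate:" + ";".join(parts)


if __name__ == "__main__":
    # The resulting string is what you would pass to: asinfo -v "<cmd>"
    print(build_truncate_command("test", "demo"))  # truncate:namespace=test;set=demo
```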


#2

Aerospike uses records as the basic unit of storage, with the namespace defining the storage medium. The set name is just metadata on the record; if you don't specify one, the record belongs to the null set by default. So each record has to be examined to find out whether it belongs to the set you are trying to truncate.
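To illustrate why a set label forces a full pass, here is a toy Python model (not Aerospike's data structures): a namespace is a flat list of records, and the set is only a field on each record. The names `Record` and `truncate_by_scan` are hypothetical. Passing `target_set=None` models truncating the whole namespace; note that in this toy model `None` on a record stands for the null set, so the null set cannot be targeted on its own.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Record:
    key: str
    set_name: Optional[str]  # None models the default "null set"
    bins: dict = field(default_factory=dict)


def truncate_by_scan(namespace: List[Record],
                     target_set: Optional[str]) -> Tuple[List[Record], int]:
    """Drop records of target_set (or every record if target_set is None).

    Because the set is only a label on each record, every record in the
    namespace is examined either way. Returns (survivors, records_examined).
    """
    survivors: List[Record] = []
    examined = 0
    for rec in namespace:
        examined += 1
        if target_set is None or rec.set_name == target_set:
            continue  # this record is truncated
        survivors.append(rec)
    return survivors, examined
```

In both cases `records_examined` equals the size of the namespace, which is the point made in the replies: specifying a set narrows what is deleted, not what is visited.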


#3

Thanks for your answer! :grinning: If a namespace is specified in the truncate command but no set is, will all the records in this namespace be scanned?


#4

I would think that since the set is metadata on a record and not a storage segregation parameter, all the records in a namespace have to be scanned whether or not you specify a set. If a set is specified, every record still has to be checked to find out whether it belongs to the set of interest.


#5

Is the namespace a storage segregation parameter? :grinning:


#6

The namespace is the top-level container, and it determines the storage of all the records within it. For a given namespace, all the records are stored on the same devices, or in memory, depending on your definition. See the storage engine recipes for examples.

The GitHub repo you've been reading is very old; it was a workaround from the time before truncate existed (release 3.12, March 2017). I'm not sure how you determined that truncate is slow: it does not scan the data at all, it works on the primary index with multiple threads. Maybe you're reading the wrong repo; the server code is in aerospike/aerospike-server.
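The idea that truncate works from primary-index metadata rather than record data can be sketched as follows. This is a simplified Python model under stated assumptions: each partition is a dict of hypothetical `IndexEntry` objects that carry the set name and a last-update time, and an entry is removed when its last-update time is strictly older than the `lut` cutoff (the exact boundary is an assumption). A thread pool walks the partitions in parallel; no record bins are ever read.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IndexEntry:
    set_name: Optional[str]
    last_update_time: int  # metadata held in the index, not in record data


# Hypothetical model: one dict per partition, digest -> IndexEntry.
Partition = Dict[str, IndexEntry]


def truncate_partition(part: Partition, set_name: Optional[str], lut: int) -> int:
    """Remove entries of set_name (any set if None) last updated before lut."""
    doomed = [digest for digest, e in part.items()
              if (set_name is None or e.set_name == set_name)
              and e.last_update_time < lut]
    for digest in doomed:
        del part[digest]
    return len(doomed)


def truncate(partitions: List[Partition], set_name: Optional[str], lut: int,
             threads: int = 4) -> int:
    """Walk every partition in parallel using only index metadata."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        counts = pool.map(lambda p: truncate_partition(p, set_name, lut), partitions)
        return sum(counts)
```

Each worker only mutates the partition it was handed, so the threads never contend on the same dict.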

If you're starting out with Aerospike in November 2017, use the latest release, 3.15; there's no reason to use a release that is over 8 months old.


#7

I am sorry for my inaccurate description of the truncate command. I have read truncate's implementation in v3.15, and I don't think it has changed much. I agree that truncate works on the primary index with multiple threads. But in that implementation, every record reachable through the primary index is visited and checked for deletion. Isn't that equivalent to scanning the set? Can Aerospike clean up all the records in a namespace at once?

I am impressed by Aerospike's write speed: we load data into Aerospike at 500k TPS in real time, and it handles highly concurrent reads and writes very well. The problem I face is that we want to delete some data selectively, and we find that deleting sets that have secondary indexes is slower than our concurrent write rate. As a result, memory and other resources stay occupied and new data can't be imported. Is there any way to clean up data quickly in Aerospike?


#8

From the application’s perspective the records are gone immediately once you truncate. The DRAM consumption will drop rapidly as the worker threads walk the primary index and pull out all the records in the namespace, or those of the specified set in the namespace.

Each namespace has 4096 partitions, each of which is expressed as a series of sprigs, each implemented as a red-black tree. As Piyush explained earlier, a set is a label for certain records in the namespace, so truncating a set still requires going over the entire set of sprigs for the partitions of the namespace. What did you set your partition-tree-sprigs and partition-tree-locks to? How many nodes in your cluster? Those are likely to affect your truncate speed.
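The partition/sprig layout described here can be modeled to show why truncating even an empty set touches every sprig. This is a toy Python sketch: the 4096 partitions come from the post, `sprigs_per_partition` plays the role of `partition-tree-sprigs`, and a plain dict stands in for each red-black tree. The `locate` function is hypothetical; Aerospike derives placement from the record digest, and the SHA-1 bit split here is only an illustrative stand-in.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

N_PARTITIONS = 4096  # partitions per namespace, as stated in the post


def locate(key: str, sprigs_per_partition: int) -> Tuple[int, int]:
    """Hypothetical placement: split a hash into partition and sprig indices."""
    h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "little")
    partition = h % N_PARTITIONS
    sprig = (h // N_PARTITIONS) % sprigs_per_partition
    return partition, sprig


class Namespace:
    def __init__(self, sprigs_per_partition: int):
        self.sprigs = sprigs_per_partition
        # tree[partition][sprig] maps key -> set name (dict stands in for an RB tree)
        self.tree: List[List[Dict[str, Optional[str]]]] = [
            [{} for _ in range(sprigs_per_partition)] for _ in range(N_PARTITIONS)
        ]

    def put(self, key: str, set_name: Optional[str]) -> None:
        p, s = locate(key, self.sprigs)
        self.tree[p][s][key] = set_name

    def truncate_set(self, set_name: str) -> Tuple[int, int]:
        """Return (sprigs_visited, records_removed)."""
        visited = removed = 0
        for partition in self.tree:
            for sprig in partition:
                visited += 1  # every sprig is visited, whatever set it holds
                doomed = [k for k, s in sprig.items() if s == set_name]
                for k in doomed:
                    del sprig[k]
                removed += len(doomed)
        return visited, removed
```

With `sprigs_per_partition=4`, every call to `truncate_set` visits 4096 × 4 = 16,384 sprigs, even when the target set has no records at all.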


#9

Thanks for your help! I found a fast way to clean up a set: first, drop all the secondary indexes built on the set; then, execute the truncate command. :grinning:
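The two-step procedure can be sketched in Python. This assumes the Aerospike Python client's `index_remove(namespace, index_name)` and `truncate(namespace, set, nanos)` methods, where `nanos=0` means no last-update-time cutoff; check the docs for your client version. The function name `fast_truncate` and the index names are hypothetical, and the client is passed in, so the sketch runs against any object with those two methods.

```python
from typing import Iterable, Protocol


class TruncateClient(Protocol):
    """The two client methods this sketch relies on (assumed signatures)."""
    def index_remove(self, namespace: str, index_name: str) -> None: ...
    def truncate(self, namespace: str, set_name: str, nanos: int) -> None: ...


def fast_truncate(client: TruncateClient, namespace: str, set_name: str,
                  index_names: Iterable[str]) -> None:
    # Step 1: drop every secondary index on the set, so the deletes done by
    # truncate no longer have to update those indexes.
    for name in index_names:
        client.index_remove(namespace, name)
    # Step 2: truncate the set; nanos=0 means no last-update-time cutoff.
    client.truncate(namespace, set_name, 0)
```

Usage might look like `fast_truncate(client, "test", "demo", ["idx_demo_age"])` with a connected client. If the indexes are still needed, they have to be recreated after the truncate; that step is not shown.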


#10

This is an excellent observation! If there are sindexes on the set, then each delete also updates those sindexes. If you are wiping the set entirely, it is beneficial to drop the sindexes on that set beforehand.
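A back-of-the-envelope model makes the effect concrete. This is a toy Python cost model, not a measurement: it assumes one primary-index removal per deleted record, plus at most one removal from each secondary index on the set. The record and index counts below are made-up inputs.

```python
def max_delete_updates(records: int, sindexes: int) -> int:
    """Upper bound on index updates when deleting `records` records from a set
    with `sindexes` secondary indexes: one primary-index removal per record
    plus, at most, one removal from each secondary index (toy model)."""
    return records * (1 + sindexes)


# Dropping the secondary indexes first removes the second term entirely.
with_sindexes = max_delete_updates(1_000_000, 3)     # 4,000,000 updates
without_sindexes = max_delete_updates(1_000_000, 0)  # 1,000,000 updates
```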