Why is truncating a set in Aerospike implemented as a full scan of the set?

cheney · November 23, 2017, 2:19am

This page, https://www.aerospike.com/launchpad/deleting_sets_and_data.html, describes two ways to delete a set in Aerospike. The first is a full scan of the set, with records deleted one by one. The second is an asinfo command, which has since been updated to asinfo -v "truncate:namespace=namespace_name;set=set_name;lut=time"; see https://www.aerospike.com/docs/operations/manage/sets#deleting-a-set-in-a-namespace. However, I have read the Aerospike code on GitHub, and I find that the truncate command is also implemented as a full scan. Why not truncate the entire set in one operation? When indexes have been built on the set, deleting it is slow.

pgupta · November 23, 2017, 6:00am

Aerospike uses the record as its basic unit of storage, with the namespace defining the storage medium. The set name is just metadata on the record; if you don’t specify it, the record belongs to the default null set. So each record has to be scanned to find out whether it belongs to the set you are trying to truncate.
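
To make the point concrete, here is a minimal Python sketch. It is not Aerospike's code: it assumes a namespace can be modelled as a flat list of records and that the set name is just a field on each record. The names Record and truncate_set and the sample keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    key: str
    set_name: Optional[str]  # None models the default "null set"


def truncate_set(namespace, set_name):
    """Return (surviving_records, examined_count, deleted_count)."""
    kept = []
    examined = 0
    for rec in namespace:
        # Every record is examined, because set membership is stored
        # on the record itself rather than in a separate container.
        examined += 1
        if rec.set_name != set_name:
            kept.append(rec)
    return kept, examined, examined - len(kept)


namespace = [
    Record("k1", "users"),
    Record("k2", "orders"),
    Record("k3", None),
    Record("k4", "users"),
]
kept, examined, deleted = truncate_set(namespace, "users")
print(examined, deleted, [r.key for r in kept])
```

In this model only two records belong to the set, but all four are examined. That is the cost described above.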

cheney · November 23, 2017, 6:38am

Thanks for your answer! If a namespace is specified in the truncate command but no set is specified, will all records in that namespace be scanned?

pgupta · November 23, 2017, 8:45am

I would think that since the set is metadata on a record and not a storage segregation parameter, all records in a namespace have to be scanned whether you specify a set or not. If a set is specified, each record still has to be checked to find out whether it belongs to the set of interest.

cheney · November 23, 2017, 10:14am

Is the namespace a storage segregation parameter?

rbotzer · November 23, 2017, 2:41pm

The namespace is the top level container, which determines the storage of all the records within it. For a specific namespace all the records will be stored on the same devices, or in memory, depending on your definition. See storage engine recipes for examples of this.

The github repo you’ve been reading is very old, and was a workaround from the time before truncate (release 3.12 in March 2017). I’m not sure how you determine truncate to be slow - it does not scan the data at all, it works on the primary index with multiple threads. Maybe you’re reading the wrong repo - it’s in aerospike/aerospike-server.

If you’re starting out with Aerospike in November 2017, use the latest release 3.15 - there’s no logical reason for you to use something that is over 8 months old.

cheney · November 24, 2017, 2:36am

I am sorry for my inaccurate description of the truncate command. I have read the truncate implementation in v3.15, though, and I don’t think it has changed much. I agree that truncate works on the primary index with multiple threads. But in that implementation, every record reachable through the primary index is visited and checked to decide whether it should be deleted. Is that equivalent to a scan of the set? Can Aerospike clean up all the records in a namespace at once? I am surprised by Aerospike’s write speed: we have loaded data into Aerospike at 500k TPS in real time, and it performs very well for highly concurrent reads and writes. My problem is that we want to delete some data selectively, and we find that deleting sets that have secondary indexes is slower than Aerospike’s concurrent write speed. As a result, memory and other resources stay occupied and new data can’t be imported. Is there a way to clean up data quickly in Aerospike?

rbotzer · November 24, 2017, 3:20am

From the application’s perspective the records are gone immediately once you truncate. The DRAM consumption will drop rapidly as the worker threads walk the primary index and pull out all the records in the namespace, or those of the specified set in the namespace.

Each namespace has 4096 partitions, each of which is expressed as a series of sprigs, each implemented as a red-black tree. As Piyush explained earlier, a set is a label for certain records in the namespace, so truncating a set still requires going over the entire set of sprigs for the partitions of the namespace. What did you set your partition-tree-sprigs and partition-tree-locks to? How many nodes in your cluster? Those are likely to affect your truncate speed.
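
The two ideas above (records disappear for readers at once, while worker threads walk every sprig to reclaim them) can be sketched in Python. This is a toy model under stated assumptions, not the server's implementation. Plain dicts stand in for the red-black trees. The PrimaryIndex class, the SHA-1 based placement, the value of SPRIGS_PER_PARTITION, and the rule that a record is hidden when its last-update-time is below the truncate cutoff are all illustrative assumptions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

N_PARTITIONS = 4096        # partitions per namespace, as stated in the thread
SPRIGS_PER_PARTITION = 4   # illustrative value only, not a recommended setting


class PrimaryIndex:
    def __init__(self):
        # partitions[p][s] is a dict standing in for one sprig (a tree)
        self.partitions = [
            [dict() for _ in range(SPRIGS_PER_PARTITION)]
            for _ in range(N_PARTITIONS)
        ]
        self.truncate_lut = {}  # set name -> cutoff last-update-time

    def _locate(self, key):
        # Hypothetical placement: hash the key to choose a partition and sprig
        h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")
        partition = self.partitions[h % N_PARTITIONS]
        return partition[(h // N_PARTITIONS) % SPRIGS_PER_PARTITION]

    def put(self, key, set_name, lut):
        self._locate(key)[key] = (set_name, lut)

    def get(self, key):
        entry = self._locate(key).get(key)
        if entry is None:
            return None
        set_name, lut = entry
        # A logically truncated record is hidden before it is reclaimed
        if lut < self.truncate_lut.get(set_name, -1):
            return None
        return entry

    def truncate(self, set_name, lut):
        # Only records the cutoff; readers see the effect immediately
        self.truncate_lut[set_name] = lut

    def reap(self, workers=4):
        """Walk every sprig of every partition and drop truncated entries."""
        def reap_partition(partition):
            removed = 0
            for sprig in partition:
                doomed = [k for k, (s, t) in sprig.items()
                          if t < self.truncate_lut.get(s, -1)]
                for k in doomed:
                    del sprig[k]
                removed += len(doomed)
            return removed

        # Each partition is handled by exactly one task, so no sprig is shared
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return sum(pool.map(reap_partition, self.partitions))


index = PrimaryIndex()
index.put("a", "users", lut=100)
index.put("b", "users", lut=200)
index.put("c", "orders", lut=100)

index.truncate("users", lut=150)
print(index.get("a"))  # older than the cutoff, hidden before any reaping
print(index.get("b"))  # written after the cutoff, still visible
print(index.reap())    # number of entries physically removed
```

Even in this toy version, reap has to visit all 4096 partitions and all of their sprigs to find the entries of one set, which is why the sprig and thread settings and the node count matter.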

cheney · November 24, 2017, 3:30am

Thanks for your help! I found a fast way to clean up a set: first drop all the secondary indexes built on the set, then execute the truncate command.

kporter · November 24, 2017, 3:53am

This is an excellent observation! If there are sindexes on the set, then each delete will update them. If you are wiping the set entirely, it would be beneficial to delete the sindexes on that set beforehand.
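
The ordering can be sketched as a small Python helper that only builds the command strings and does not run them. The truncate format is the one quoted earlier in this thread. The sindex-delete format is an assumption and should be checked against the info command reference for your server version. The function name, the namespace "test", the set "demo", the index names, and the lut value are all hypothetical placeholders.

```python
def truncate_with_sindex_drop(namespace, set_name, sindex_names, lut):
    """Return asinfo command strings in the order they would be run."""
    # Step 1: drop every secondary index on the set, so that deletes
    # no longer have to update them (assumed sindex-delete format).
    commands = [
        f'asinfo -v "sindex-delete:ns={namespace};set={set_name};indexname={name}"'
        for name in sindex_names
    ]
    # Step 2: truncate the set (format as quoted in the original question).
    commands.append(
        f'asinfo -v "truncate:namespace={namespace};set={set_name};lut={lut}"'
    )
    return commands


cmds = truncate_with_sindex_drop(
    "test", "demo", ["idx_age", "idx_city"], lut="1511490000000000000"
)
for cmd in cmds:
    print(cmd)
```

If the set will be reused afterwards, the indexes would have to be created again once the truncate has finished.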
