Bulk Durable Delete

We’re using the official Node.js Aerospike client and would like to make use of a bulk durable delete.

A non-persistent delete is doable via the following background UDF.

function delete(rec)
	aerospike:remove(rec)
end

Such a UDF is invoked via a background job, e.g.

const client = await Aerospike.connect()
const query = client.query(namespace, set, { nobins: true, filters: [filter] })
const job = await query.background(backgroundJob, 'delete')
await job.wait()

If the delete is instead done per record on the client, using a record stream rather than a background job, it looks like the following.

const client = await Aerospike.connect()
const query = client.query(namespace, set, { nobins: true, filters: [filter] })
const stream = query.foreach()

await new Promise((resolve, reject) => {
	stream.on('error', (error) => reject(error))
	stream.on('end', resolve)
	stream.on('data', ({ key }) => client.remove(key))
})

For a durable delete (Enterprise feature), the only implementation that works for us is the following one.

const client = await Aerospike.connect()
const query = client.query(namespace, set, { nobins: true, filters: [filter] })
const stream = query.foreach()

// Named removePolicy so that it does not shadow the client's policy module
const removePolicy = new Aerospike.policy.RemovePolicy({
	durableDelete: true
})

await new Promise((resolve, reject) => {
	stream.on('error', (error) => reject(error))
	stream.on('end', resolve)
	stream.on('data', ({ key }) => client.remove(key, removePolicy))
})
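
Note that a stream-based delete like this resolves on 'end' while some removes may still be in flight. A minimal sketch of a variant that waits for every remove to settle is shown here; `removeAllFromStream` is a hypothetical helper name, not part of the client, and it assumes `client.remove(key, policy)` returns a promise when no callback is passed. For very large sets you would probably also want to bound the number of concurrent removes, which this sketch does not do.

```javascript
// Sketch: remove every record emitted by a record stream and resolve only
// after all removes have settled. `stream` is anything that emits 'data',
// 'error' and 'end' (e.g. the result of query.foreach()).
function removeAllFromStream (client, stream, removePolicy) {
  return new Promise((resolve, reject) => {
    const pending = []
    stream.on('error', reject)
    stream.on('data', ({ key }) => {
      const removal = client.remove(key, removePolicy)
      removal.catch(reject) // surface the first failed remove right away
      pending.push(removal)
    })
    stream.on('end', () => {
      // Resolve with the number of removes once all of them have finished
      Promise.all(pending).then((results) => resolve(results.length), reject)
    })
  })
}
```

With the snippet above this would be used as `await removeAllFromStream(client, query.foreach(), removePolicy)`.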

Since this durable delete works record by record, the question is: how can we implement a bulk durable delete with a background UDF or, if that is not possible, with a stream-based solution?

You should be able to perform durable deletes with UDFs by setting durable-delete to true in the write policy. Alternatively, you can perform a background scan with the delete operation and a filter expression to target specific records, again with durable-delete set to true in the write policy. Please let us know if this doesn’t work for some reason.
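
As a rough sketch of the first suggestion with the Node.js client: the policy is passed as the fourth argument of `query.background`, here as a plain object with `durableDelete: true`. `startDurableDeleteJob` and `udfModule` are hypothetical names used for illustration, and the argument order (module, function, UDF arguments, policy) is assumed from the client source rather than from the documentation.

```javascript
// Sketch: start a background UDF job whose deletes are durable.
// `udfModule` is the name under which the Lua module containing the
// `delete(rec)` function was registered (hypothetical here).
function startDurableDeleteJob (client, namespace, set, filter, udfModule) {
  const query = client.query(namespace, set, { nobins: true, filters: [filter] })
  // Write policy as a plain object; durableDelete requests a durable delete
  const policy = { durableDelete: true }
  // Assumed argument order: udfModule, udfFunction, udfArgs, policy
  return query.background(udfModule, 'delete', [], policy)
}
```

Usage would then be `const job = await startDurableDeleteJob(client, namespace, set, filter, backgroundJob)` followed by `await job.wait()`.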

Unfortunately, the documentation for the background job differs from the source code.

The main difference is that the source code uses WritePolicy, while the documentation uses QueryPolicy. That is a serious difference in functionality.

Another issue seems to be the performance of a durable delete via background UDF vs record-stream-remove-query. For our test set (~ 1 million records) the background UDF is 48% slower than the record-stream-remove-query.

To my understanding, the background UDF runs entirely on the server without any client interaction, whereas the record-stream-remove-query has to retrieve at least the key or digest from the server and then issue the actual remove for each record.

According to the server config documentation, background-scan-max-rps should be the right flag for performance tuning the background UDF, right?

EDIT: Setting the server config flag of background-scan-max-rps to its maximum of 1000000 leads to test results where the background UDF is 59% faster than the record-stream-remove-query. So, yes, it is that flag :)
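
To see why this flag matters so much, here is a small back-of-the-envelope sketch: a records-per-second cap puts a lower bound on how long a background job over N records can take. The default of 10000 used below is an assumption about the server default, not something measured here, and `minJobSeconds` is just an illustrative helper.

```javascript
// Lower bound on job duration imposed by a records-per-second cap.
function minJobSeconds (recordCount, maxRps) {
  if (!(maxRps > 0)) throw new RangeError('maxRps must be positive')
  return recordCount / maxRps
}

const records = 1000000 // roughly the size of the test set mentioned above
console.log(minJobSeconds(records, 10000)) // assumed default cap: prints 100
console.log(minJobSeconds(records, 1000000)) // maximum cap: prints 1
```

So with a cap of 10000 the job cannot finish a million records in under about 100 seconds, no matter how fast the server is otherwise.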

Great that you were able to get it working, and thanks for pointing out the doc inaccuracy (we will get it fixed). Indeed, UDFs are usually not meant for highly performance-sensitive tasks.
