Guidelines for Deleting Data


Synopsis:

What are the guidelines for deleting data?

Solution:

Please note that the set-delete command has been deprecated. The truncate command should be used instead.

  • truncate was introduced in Aerospike Server version 3.12.0, released in March 2017.
  • set-delete is deprecated and works only up to Aerospike Server version 3.12.1, released in April 2017.

The Aerospike Server version in use determines which of the following asinfo commands to execute, and whether to follow the truncate instructions or the set-delete instructions listed below.

Aerospike Server version 3.12.1 or above

Truncate

Test Truncations Gradually

It is recommended to test truncations gradually, one set at a time, while monitoring the potential impact on overall system performance. Truncation is much more efficient than its predecessor, set-delete (for example, it does not generate deletes that propagate to the replicas over fabric). Even so, the vacuum created by the sudden deletion of records could, for example, cause a surge in defragmentation activity that impacts the performance of the storage subsystem.

Usage

Truncate a set

asinfo -v "truncate:namespace=namespace_name;set=set_name"

Specify the name of the set to be truncated. If no set is specified, the whole namespace is truncated, including records that do not belong to any set.

Truncate a set up to a specific time

asinfo -v "truncate:namespace=namespace_name;set=set_name;lut="

The last update time (lut) is expressed in nanoseconds (8 bytes) since the UNIX epoch (i.e., nanoseconds since 00:00:00 UTC on 1 Jan 1970). It can be given in hex (with a 0x prefix), decimal, or octal (with a 0 prefix). The lut must not be older than the Citrusleaf epoch (00:00:00 UTC on 1 Jan 2010) and must not be beyond the current time (i.e., in the future). If it is not specified, the current time is used.
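The lut rules above can be sketched in Python. This is only an illustration: the function name lut_nanos and its arguments are hypothetical and not part of any Aerospike tool. It converts a timezone-aware datetime to integer nanoseconds since the UNIX epoch and rejects values outside the allowed range.

```python
from datetime import datetime, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Citrusleaf epoch as stated in the text: 00:00:00 UTC on 1 Jan 2010.
CITRUSLEAF_EPOCH = datetime(2010, 1, 1, tzinfo=timezone.utc)


def lut_nanos(cutoff, now=None):
    """Return `cutoff` as integer nanoseconds since the UNIX epoch.

    Hypothetical helper. Raises ValueError if `cutoff` is older than the
    Citrusleaf epoch or later than `now` (defaults to the current time).
    """
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    if now is None:
        now = datetime.now(timezone.utc)
    if cutoff < CITRUSLEAF_EPOCH:
        raise ValueError("lut is older than the Citrusleaf epoch")
    if cutoff > now:
        raise ValueError("lut is in the future")
    delta = cutoff - UNIX_EPOCH
    # Integer arithmetic only, to avoid float rounding at nanosecond scale.
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1000


if __name__ == "__main__":
    cutoff = datetime(2017, 4, 13, 0, 41, 24, 581000, tzinfo=timezone.utc)
    nanos = lut_nanos(cutoff)
    # Decimal and hex are both accepted forms for the lut value.
    print(nanos, hex(nanos))
```

The resulting integer (or its hex form with a 0x prefix) is the kind of value that would follow lut= in the command above.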

Truncate a namespace

asinfo -v "truncate:namespace=namespace_name"
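As a rough sketch, the three usage forms above can be assembled by a small Python helper. The name truncate_command and the whole_namespace guard are hypothetical; the guard exists only because omitting the set truncates the entire namespace, which is easy to do by accident.

```python
def truncate_command(namespace, set_name=None, lut=None, whole_namespace=False):
    """Build the info-command string that would be passed to `asinfo -v`.

    Hypothetical helper: `whole_namespace` must be set explicitly when no
    set name is given, since that form truncates the entire namespace.
    """
    if not namespace:
        raise ValueError("namespace is required")
    if set_name is None and not whole_namespace:
        raise ValueError("no set given; pass whole_namespace=True to truncate the namespace")
    parts = ["namespace=" + namespace]
    if set_name is not None:
        parts.append("set=" + set_name)
    if lut is not None:
        # lut is nanoseconds since the UNIX epoch (decimal, 0x hex, or 0 octal).
        parts.append("lut=" + str(lut))
    return "truncate:" + ";".join(parts)


if __name__ == "__main__":
    print(truncate_command("test", "testset"))
    print(truncate_command("test", whole_namespace=True))
```

The returned string is what goes inside the double quotes after asinfo -v.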

Log Analysis

The following log lines provide more information on the status of a truncate.

The log line formats below are from version 3.12.1:

Apr 13 2017 00:41:24 GMT: INFO (truncate): (truncate.c:206) {test|testset} got command to truncate to now (229740084581)

Truncate command received.

NOTE: Will only appear on the node to which the info command was issued. The command is distributed to other nodes via system metadata (SMD), and only the truncating/starting/restarting/truncated/done log entries will appear on those nodes.

The timestamp printed in the logs is the truncation time but represented in milliseconds since the Citrusleaf epoch (00:00:00 UTC on 1 Jan 2010).
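To read such a log timestamp as a wall-clock time, add the millisecond count to the Citrusleaf epoch. A minimal Python sketch follows; the function name log_truncate_time is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Citrusleaf epoch as stated in the text: 00:00:00 UTC on 1 Jan 2010.
CITRUSLEAF_EPOCH = datetime(2010, 1, 1, tzinfo=timezone.utc)


def log_truncate_time(ms_since_citrusleaf_epoch):
    """Convert the millisecond value printed in truncate log lines to UTC."""
    return CITRUSLEAF_EPOCH + timedelta(milliseconds=ms_since_citrusleaf_epoch)


if __name__ == "__main__":
    # Value taken from the example log line above.
    print(log_truncate_time(229740084581).isoformat())
    # 2017-04-13T00:41:24.581000+00:00
```

The result agrees with the wall-clock prefix of the example log line (Apr 13 2017 00:41:24 GMT).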

Apr 13 2017 00:41:24 GMT: INFO (truncate): (truncate.c:440) {test|testset} truncating to 229740084581

Truncate command received. This line appears on every node after a truncate command is issued.

Apr 13 2017 00:41:24 GMT: INFO (truncate): (truncate.c:462) {test} starting truncate

Truncate command being processed for the namespace.

Apr 13 2017 00:41:27 GMT: INFO (truncate): (truncate.c:569) {test} truncated records (10,50)

Truncate command being processed for the namespace. The numbers in parentheses represent (current,total).

current is the number of records that have been deleted by truncation since the command was issued (10 in this example).

total is the number of records that have been deleted by truncation since the server started (50 in this example).

Counts are only kept at the namespace level.
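For monitoring, the (current,total) counts can be pulled out of the log with a regular expression. The sketch below assumes the 3.12.1 line format shown above; the function name parse_truncated is hypothetical.

```python
import re

# Matches e.g. "{test} truncated records (10,50)" in the 3.12.1 log format.
TRUNCATED_RE = re.compile(
    r"\{(?P<ns>[^}|]+)\} truncated records \((?P<current>\d+),(?P<total>\d+)\)"
)


def parse_truncated(line):
    """Return (namespace, current, total), or None if the line does not match."""
    match = TRUNCATED_RE.search(line)
    if match is None:
        return None
    return match["ns"], int(match["current"]), int(match["total"])


if __name__ == "__main__":
    sample = ("Apr 13 2017 00:41:27 GMT: INFO (truncate): (truncate.c:569) "
              "{test} truncated records (10,50)")
    print(parse_truncated(sample))
    # ('test', 10, 50)
```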

Apr 13 2017 00:41:27 GMT: INFO (truncate): (truncate.c:573) {test} done truncate

Truncate command completed.

Warning

  • If using the client APIs to perform the truncate command in a single-threaded application, add a sleep of at least 1 millisecond (ms) between the last write and the truncate. The truncate operation has a 1 ms resolution, and writes occurring within the same millisecond as the truncate are not deleted.
  • If using the client APIs to perform the truncate command in a multi-threaded application, be aware that the truncate is accurate only to a precision of 1 ms. Any writes occurring within that same millisecond must be accounted for, as those records will persist.
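The single-threaded case can be sketched as follows. Both callables are hypothetical stand-ins for whatever client calls the application uses for its final write and for the truncate; no specific client API is assumed. The pause is set slightly above 1 ms so that the write and the truncate cannot fall within the same millisecond.

```python
import time


def write_then_truncate(write_fn, truncate_fn, pause_seconds=0.002):
    """Run the final write, pause past the 1 ms resolution, then truncate.

    `write_fn` and `truncate_fn` are hypothetical zero-argument callables
    wrapping the application's own client calls.
    """
    write_fn()
    # The truncate has 1 ms resolution: a write in the same millisecond as
    # the truncate would survive, so wait before issuing it.
    time.sleep(pause_seconds)
    return truncate_fn()


if __name__ == "__main__":
    write_then_truncate(
        lambda: print("last write issued"),
        lambda: print("truncate issued"),
    )
```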

Highlighted Guidelines

  • If security is enabled on the cluster (enable-security true), the user executing the truncate command must have been granted the ‘sys-admin’ or ‘data-admin’ role.
  • New writes can be performed into the sets being truncated, as the new writes will occur after the last update time (lut).
  • The truncate command rapidly removes entries from the in-DRAM index. There is no dependency on the namespace supervisor (nsup) cycle thread, so deletion begins immediately upon initiation of the truncate command.
  • Truncate can also be performed using the client APIs (example: Java Documentation).
  • Enterprise Edition only: To cover the case of a cold start, an entry is added to the SMD (System Meta Data) subsystem so that a full restart does not cause the data to return. This entry can be considered the “tombstone” for the set.
  • Truncation essentially takes effect immediately across the cluster, and will apply to any migrations that might be in progress.
  • Truncation will apply to any new nodes joining or old nodes rejoining the cluster after the truncate was executed, as the truncation time is synced over the SMD protocol.
  • If the set metadata is required to be deleted as well, any potential roles with privileges on the set in that namespace would need to be dropped prior to cold starting each node in the cluster in a rolling fashion (Enterprise Edition only).

Aerospike Server version 3.12.0 or below

set-delete

Test Deletions Gradually

If you want to delete multiple sets of data, do not start by deleting all sets at once. Instead, delete one set initially and verify the overall system impact. Then gradually increase the number of sets being deleted, monitoring the system during each iteration.

Monitor System Impact

There will be an overall system impact when deleting your data. Deletions will propagate over fabric and could impact network efficiency. Deletes can also impact the defragmentation rate, amplifying writes and potentially increasing transaction latencies.

When planning a deletion, consider tuning, as you will have to choose between:

A. Records being deleted at a quick pace, with a greater impact on latency.

B. Records being deleted at a slower pace, with a lesser impact on latency.

Tune Configuration Parameters

Some configuration parameters to consider tuning would be:

Log Analysis

The following two log lines provide more information on the status of a set-delete.

The log line formats below are from version 3.9.

{ns-name} Records: 37118670, 0 0-vt, 0(377102877) expired, 185677(145304222) evicted, 0(0) set deletes, 0(0) set evicted. Evict ttls: 34560,38880,0.118. Waits: 0,0,8743.

set deletes represents the number of records deleted by a set-delete command.

in-progress: tsvc-q 0 info-q 5 nsup-delete-q 10 rw-hash 0 proxy-hash 0 tree-gc-q 0

nsup-delete-q represents the number of records queued up for deletion by the nsup thread.
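To watch nsup-delete-q drain over time, the in-progress line can be split into name/value pairs. The Python sketch below assumes the 3.9 format shown above; the function name in_progress_counters is hypothetical.

```python
import re


def in_progress_counters(line):
    """Return the counters that follow 'in-progress:' as a dict of ints.

    Returns an empty dict if the line is not an in-progress line.
    """
    _, marker, tail = line.partition("in-progress:")
    if not marker:
        return {}
    # Each counter is a hyphenated name followed by a non-negative integer.
    return {name: int(value) for name, value in re.findall(r"([\w-]+) (\d+)", tail)}


if __name__ == "__main__":
    sample = ("in-progress: tsvc-q 0 info-q 5 nsup-delete-q 10 "
              "rw-hash 0 proxy-hash 0 tree-gc-q 0")
    print(in_progress_counters(sample)["nsup-delete-q"])
    # 10
```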

Highlighted Guidelines

The most important guidelines are:

  • If security is enabled (enable-security true), the user executing the set-delete command must have been granted the ‘sys-admin’ or ‘data-admin’ role.
  • Any roles with privileges on the set in that namespace would need to be dropped prior to truncating the set, as the metadata still exists and could return on a subsequent cold start.
  • On a cold start, the index is rebuilt from persistent storage. As a result, deleted data that defragmentation has not yet processed, or has processed but whose storage has not yet been overwritten, will reappear.
  • In order to delete all the objects in the set in a cluster, set-delete needs to be dynamically set to true on all nodes.
    • For example:
      asadm -e "asinfo -v 'set-config:context=namespace;id=test;set=testset;set-delete=true;'"
  • In order to delete the set data cleanly, ensure that set-delete shows as false, on ALL nodes, after executing the deletion command.
  • Do not perform writes into the sets being deleted.
  • Do not perform set-delete during migrations as this can result in records migrated into a partition after it has been processed for the set-delete, but before the whole set is deleted, resulting in the node being stuck in the ‘set-delete’ true state.

Reference

To read more about Truncating a Set in a Namespace, see Managing Sets in a Namespace

To read more about the truncate command, see Info Command Reference

To read more about Deleting a Set (Deprecated) in a Namespace, see Managing Sets in a Namespace

To read more about Set Delete Flow, see set-delete flow

To read more about Configuration Parameters, see Configuration Reference

To read more about details on each of the elements in the log lines, see Log Reference

To learn more about monitoring latencies, see Log Latency Tool (asloglatency)

To read more about What is the right way to delete sets completely, see What is the right way to delete sets completely?

To read more about Durable Deletes, see Durable Deletes

To read more about returning set statistics for all or a particular set, see Info Command Reference

To read more about administering user roles, see Access Control (https://www.aerospike.com/docs/guide/security/access-control.html)

Keywords

SET-DELETE DELETE DATA SET RECORDS LATENCY DEFRAGMENTATION DELETES TRUNCATE LUT NAMESPACE

Timestamp

06/12/2017

