Remote Cluster in XDR Growing and Not Expiring As Expected...I Think

Hi!

We have a 3-cluster Aerospike setup: a 6-node local cluster, and 2 AWS remote clusters (the AWS clusters are sized differently than our local cluster). The XDR setup is active-passive from the local cluster to the remote clusters. We’ve enabled replication at the set level, so we’re not shipping the entire namespace.

All 3 clusters were running version 3.5 until recently, when I updated the AWS clusters to 3.10. Shortly thereafter it seemed that the remote clusters were quickly running out of disk space and constantly going over the high-water-disk-pct, which is set to 50%. The steps I took to remediate the problem were:

  • add additional nodes
  • change the default-ttl from 1095D to 15D (1095 was from the config on our local cluster and should have been edited for the remote cluster)
  • set replication factor to 1 (for our use case, this is fine)

I’m still seeing what to me is unusual growth on the remote clusters. The local cluster shows ~3 billion master objects. AWS1 has ~428 million, and AWS2 has ~213 million. With both remote clusters being shipped the same sets, and the configs on both AWS clusters being the same, why does AWS1 have roughly twice as many objects as AWS2?

The second piece of confusion for me is understanding expiration on these clusters. For all 3 clusters the low-pct disk mark is the default (0) and the high-water-disk-pct is 50, so all 3 of them should be expiring data. My local cluster is expiring data: 46949(2614628063) expired

But the AWS clusters seem to be expiring at a slower rate, even though their default-ttl is set lower. AWS1: 0(559) expired

AWS2: 0(67977) expired

I ran the command asinfo -v 'hist-dump:ns=prod_cp;hist=ttl' to get some more insight.

Local cluster:

ttl=100,946029,74713698,73000242,130840691,38823493,40849617,48754640,52353801,55451245,21571100,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,4138,0,0,952,0,637,1203,1375,2141,1665,842,2217,1322,2629,1767,2343,1798,1866,1902,1901,1862,2460,126052;

AWS1:

ttl=100,946026,101,126,42334303,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4826,0,0,1105,0,764,1323,1614,2464,1983,1017,2648,1541,2932,2213,2770,2073,2303,2258,2205,2269,2993,148475;

AWS2:

ttl=100,946026,1989,18242811,86821859,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,850,1368,1064,519,1456,913,1696,1221,1549,1127,1285,1279,1217,1241,2316,151507;

After reading a bunch of articles to figure out what the output means, it seems that the histogram is divided into buckets of the same width on all 3 clusters, and that width happens to correspond to the original default-ttl of 1095D: (100 * 946026) / 3600 ≈ 26278 hours (1094.9 days). For AWS2, for example, I have 18242811 records in bucket 1 that won’t finish expiring out for up to ~22 days (if I’m reading this correctly), and 151507 in bucket 99 that won’t expire for nearly 1095 days.
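
If I have the format right (the header looks like ttl=<number of buckets>,<bucket width in seconds>, followed by the per-bucket counts, and bucket i covers TTLs from i × width up to (i + 1) × width), here is a quick sanity check of the arithmetic:

```java
// Quick sanity check of my reading of hist-dump
// (assumed header: ttl=<numBuckets>,<bucketWidthSeconds>,<count0>,<count1>,...)
public class TtlHistCheck {
    public static void main(String[] args) {
        int numBuckets = 100;
        long bucketWidthSeconds = 946026L; // width reported on the AWS clusters
        double bucketWidthDays = bucketWidthSeconds / 86400.0;

        System.out.printf("bucket width   ~ %.1f days%n", bucketWidthDays);              // ~10.9 days
        System.out.printf("total span     ~ %.1f days%n", numBuckets * bucketWidthDays); // ~1094.9 days
        // Bucket i holds records whose remaining TTL falls between i*width and (i+1)*width.
        System.out.printf("bucket 1 TTLs  ~ %.1f to %.1f days%n", 1 * bucketWidthDays, 2 * bucketWidthDays);
        System.out.printf("bucket 99 TTLs ~ %.1f to %.1f days%n", 99 * bucketWidthDays, 100 * bucketWidthDays);
    }
}
```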

Can I do something to change the widths of the buckets on the remote clusters to reflect the different ttl? Am I misunderstanding how this is supposed to work? Why am I growing data on the remote clusters so fast, and not evenly? Is it possible that the XDR writes are simply updating the generations on the records in a way that isn’t allowing them to expire?

I’m so confused.

If you are an XDR user, you must have an Enterprise license, so please open a ticket at https://support.aerospike.com.

Here are some highlighted items from this post:

Why am I growing data on the remote clusters so fast, and not evenly?

From the Support email, the inequality in record counts was determined to be because AWS1 was added earlier and AWS2 was added later.

For this point:

  • change the default-ttl from 1095D to 15D (1095 was from the config on our local cluster and should have been edited for the remote cluster)

The configuration parameter default-ttl only impacts new writes, and only those writes that do not specify an explicit ttl value. Existing records are not impacted by modifying this configuration parameter, so your existing records would still expire at 1095D rather than at 15D as hoped.
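
As an illustration, here is a minimal sketch using the Java client (the host, set, and key names are placeholders, not from the original post) of where default-ttl does and does not apply on a write:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.policy.WritePolicy;

public class DefaultTtlExample {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000); // placeholder host
        Key key = new Key("prod_cp", "my_set", "some-key");              // placeholder set/key

        // expiration = 0 tells the server to apply the namespace default-ttl to this write.
        WritePolicy useNamespaceDefault = new WritePolicy();
        useNamespaceDefault.expiration = 0;
        client.put(useNamespaceDefault, key, new Bin("status", "a"));

        // An explicit expiration (in seconds) overrides default-ttl for this record only.
        WritePolicy explicitTtl = new WritePolicy();
        explicitTtl.expiration = 15 * 24 * 3600; // 15 days
        client.put(explicitTtl, key, new Bin("status", "b"));

        // Records already on disk keep whatever TTL they were last written with;
        // changing default-ttl in the server config does not rewrite them.
        client.close();
    }
}
```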

The solution is to perform a scan touch operation to update the TTL. This can be accomplished using a User-Defined Function (UDF) written in the Lua Programming Language, or programmatically (using our supported Client Libraries).

To accomplish this using the Lua Programming Language, see: Developing UDF Modules

The following article contains a very basic example of how to update the ttl; see: How to 'touch' record on UDF

NOTE: Make sure you do not enable xdr-ship-bins (http://www.aerospike.com/docs/reference/configuration#xdr-ship-bins) as this would then only ship the changed bins.

This document outlines the process to execute the UDF: http://www.aerospike.com/docs/tools/aql/udf_management.html

If you need to kill_scan, more details can be found in the AQL docs under Query and Scan Management; see: http://www.aerospike.com/docs/tools/aql/query_scan_management.html

To deepen your understanding of UDFs, see: User-Defined Functions (UDF) Development Guide

Scan touch can also be performed programmatically, using our supported Client Libraries.

For example, with the Java Client API:

  • Performing operations on records: http://www.aerospike.com/docs/client/java/usage/kvs/multiops.html
  • Scanning records: http://www.aerospike.com/docs/client/java/usage/scan/scan.html
  • API reference, including information on the touch method: aerospike-client 7.2.0 javadoc (com.aerospike)
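
To make the programmatic route concrete, here is a minimal, illustrative sketch in Java (not production code; the host, set name, and the 15-day TTL value are assumptions for this example) that scans a set and touches every record so it picks up the shorter TTL:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.ScanCallback;
import com.aerospike.client.policy.ScanPolicy;
import com.aerospike.client.policy.WritePolicy;

public class ScanTouch {
    public static void main(String[] args) {
        final AerospikeClient client = new AerospikeClient("127.0.0.1", 3000); // placeholder host

        // Each touched record will be rewritten with this expiration (seconds).
        final WritePolicy touchPolicy = new WritePolicy();
        touchPolicy.expiration = 15 * 24 * 3600; // 15 days

        ScanPolicy scanPolicy = new ScanPolicy();

        // scanAll invokes the callback once for every record in the namespace/set.
        client.scanAll(scanPolicy, "prod_cp", "my_set", new ScanCallback() {
            @Override
            public void scanCallback(Key key, Record record) {
                // touch() resets the record's TTL (and bumps its generation) without changing any bins.
                client.touch(touchPolicy, key);
            }
        });

        client.close();
    }
}
```

If this is run against the source (local) cluster, each touch is itself a write that XDR will ship like any other update, which is why the xdr-ship-bins note above matters; you will also want to throttle it rather than touching billions of records as fast as the scan returns them.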

Here is another example, using the C Client API, of touching a record: http://www.aerospike.com/docs/client/c/usage/kvs/multiops.html#touching-a-record

Can I do something to change the widths of the buckets on the remote clusters to reflect the different ttl?

Histograms apply at the namespace level, and their bucket widths are not user configurable on a cluster-to-cluster basis.