Eviction mechanisms in Aerospike

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

Abstract

The mechanism by which records are selected for eviction changed significantly in the Aerospike 3.8 release. This article discusses the new mechanism and describes the shortcomings of the pre-3.8 eviction mechanism, with a focus on how those drawbacks have been addressed.

Eviction mechanism Aerospike 3.8 and onwards

In Aerospike 3.8 and onwards there are two significant changes to the eviction mechanism. These are as follows:

  • The number of eviction histogram buckets is much greater and can be configured by the user. This is made possible by separating the granularity of the expiration TTL histogram from that of the eviction algorithm.
  • Buckets are no longer partially evicted during eviction.

The default number of buckets for the eviction algorithm is 10,000 although this can be configured between 100 and 10,000,000. The parameter controlling the number of eviction histogram buckets is evict-hist-buckets and it can be set as follows:

asinfo -v 'set-config:context=namespace;id=test;evict-hist-buckets=20000'

As implied by the use of asinfo, evict-hist-buckets is a dynamic parameter (it can be modified without a cluster or node outage). If this value is modified, the change takes effect at the next round of evictions. As shown in the asinfo command, the context for the evict-hist-buckets parameter is the namespace, so it can be set on a per-namespace basis. The parameter can also be set permanently in the aerospike.conf file.

The ability to configure the number of buckets used by the eviction algorithm addresses the issue of poor granularity within the TTL distribution, as granularity can now be controlled. The width of each bucket in the expiration TTL histogram is still maximum-record-ttl/100, while the width of each eviction histogram bucket is maximum-record-ttl/evict-hist-buckets. This resolves issues of apparent uneven distribution and poor granularity within the TTL distribution when there is a large TTL variance within a single namespace. It follows that, with the ability to manage the width of the eviction histogram buckets, it is easier to manage sizing, predict behaviour during evictions and fit the TTL distribution more closely to the desired use case.
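The two width formulas can be compared with a minimal sketch. The 30-day maximum-record-ttl is a hypothetical value chosen for illustration, and the use of integer division is an assumption; the article does not say how the server rounds.

```python
def bucket_widths(max_record_ttl_sec, evict_hist_buckets=10_000):
    """Return (expiration TTL bucket width, eviction bucket width) in seconds.

    Integer division is an assumption made for this sketch.
    """
    if not 100 <= evict_hist_buckets <= 10_000_000:
        raise ValueError("evict-hist-buckets must be between 100 and 10,000,000")
    ttl_width = max_record_ttl_sec // 100                   # fixed 100 buckets
    evict_width = max_record_ttl_sec // evict_hist_buckets  # configurable
    return ttl_width, evict_width


THIRTY_DAYS = 30 * 24 * 3600  # hypothetical maximum-record-ttl: 2,592,000 s

print(bucket_widths(THIRTY_DAYS))           # default 10,000 eviction buckets
print(bucket_widths(THIRTY_DAYS, 100_000))  # finer eviction histogram
```

Under these assumptions, the default setting gives eviction buckets of roughly 4 minutes against expiration TTL histogram buckets of more than 7 hours.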

It should be noted that each eviction histogram bucket requires 4 bytes of memory and therefore if the maximum 10,000,000 buckets were set, the resultant eviction histogram would consume 40 MB of memory.
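The memory cost is simple arithmetic, sketched here for a few bucket counts. The function name is illustrative only, and the figure is per eviction histogram.

```python
BYTES_PER_BUCKET = 4  # per-bucket cost stated in the article


def eviction_histogram_bytes(evict_hist_buckets):
    """Memory needed for one eviction histogram, in bytes."""
    return evict_hist_buckets * BYTES_PER_BUCKET


for buckets in (10_000, 100_000, 10_000_000):
    megabytes = eviction_histogram_bytes(buckets) / 1_000_000
    print(f"{buckets:>10,} buckets -> {megabytes:g} MB")
```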

When evicting during a cold start, the cluster uses either 100,000 buckets or the evict-hist-buckets value, whichever is higher. This applies only to cold start eviction; while the cluster is running, evict-hist-buckets is the governing parameter.
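The cold start rule amounts to taking a maximum. The following sketch uses a hypothetical helper name to show which bucket count applies in each situation.

```python
COLD_START_MIN_BUCKETS = 100_000  # minimum stated in the article


def effective_eviction_buckets(evict_hist_buckets, cold_start):
    """Number of eviction histogram buckets used (illustrative helper)."""
    if cold_start:
        return max(COLD_START_MIN_BUCKETS, evict_hist_buckets)
    return evict_hist_buckets


print(effective_eviction_buckets(10_000, cold_start=True))    # cold start minimum applies
print(effective_eviction_buckets(10_000, cold_start=False))   # configured value applies
print(effective_eviction_buckets(500_000, cold_start=True))   # configured value is higher
```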

The second significant change to the eviction mechanism is that in release 3.8 and higher only entire buckets are evicted. There is no more eviction from within buckets. When a round of eviction occurs, evict-tenths-pct is used to determine the number of records that must be evicted per NSUP cycle. This figure is used to define a threshold bucket. All buckets below (but not including) the threshold bucket are evicted. This removes the confusion caused by random and partial eviction from buckets. A bucket is evicted completely or not at all, which makes the process far more predictable and manageable.
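The whole-bucket rule can be illustrated with a small sketch. It is not the server code: the function name is hypothetical, the target is assumed to be computed from the records counted in the histogram, and the exact comparison used at the bucket boundary is an assumption.

```python
def find_threshold_bucket(bucket_counts, evict_tenths_pct):
    """Return (threshold_bucket_index, records_evicted).

    bucket_counts[i] is the number of evictable records in bucket i,
    ordered from shortest to longest TTL. Buckets below the threshold
    bucket are evicted whole; the threshold bucket itself is kept.
    """
    total = sum(bucket_counts)
    if evict_tenths_pct >= 1000:
        # The article notes the server refuses to evict every eligible record.
        return 0, 0
    target = total * evict_tenths_pct // 1000  # maximum records to evict
    evicted = 0
    for index, count in enumerate(bucket_counts):
        if evicted + count > target:
            # Taking this bucket would exceed the target, so stop here.
            return index, evicted
        evicted += count
    return len(bucket_counts), evicted


counts = [50, 30, 400, 20, 500]  # made-up histogram, 1,000 records in total

print(find_threshold_bucket(counts, 100))  # 10%: buckets 0 and 1 are evicted
print(find_threshold_bucket(counts, 5))    # 0.5%: first bucket already too large
```

In the second call the threshold bucket is the first bucket, so nothing is evicted. This is the situation behind the "insufficient histogram resolution?" warning described under Log Messages.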

Log Messages

When an eviction round is in progress, the number of records to be evicted is shown in an INFO level log message as follows:

Apr 07 2016 13:42:17 GMT: INFO (nsup): (thr_nsup.c:1079) {test} found 12345 records eligible for eviction

The message above indicates that 12345 records are scheduled for eviction in the current round.

In some circumstances, no records will be evicted. This could be for a number of reasons and these will be shown in the logs.

Only records with a non-zero TTL are eligible for eviction (meaning that records without a TTL or with a TTL of 0 are not considered for eviction). If the namespace contains only records that cannot be evicted, the following message indicates this:

Apr 07 2016 13:42:17 GMT: WARNING (nsup): (thr_nsup.c:1065) {test} no records eligible for eviction

As discussed earlier, only buckets below the threshold bucket are considered for eviction. If the threshold bucket is the first bucket, no buckets can be evicted. The following log message indicates this (here 200346037 is the void time of the threshold bucket):

Apr 07 2016 13:42:17 GMT: WARNING (nsup): (thr_nsup.c:1068) {test} no records below eviction void-time 200346037 - insufficient histogram resolution?

The message suggests that the eviction process could continue if the resolution of the histogram were increased (by setting a higher value for evict-hist-buckets). This only works if the lowest bucket contains a range of TTLs. If the records in that bucket all have the same TTL, they will remain in the same bucket regardless of resolution. In that case, increasing evict-tenths-pct in steps of 5 (as in 5, 10, 15, 20) raises the total number of records to be evicted in that NSUP cycle. Once this target reaches the total number of records in the bucket, the bucket can be evicted. Increasing evict-hist-buckets may therefore have less impact than increasing evict-tenths-pct.

Jan 30 2017 02:36:21 GMT: WARNING (nsup): (thr_nsup.c:1043) {test} no records below eviction void-time 222541923 - threshold bucket 361, width 259 sec, count 686375 > target 530312 (0.5 pct)

The above log message indicates that no records below the given void time are eligible for eviction. The situation can be addressed by increasing the eviction histogram resolution (increasing evict-hist-buckets) and/or by increasing evict-tenths-pct step by step. The log line also shows that the threshold bucket is bucket 361 of the configured evict-hist-buckets, and that 259 sec is the eviction histogram bucket width, calculated as maximum-record-ttl/evict-hist-buckets. “count 686375 > target 530312” indicates that the eviction process found the first populated bucket to begin at that void time and wanted to evict 530,312 records, but found 686,375 records in that first bucket. No records are evicted, as partial buckets are not evicted from Aerospike 3.8 onwards.
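When many such warnings need to be reviewed, the fields can be pulled out with a regular expression. The sketch below is written against the single log line shown above, so the pattern is an assumption about the format and may not match other server versions. The estimate of the smallest evict-tenths-pct that would clear the bucket is rough: it infers the record total from the printed target and percentage.

```python
import math
import re

LINE = ("Jan 30 2017 02:36:21 GMT: WARNING (nsup): (thr_nsup.c:1043) {test} "
        "no records below eviction void-time 222541923 - threshold bucket 361, "
        "width 259 sec, count 686375 > target 530312 (0.5 pct)")

# Pattern derived from the example line only.
PATTERN = re.compile(
    r"\{(?P<namespace>[^}]+)\} no records below eviction void-time (?P<void_time>\d+)"
    r" - threshold bucket (?P<bucket>\d+), width (?P<width>\d+) sec,"
    r" count (?P<count>\d+) > target (?P<target>\d+) \((?P<pct>[\d.]+) pct\)"
)


def parse_eviction_warning(line):
    """Return the warning's fields as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    namespace = fields.pop("namespace")
    pct = float(fields.pop("pct"))
    parsed = {name: int(value) for name, value in fields.items()}
    parsed.update(namespace=namespace, pct=pct)
    return parsed


info = parse_eviction_warning(LINE)
shortfall = info["count"] - info["target"]        # records over the target
est_total = info["target"] / (info["pct"] / 100)  # inferred record total
min_tenths = math.ceil(info["count"] * 1000 / est_total)
print(info["namespace"], shortfall, min_tenths)
```

For this line the bucket exceeds the target by 156,063 records. Under the stated assumptions, an evict-tenths-pct of about 7 would be the first value whose target covers the whole bucket.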

IMPORTANT: If NSUP is configured to evict 100% of the records or more in a single cycle (i.e. evict-tenths-pct set to 1000 or more), no records will be evicted and a message like the following will be logged:

Oct 24 2020 09:26:42 GMT: WARNING (nsup): (nsup.c:753) {ns_cache} would evict all 146897768 records eligible - not evicting!

Eviction of all evictable records can only be done manually, by setting the eviction depth far enough into the future that all records with TTLs expire. As the maximum TTL is ten years, setting the eviction depth ten years in the future should always suffice.

Conclusion

By making the number of buckets in the eviction algorithm user configurable, shortcomings in TTL distribution granularity, record distribution within buckets and, therefore, sizing management and conformity to use case have been addressed. With the switch to evicting entire buckets instead of random evictions within buckets, predictability is further increased in comparison to pre-3.8 versions, even when records return after cold start. There are situations where evictions are not possible; these situations are logged in a clear and understandable manner. The key parameters previously used to control eviction, nsup-period and evict-tenths-pct, are still valid, and evict-hist-buckets has been added for greater control over the eviction process.

Notes

  • Void time is the absolute time when a record should be removed from the database (expired). Void time is measured in seconds since the “Aerospike epoch”, which is 1 January 2010 (midnight GMT), 40 years after the UNIX epoch (1 January 1970 at midnight GMT).
  • TTL (time to live) is the difference between the current time and the void time.
  • evict-tenths-pct defines the maximum proportion of objects, in tenths of a percent, to be deleted during each cycle of NSUP.
  • nsup-period defines the amount of time in seconds for NSUP to sleep between cycles (default 120s).
  • evict-hist-buckets defines the number of buckets used in the eviction algorithm.
  • Expiration TTL histogram bucket width is maximum-record-ttl/100.
  • Eviction histogram bucket width is maximum-record-ttl/evict-hist-buckets.
  • Cold start uses a minimum of 100,000 buckets for the eviction histogram.
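The void time and TTL definitions in the notes above can be made concrete with a short conversion sketch. The function names are illustrative. The sample values are the void time and timestamp from the first "no records below eviction void-time" warning shown earlier.

```python
from datetime import datetime, timedelta, timezone

# Aerospike epoch: 1 January 2010, midnight GMT.
AEROSPIKE_EPOCH = datetime(2010, 1, 1, tzinfo=timezone.utc)


def void_time_to_datetime(void_time):
    """Convert a void time (seconds since the Aerospike epoch) to UTC."""
    return AEROSPIKE_EPOCH + timedelta(seconds=void_time)


def ttl_seconds(void_time, now):
    """TTL remaining at 'now': void time minus current time, in seconds."""
    return int((void_time_to_datetime(void_time) - now).total_seconds())


log_time = datetime(2016, 4, 7, 13, 42, 17, tzinfo=timezone.utc)
print(void_time_to_datetime(200346037).isoformat())
print(ttl_seconds(200346037, log_time))
```

The void time 200346037 corresponds to 7 May 2016 19:40:37 GMT, a little over 30 days after the warning was logged.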

Eviction mechanism prior to Aerospike 3.8

Eviction, the process of removing records from the database prior to their void time, happens when a high water mark (either disk or memory) is breached. Records are selected for eviction based on the TTL histogram, which consists of 100 buckets. The width of each bucket is fixed, as is the number of buckets. The bucket width is defined as maximum-record-ttl/100. So if the longest TTL in the namespace is 100 years, the width of each bucket would be 1 year. Any record with a TTL between 1 second and 1 year would be contained in the first bucket of the histogram. An example TTL histogram is shown below. Here the first integer indicates 100 buckets and the second indicates a bucket width of 31536000 seconds, or 1 year. The following integers indicate the number of records in each bucket.

asinfo -v "hist-dump:ns=example_ns;set=Example_set;hist=ttl"
example_ns:ttl=100,31536000,452881499,2235,4721,5434,26579,9757,742,9695,11753,9160,907,1796,1985,418,998,2865,1338,740,1964,3628,2730,4833,2419,142,5264,2952,61780,30629,32715,37573,38965,31505,260477,225917,212545,226774,239324,312687,483451,599760,495514,453962,493150,509650,491761,542584,606675,599817,583461,362805,728058,520691,547161,525743,541940,528916,573520,565452,562626,524747,538039,574458,496418,553521,545130,587124,620716,645112,684276,663857,633111,554691,540100,519253,575330,625288,682586,733206,631298,580352,572965,573191,605929,627315,711731,757881,560359,503827,480136,573680,655015,666074,643144,672304,637874,611225,564829,559463,546093,2858814;
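The layout described above (bucket count, bucket width, then one count per bucket) can be split apart with a short parsing sketch. The helper name is hypothetical and the sample string is a shortened, made-up response in the same shape as the dump above.

```python
def parse_ttl_hist_dump(response):
    """Split '<ns>:ttl=<buckets>,<width>,<count>,...;' into its parts.

    Returns (namespace, number_of_buckets, bucket_width_sec, counts).
    """
    name, _, payload = response.strip().rstrip(";").partition("=")
    namespace = name.split(":")[0]
    values = [int(value) for value in payload.split(",")]
    num_buckets, width, counts = values[0], values[1], values[2:]
    return namespace, num_buckets, width, counts


sample = "example_ns:ttl=4,31536000,10,20,30,40;"  # shortened, made-up data
namespace, num_buckets, width, counts = parse_ttl_hist_dump(sample)
first_bucket_share = counts[0] / sum(counts)
print(namespace, num_buckets, width, f"{first_bucket_share:.0%} in first bucket")
```

In the real dump above, the first bucket holds 452,881,499 records, far more than any other bucket. This is the kind of skew the fixed 100-bucket histogram cannot resolve.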

When the namespace supervisor sub-system (NSUP) starts evicting records, any record in the first bucket may be evicted. It is entirely possible that records with a TTL of 1 second, 1 day, 30 days and 11 months could all be evicted at the same time if the bucket width is 1 year. The eviction process happens in cycles: NSUP wakes up, evicts records and then sleeps for a given period (configured with nsup-period). In each NSUP cycle the number of records deleted is controlled by the parameter evict-tenths-pct. Rounds of evictions continue until the disk or memory usage is below the relevant high water mark. On each NSUP cycle the TTL histogram is recomputed. The following shortcomings are present in this design:

  • The number of buckets for the TTL histogram is fixed, and if there is a wide variance in the TTLs of records within the namespace, the distribution of records within buckets can be skewed.
  • Eviction within buckets is random, so if this distribution is skewed, eviction can appear unpredictable.
  • With a 100 bucket limit on the TTL histogram, granularity is limited and therefore managing database size using eviction can be difficult.
  • Histograms are not user configurable and therefore cannot be modified to fit different use cases.
  • If the system cold starts, records that have previously been evicted may be resurrected (see “Expired/Deleted data reappears after server is restarted”). This may mean that other records from the same bucket are then evicted in their place. There is no predictability to the evictions within a bucket.

More details on Eviction Algorithm applicable for versions before 3.8:

Keywords

EVICTION 3.8 EVICT-HIST-BUCKETS AER-4655 GRANULARITY TTL HISTOGRAM EVICT-TENTHS-PCT

Timestamp

November 2020

Hello,

On a vanilla install of 3.12.1, evict-hist-buckets seems to be set to 100 on an in-memory namespace. This is not what the document says. Can it be specific to an in-memory namespace?

I tried to change it to 1000, both in the config and by command line. The value changed in the config (asinfo -v "namespace/test" -l | grep evict gives the correct result), but asinfo -v 'hist-dump:ns=test;hist=ttl' still dumps a histogram of 100 values.

Am I missing something ?

Thx

When running the hist-dump command, you will always see 100 buckets as it wouldn’t be practical to show more, but the code will go by the evict-hist-buckets configured value.