FAQ - What options are available to speed up cold start eviction

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

Detail

When a node cold starts, it scans the disk and rebuilds the primary index from scratch. Under certain conditions, records have to be evicted before the cold start can complete. A common cause is that loading records into the index pushes the namespace past its configured high-water-memory-pct, especially when records that were previously deleted without durable deletes are still on the disk and get loaded again. While cold start eviction is taking place, messages like the following appear in aerospike.log:

Sep 29 2016 08:27:50 GMT: INFO (nsup): (thr_nsup.c:324) {test} cold-start building eviction histogram ...
Sep 29 2016 08:27:50 GMT: INFO (drv_ssd): (drv_ssd.c:3974) {test} loaded 699392 records, 0 subrecords, /opt/aerospike/data/test.dat 28%

Evictions during cold start can increase start-up time dramatically. Every time one of the high-water thresholds is breached (high-water-memory-pct, high-water-disk-pct or mounts-high-water-pct), the eviction histogram has to be recalculated over all of the data already loaded, potentially to evict only a few records (if any at all). The node then loads a few more records from disk, breaches the threshold again, and repeats the whole process.
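The cost of this repeated recalculation can be illustrated with a toy model. The sketch below is not Aerospike code: the function name, the single capacity threshold and the fixed evict_per_pass value are assumptions made purely for illustration. It counts how many record visits the histogram passes cost when each pass frees only a little room.

```python
def cold_start_scan_cost(total_records, capacity, evict_per_pass):
    """Toy model of cold-start loading with repeated eviction passes.

    `capacity` stands in for a high-water threshold, expressed as a record
    count. Whenever the loaded count reaches it, one histogram pass visits
    every loaded record and evicts `evict_per_pass` of them.
    Returns (number_of_passes, total_record_visits_by_histogram_passes).
    """
    if capacity < 1 or evict_per_pass < 1:
        raise ValueError("capacity and evict_per_pass must be at least 1")

    loaded = 0            # records currently held in the index
    read_from_disk = 0    # records read from the device so far
    passes = 0
    histogram_visits = 0

    while read_from_disk < total_records:
        if loaded >= capacity:
            # Threshold breached: the histogram covers everything loaded.
            histogram_visits += loaded
            passes += 1
            loaded -= min(evict_per_pass, loaded)
        else:
            # Room available: load the next record from disk.
            loaded += 1
            read_from_disk += 1

    return passes, histogram_visits


if __name__ == "__main__":
    print(cold_start_scan_cost(1000, 100, 1))
    print(cold_start_scan_cost(1000, 100, 50))
```

In this model, loading 1000 records into room for 100 costs 900 histogram passes and 90000 record visits when each pass evicts a single record, against 18 passes and 1800 visits when each pass evicts 50. The options below all aim to avoid the first situation.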

What are the options to speed up the eviction process during a cold start?

Answer

Change the eviction threshold

There are two things that can be done to reduce the time spent evicting records during a cold start:

  • Increase the relevant high water configuration (high-water-memory-pct, high-water-disk-pct or mounts-high-water-pct) in the aerospike.conf file.

This has to be done before the cold start: the value is set in the namespace stanza of the aerospike.conf file while the server is down. It raises the eviction threshold for memory or disk as required. Use it only as a temporary measure to get the node running, and revert the setting once the node has rejoined the cluster. The entry should then also be removed from the aerospike.conf file so that it is not re-applied at the next startup. An example entry in aerospike.conf looks as follows:

high-water-memory-pct 80

The info command to revert this after start up would be:

$ asinfo -v 'set-config:context=namespace;id=test;high-water-memory-pct=60'

Do not forget to also revert the value in the configuration file.

  • Modify cold-start-evict-ttl

The parameter cold-start-evict-ttl sets the TTL, in seconds, below which records are evicted during a cold start. It is another static parameter, meaning that it must be set in aerospike.conf before startup. The value to choose should be based on the TTLs of the records in the database at the time of startup.
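One possible way to pick a value is to look at a sample of remaining record TTLs and take the point below which a chosen share of records falls. The sketch below assumes such a sample is already available as a list of seconds; the helper name and the sample values are hypothetical, and this is only an illustration of the reasoning, not an Aerospike tool.

```python
def suggest_cold_start_evict_ttl(remaining_ttls, percent_to_evict):
    """Suggest a TTL threshold (seconds) from a sample of remaining TTLs.

    At most `percent_to_evict` percent of the sampled records have a TTL
    strictly below the returned value. Records that never expire should
    be left out of the sample.
    """
    if not remaining_ttls:
        raise ValueError("need at least one sampled TTL")
    if not 0 <= percent_to_evict <= 100:
        raise ValueError("percent_to_evict must be between 0 and 100")

    ordered = sorted(remaining_ttls)
    # Integer arithmetic avoids floating-point rounding at the boundary.
    k = len(ordered) * percent_to_evict // 100
    if k >= len(ordered):
        # Asking for everything: return a value above the largest TTL.
        return ordered[-1] + 1
    return ordered[k]


if __name__ == "__main__":
    # Hypothetical sample of remaining TTLs, in seconds.
    sample = [3600, 86400, 7200, 600, 172800, 43200, 1800, 259200, 300, 14400]
    print(suggest_cold_start_evict_ttl(sample, 30))
```

With this sample and a 30 percent target, the helper returns 3600: three of the ten sampled records have less than one hour left and would fall below the threshold.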

Disable cold start eviction

The disable-cold-start-eviction configuration parameter can be used to disable eviction during cold start for the namespace concerned, which speeds up the cold start process.

Note: with eviction disabled, the node loads as many records as it can. If usage reaches the capacity limit of the node (the configured stop-writes-pct), the Aerospike node fails to start unless more capacity is added to it.

Example: a node that breaches the high-water mark during cold start and is unable to evict logs the following:

Aug 07 2020 20:59:21 GMT: WARNING (nsup): (nsup.c:1205) {bar} cold start found no records below eviction void-time 334615283 - threshold bucket 85322, width 1 sec, count 435 > target 402 (0.5 pct)
Aug 07 2020 20:59:21 GMT: WARNING (nsup): (nsup.c:1134) {bar} hwm breached but nothing to evict

Aerospike then prints the following once the stop-writes limit is reached:

Aug 07 2020 20:59:21 GMT: WARNING (nsup): (nsup.c:875) {bar} breached stop-writes limit (memory), memory sz:6351696 (5211648 + 0 + 1140048) limit:6291456, disk avail-pct:100
Aug 07 2020 20:59:21 GMT: WARNING (nsup): (nsup.c:1075) {bar} hit stop-writes limit
Aug 07 2020 20:59:21 GMT: CRITICAL (drv_ssd): (drv_ssd.c:2368) hit stop-writes limit before drive scan completed
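To see how far over the limit the node was, the figures in the "breached stop-writes limit" line can be pulled out and compared. The sketch below is a minimal parser written against the exact message format shown above; the regular expression and the function name are assumptions, and other server versions may format the line differently. The meaning of the individual components inside the parentheses is not stated in this article, so they are only summed as printed.

```python
import re

# Pattern derived from the sample log line in this article only.
STOP_WRITES_RE = re.compile(
    r"\{(?P<ns>[^}]+)\} breached stop-writes limit \(memory\), "
    r"memory sz:(?P<total>\d+) \((?P<parts>\d+(?: \+ \d+)*)\) "
    r"limit:(?P<limit>\d+)"
)


def parse_stop_writes(line):
    """Return the memory figures from a stop-writes log line, or None."""
    match = STOP_WRITES_RE.search(line)
    if match is None:
        return None
    total = int(match.group("total"))
    limit = int(match.group("limit"))
    parts = [int(p) for p in match.group("parts").split(" + ")]
    return {
        "namespace": match.group("ns"),
        "total": total,
        "parts": parts,
        "limit": limit,
        "over_by": total - limit,  # bytes above the stop-writes limit
    }


if __name__ == "__main__":
    sample = (
        "Aug 07 2020 20:59:21 GMT: WARNING (nsup): (nsup.c:875) {bar} "
        "breached stop-writes limit (memory), memory sz:6351696 "
        "(5211648 + 0 + 1140048) limit:6291456, disk avail-pct:100"
    )
    print(parse_stop_writes(sample))
```

For the sample line, the components add up to the reported 6351696 bytes, which is 60240 bytes above the 6291456-byte limit.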

Empty the node and let migrations repopulate it

Emptying the node before starting it up prevents any potential evictions, since there is nothing to load. Bringing the node up empty using the cold-start-empty configuration achieves the same result, but it is important to understand the risk of data loss: on a default replication-factor 2 setup, if another node is impacted while this node is still being repopulated by migrations, records whose only copies were on those two nodes can be lost.

Notes

Keywords

COLD START EVICT EVICTIONS SLOW

Timestamp

July 2020