I am seeing that Aerospike is not removing expired records from memory fast enough: our write TPS is higher than the rate at which expired records are being removed from memory. Is there any setting other than nsup-period that we could use to speed up the garbage collector?
Config:
service {
user root
group root
paxos-single-replica-limit 1
pidfile /var/run/aerospike/asd.pid
service-threads 8
transaction-queues 8
transaction-threads-per-queue 8
proto-fd-max 30000
proto-fd-idle-ms 60000
nsup-period 1
}
namespace Cache {
replication-factor 1 # Not even replicating
memory-size 59G
storage-engine memory
default-ttl 300
high-water-memory-pct 70
}
The nsup-delete-sleep configuration is used to limit the rate of NSUP deletes (expiration, eviction, and set-delete). The default is 100 μs, which allows at most 10,000 deletes per second.
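The 10,000 figure is simple arithmetic: one second is 1,000,000 μs, and 1,000,000 / 100 = 10,000. The sketch below assumes nsup sleeps once between consecutive individual deletes and ignores the time the delete itself takes, so the result is an upper bound, not a measured rate. The function name is hypothetical.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def max_deletes_per_second(nsup_delete_sleep_us: int) -> float:
    """Upper bound on NSUP deletes per second.

    Assumes one sleep of nsup_delete_sleep_us between consecutive deletes
    and treats the delete itself as taking no time.
    """
    if nsup_delete_sleep_us <= 0:
        raise ValueError("sleep must be positive for a finite bound")
    return MICROSECONDS_PER_SECOND / nsup_delete_sleep_us


print(max_deletes_per_second(100))  # 10000.0 (the default)
print(max_deletes_per_second(50))   # 20000.0 (halving the sleep doubles the ceiling)
```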
If the namespace's memory-free-pct metric drops below the expected percentage, you can shorten nsup-period from the 120-second default to 30 seconds. More frequent nsup runs will find expired records sooner.
If the high-water mark for memory is breached, evictions will start in order to free up space. You can have each nsup cycle evict a larger portion of the namespace by raising evict-tenths-pct from the default 5 (0.5%) to 20 (2%).
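Because evict-tenths-pct is expressed in tenths of a percent, converting it to a record count per cycle is a matter of dividing by 1,000. The sketch below treats the setting as a per-cycle cap on evicted records; the record count of 10,000,000 and the function name are hypothetical, chosen only to make the numbers concrete.

```python
def records_evicted_per_cycle(record_count: int, evict_tenths_pct: int) -> int:
    """Approximate cap on records evicted in one nsup cycle.

    evict_tenths_pct is in tenths of a percent: 5 -> 0.5%, 20 -> 2%,
    so the fraction is evict_tenths_pct / 1000.
    """
    return record_count * evict_tenths_pct // 1000


# Hypothetical namespace holding 10 million records.
print(records_evicted_per_cycle(10_000_000, 5))   # 50000 (default)
print(records_evicted_per_cycle(10_000_000, 20))  # 200000 (raised to 2%)
```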
How do you determine that the default of 100 microseconds allows a maximum of 10,000 deletes per second?
The definition of nsup-delete-sleep states: “Number of microseconds to sleep between generating delete transactions”. By transactions, does it mean a single record in a set?
For example, if 100,000 records have expired when nsup runs, does it remove 10,000 expired records, wait 100 microseconds, and then remove the next 10,000? Once all 100,000 are removed, nsup sleeps for the period specified by nsup-period, and then deletion starts again. Is this correct? I am unsure what the relationship between nsup-delete-sleep and nsup-period is.
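If the sleep applies between each individual delete rather than between batches of 10,000 (an assumption here, consistent with the 10,000-per-second figure above), the two settings play different roles: nsup-delete-sleep bounds how fast a single pass can delete, and nsup-period controls how often a pass starts. The sketch below estimates how long 100,000 deletes take at a given sleep and whether the throttled rate can keep up with a given expiration rate. With default-ttl 300, records that are inserted and never updated expire roughly 300 seconds after being written, so in steady state the expiration rate is close to the earlier insert rate. Function names and the 15,000 records/second rate are hypothetical.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def drain_seconds(expired_records: int, nsup_delete_sleep_us: int) -> float:
    """Minimum time to delete a batch when nsup sleeps between each delete.

    Ignores the time spent on the deletes themselves, so real passes take longer.
    """
    return expired_records * nsup_delete_sleep_us / MICROSECONDS_PER_SECOND


def keeps_up(expiring_per_second: float, nsup_delete_sleep_us: int) -> bool:
    """True if the throttled delete ceiling is at least the expiration rate."""
    ceiling = MICROSECONDS_PER_SECOND / nsup_delete_sleep_us
    return expiring_per_second <= ceiling


print(drain_seconds(100_000, 100))  # 10.0 seconds for the example batch
print(keeps_up(15_000, 100))        # False: ceiling is 10,000 deletes/second
print(keeps_up(15_000, 50))         # True: ceiling is 20,000 deletes/second
```

Under this model, once records expire faster than the ceiling, the backlog of expired records grows no matter how short nsup-period is, which is why the sleep setting matters as much as the period.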
By the way, your log indicates that the server hasn’t evicted or expired a single record: “expired, 0(0) evicted, 0(0)”. What TTL are you setting from the client?
Please provide the results of “asinfo -v 'hist-dump:ns=Cache;hist=ttl'”
waits: Accumulated waiting time for different stages of deletes to finish, in milliseconds. In order:
n_general_waits: the number of milliseconds nsup slept during general expiration and eviction while waiting for the nsup-delete-queue to drop to 10,000 elements or less (throttling).
n_clear_waits: the number of milliseconds nsup waited for the nsup-delete-queue to clear at the end of the cycle for the current namespace.