Aerospike is not removing expired records from memory fast enough


#1

Hello,

I am seeing that Aerospike is not removing expired records from memory fast enough. As a result, the write rate (tps) exceeds the rate at which expired records are removed from memory. Is there any setting other than nsup-period that we could use to speed up the garbage collector?

Config:

service {
  user root
  group root
  paxos-single-replica-limit 1         
  pidfile /var/run/aerospike/asd.pid
  service-threads 8
  transaction-queues 8
  transaction-threads-per-queue 8
  proto-fd-max 30000                   
  proto-fd-idle-ms 60000               
  nsup-period 1                        
}

namespace Cache {
  replication-factor 1       # Not even replicating
  memory-size 59G            
  storage-engine memory      
  default-ttl 300            
  high-water-memory-pct 70
}

Cluster Size: 4 nodes (8 CPUs, 61 GB of RAM, AWS r4.2xl)
Avg read tps: ~256k
Avg write tps: ~53k

Log:

INFO (nsup): (thr_nsup.c:1097) {Cache} Records: 11312811, 0 0-vt, 2471323(24311376) expired, 0(0) evicted, 0(0) set deletes. Evict ttl: 0. Waits: 0,0,489936. Total time: 550646 ms

Thanks, Alex


#2

The nsup-delete-sleep configuration is used to limit the rate of NSUP deletes (Expiration/Eviction/Set-Delete). The default is 100 μs, which allows at most 10,000 deletes per second.
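
The arithmetic behind that ceiling, as a rough sketch in Python. It assumes the sleep is applied once per delete and ignores all other per-record work. It also assumes that, in the log line from the first post, "2471323" is the number of records expired in that cycle and "Total time: 550646 ms" is the duration of the cycle. The function names are made up for illustration.

```python
def max_deletes_per_second(sleep_us: float) -> float:
    """Upper bound on deletes per second if nsup sleeps sleep_us between deletes."""
    return 1_000_000 / sleep_us


def observed_deletes_per_second(expired: int, total_ms: int) -> float:
    """Average delete rate over one nsup cycle, from the logged counters."""
    return expired / (total_ms / 1000)


# Ceiling implied by the default nsup-delete-sleep of 100 microseconds.
ceiling = max_deletes_per_second(100)

# Rate implied by the log line, under the assumptions stated above.
observed = observed_deletes_per_second(2_471_323, 550_646)

print(f"ceiling:  {ceiling:.0f} deletes/s")
print(f"observed: {observed:.0f} deletes/s")
```

Under those assumptions the logged cycle averaged roughly 4,500 deletes per second, below the 10,000 per second ceiling and well below the ~53k write tps reported for the cluster.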


#3

If the namespace's memory-free-pct metric drops below the expected percentage, you can shorten nsup-period from the 120-second default to 30 seconds. More frequent nsup thread runs will find expired records sooner.

If the high watermark for memory is breached, evictions will start in order to free up space. You can have the nsup thread walk a bigger portion of the tree in each cycle by raising the evict-tenths-pct value from the default 5 (0.5%) to 20 (2%).

See the knowledge base FAQ “What are Expiration, Eviction and Stop-writes”.


#4

Thanks. Two questions regarding the nsup-delete-sleep configuration:

  1. How do you determine that the default of 100 microseconds allows a maximum of 10,000 deletes per second?

  2. The definition of nsup-delete-sleep states “Number of microseconds to sleep between generating delete transactions”. Does a transaction here mean the deletion of a single record in a set?

For example, if 100,000 records have expired when nsup runs, does it remove 10,000 expired records, wait 100 microseconds, and then remove the next 10,000? Once all 100,000 are removed, does nsup sleep for the period specified by nsup-period before deletes start again? Is this correct? I am unsure what the relationship between nsup-delete-sleep and nsup-period is.

Thanks


#5

Between each delete transaction.
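
To put a number on the 100,000-record example from the question, here is a minimal sketch in Python. It assumes one sleep of nsup-delete-sleep microseconds per deleted record and ignores the time spent on the delete itself, so the result is a lower bound rather than a measured figure. The function name is hypothetical.

```python
def min_cycle_seconds(records: int, sleep_us: float) -> float:
    """Lower bound on the time needed to delete `records` records
    when nsup sleeps `sleep_us` microseconds between deletes."""
    return records * sleep_us / 1_000_000


# 100,000 expired records with the default 100 microsecond sleep.
print(min_cycle_seconds(100_000, 100), "seconds at least")

# The same batch with the sleep halved to 50 microseconds.
print(min_cycle_seconds(100_000, 50), "seconds at least")
```

So the deletes are not issued in batches of 10,000: under this model each of the 100,000 deletes is followed by its own sleep, and the whole batch takes at least 10 seconds with the default setting.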

By the way, your log indicates that the server hasn’t evicted or expired a single record: “expired, 0(0) evicted, 0(0)”. What TTL are you setting from the client?

Please provide the output of “asinfo -v 'hist-dump:ns=Cache;hist=ttl'”.