More flexibility on eviction for LRU caches


#1

When using Aerospike as a simple LRU cache, as many do, I think the policy of Aerospike to “only removes data that has a TTL set and chooses the soonest to expire records to evict first” (as noted here: http://www.aerospike.com/docs/operations/configure/namespace/retention) is a sensible default given the current setup/capabilities. However it leaves something to be desired for people with LRU caching needs.

An ideal scenario would be to optionally enable read tracking within aerospike whereby aerospike would track times read and last read time. This data is critical for LRU (least recently used) since the data with the most cache hits is the most useful to remain cached - even if it has a shorter TTL - I would rather the longer TTL stuff gets evicted before something that gets 5000x more cache hits but has a shorter TTL to force a certain freshness of data. This is a common pattern in LRU caches and does not have a great solution in Aerospike.

But since the ideal scenario results in performance hits and complexity due to more writes to track read usage… another option might be to allow app developers to set an optional eviction priority. This way we can let aerospike know to kill this stuff off before others. For example, lets say priority was a 1 byte int - Aerospike could start the same eviction logic as far as the shortest TTL, but it would start first with the lowest priority and then work its way up to higher priority eviction tiers once it cleared others out.


#2

When you are talking about eviction, you are really talking about an undersized cluster. Why not just scale horizontally and add additional nodes and stay out of evictions? Operating in “eviction” mode is highly “not” recommended! You should never get into evictions if you have sized your cluster correctly for your use case. So this is very basic capacity planning and therefore I am also wondering if I am understanding your question correctly. (?)


#3

You can call it undersized if you want - but the reality of data caching is that not all data cached is worth the storage overhead in the first place. However, the application developer will often not know which data is worth more than others due to unpredictable usage patterns. Take a LRU cache with a billion cached rows - but only 25% of those get cache hits over the course of a month. 75% of that cached data is just wasting space. A properly implemented LRU system would simply trim the fat. You are proposing increasing storage capacity for data that should instead be deleted.


#4

@pgupta

I believe you are mistaken here. They are using Aerospike as a cache, by definition, it isn’t expected to hold everything. This is a valid eviction usecase IMO.


#5

As I suspected, I was not understanding their question correctly. So this is more along the lines of a feature request pertaining to the eviction algorithm in Aerospike. However, I am thinking aloud of a simple solution - do a touch() operation + get() and if that is done using operate() it can be done in single client to server call. @caffeinejolt - is your storage on SSD or pure memory? @kporter - do you think that will work? It will update the TTL on every get() thereby leaving untouched records as preferred candidates for evictions.


#6

This would essentially turn these reads into writes since they would need to sync this information to the storage layer. Could be fine for memory only use cases but would steampower anything else.

From the client, you could have each ‘cache read’ have a small probability of also ‘touching’ the record thereby refreshing the TTL.


#7

I am using SSDs. I tried to think of a way to handle this via application layer, but there really is no clean option - involves some form of writes on all reads. In the ideal scenario listed above, Aerospike would maintain read count (cache hits) in memory - last read time is probably not needed. Would be nice too is we could specify how much space was used to store the read count. For example, we might only care if a given record had more than 5 cache hits - no need to make that a 32bit int. Then eviction could start by wiping out records with the least cache hits that were older than X (since there needs to be some time allowed in order to get a cache hit).

If Aerospike had something like this - (tracking reads in memory) - its not a huge leap to offer other neat features - where heavily read records are kept in some memory namespace for faster access to the most sought after data.


#8

In the near term, you should explore operate() in your application layer - you can do both touch() and get() in the same record lock, one network hop. It will slow you down slightly because a touch() on SSD storage means the record has to be re-written to a new location with the updated TTL. Since the writes first go the write-block-buffer in RAM and eventually asynchronously to the SSD, you should not see that much of a performance hit due to touch().

What you are requesting is possible from my understanding of the architecture, but the challenge may be finding those 2 or 3 or 4 bits in the Primary Index to store the read hit count and then adding a new eviction algorithm option based on read hits instead of TTL - that means the nsup thread has to compute read hit histograms too - on and on.