Data cleanup: delete old non-accessed records


#1

Hi,

I am using community version 3.12.1.2 and currently have ~2B records stored. I am not sure how many of them are still in use. Is there a way to find all records that have not been accessed (read or written) in the last X days/hours/seconds, or to get the last time each record was accessed, so that those records can be deleted? We did not set any TTL when writing the records.


#2

There is no timestamp on reads. You do, however, have a timestamp for the last write in the record metadata: the Last Update Time (LUT). You should be able to write a record UDF that deletes a record based on its LUT and invoke it on a namespace scan. That deletes records that have not been updated recently, but there is no way to skip records that were read recently yet last updated a long time ago.
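
To make the LUT comparison concrete, here is a minimal sketch of just the threshold arithmetic, written in plain Python so it can be run without a cluster. It assumes the LUT is expressed in milliseconds since the Citrusleaf epoch (2010-01-01 00:00:00 UTC); please verify that against the documentation for your server version. The names lut_cutoff_ms and is_stale are made up for illustration and are not part of any Aerospike API. Reading the LUT and issuing the delete would happen inside the record UDF itself, which is not shown here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: LUT values are milliseconds since the Citrusleaf epoch
# (2010-01-01 00:00:00 UTC). Check this for your server version.
CITRUSLEAF_EPOCH = datetime(2010, 1, 1, tzinfo=timezone.utc)


def lut_cutoff_ms(now: datetime, max_age: timedelta) -> int:
    """Return the LUT value below which a record counts as stale.

    `now` must be timezone-aware; `max_age` is how long a record may go
    without an update before it becomes a candidate for deletion.
    """
    return int((now - max_age - CITRUSLEAF_EPOCH).total_seconds() * 1000)


def is_stale(lut_ms: int, cutoff_ms: int) -> bool:
    """True if the record was last updated before the cutoff."""
    return lut_ms < cutoff_ms


if __name__ == "__main__":
    # Example: treat anything not updated in the last 90 days as stale.
    cutoff = lut_cutoff_ms(datetime.now(timezone.utc), timedelta(days=90))
    print("delete records whose LUT is below", cutoff)
```

Computing the cutoff once on the client and passing it to the UDF as an argument keeps the per-record work down to a single comparison, which matters when the scan touches ~2B records. Note that this check only looks at writes, so it cannot protect records that are read often but rarely updated.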

Having said that, be careful about deleting records in CE or reducing their TTL. Deleted records, or records manipulated to expire early, can come back to life in certain cluster-change scenarios.