Getting the created timestamp of a record to measure record age (not TTL)

Tags: query, udf

#1

Hi. As noted in the thread Get bin insert time/modification time, Aerospike has no built-in support for a created timestamp or a modification timestamp.

For a modification timestamp, we can write the Unix epoch into a bin (via the update policy) on each write invocation.

But for a created timestamp we can do either of the following:

  1. Create a UDF which, when invoked, checks whether the record exists; if it doesn’t, add a created_timestamp bin and put the Unix epoch in it.

  2. Make two operations against Aerospike: one checks the existence of the record, and a second puts the bins, including created_timestamp if the record doesn’t exist. (The created timestamp is generated by the application.)

Note that with either method, the new/updated values are also written to their respective bin(s). created_timestamp is application-level metadata for my use case, not something Aerospike provides.
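A minimal sketch of the application-side logic for option 2 (the helper and its bin-map shape are illustrative, not part of the Aerospike client API): the write includes created_timestamp only when the prior exists check came back false.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of option 2: the application decides which bins to write based on a
// prior exists() check made with the client. The class and method names are
// assumptions for illustration.
class CreatedTimestampBins {
    // Returns the bin map for the write: the caller's updates, plus
    // created_timestamp only when the record does not already exist.
    static Map<String, Object> binsForWrite(boolean recordExists,
                                            long nowEpochMs,
                                            Map<String, Object> updates) {
        Map<String, Object> bins = new HashMap<>(updates);
        if (!recordExists) {
            bins.put("created_timestamp", nowEpochMs);
        }
        return bins;
    }
}
```

The result would then be written with a single put; the race discussed below still applies between the exists check and the put.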

Now, the problem is that a UDF is much less optimized than native operations. I was originally using a UDF to achieve this, but when I switched to the combination of native operations (option 2), CPU utilization dropped by roughly 15%, so a UDF is a big no for me for such a small task.

Notice that the goal I am trying to achieve takes two Aerospike operations (batching won’t help: it opens one connection to the Aerospike node but still performs the passed operations individually on the record).

I am looking to collapse the two operations into one while still achieving the created_timestamp (rather than modified_timestamp) goal.

Please suggest something better which I may be missing here.


#2

Option 2 may not work 100% of the time. The exists check takes a separate lock from the record create, so another client could create the record between your exists check and your write.

Explore the WritePolicy's UPDATE_ONLY (Java enum), which fails to update the record if the record does not exist. In that case, on failure, use CREATE_ONLY (which fails if the record exists) and create it. It is still two transactions in the worst case, but the probability of needing two operations every time is greatly reduced. If CREATE_ONLY also fails (a rare possibility), retry UPDATE_ONLY and it should succeed … unless you are also DELETING records!
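A control-flow sketch of this fallback, using an in-memory map to stand in for the cluster. With the real Java client you would set WritePolicy.recordExistsAction to RecordExistsAction.UPDATE_ONLY or CREATE_ONLY and treat the failed write as the branch condition; the store and method names here are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simulates the UPDATE_ONLY -> CREATE_ONLY -> UPDATE_ONLY ladder with an
// in-memory map standing in for the cluster.
class FallbackWrite {
    final Map<String, Map<String, Object>> store = new HashMap<>();

    // Mimics UPDATE_ONLY semantics: fails if the record is absent.
    boolean updateOnly(String key, Map<String, Object> bins) {
        Map<String, Object> rec = store.get(key);
        if (rec == null) return false;
        rec.putAll(bins);
        return true;
    }

    // Mimics CREATE_ONLY semantics: fails if the record already exists.
    boolean createOnly(String key, Map<String, Object> bins) {
        if (store.containsKey(key)) return false;
        store.put(key, new HashMap<>(bins));
        return true;
    }

    // Usually one round trip; two only on the first write (or a lost race).
    void write(String key, Map<String, Object> bins, long nowEpochMs) {
        if (updateOnly(key, bins)) return;            // common case: record exists
        Map<String, Object> created = new HashMap<>(bins);
        created.put("created_timestamp", nowEpochMs); // first write only
        if (createOnly(key, created)) return;
        updateOnly(key, bins);                        // lost race: record now exists
    }
}
```

Because the common case (record exists) succeeds on the first UPDATE_ONLY, the extra round trip is paid only on the very first write per record.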


#3

@pgupta -

Sorry, I forgot to mention that only one particular client writes; the other clients just read the bins. Since no client other than the one doing the existence check writes (or updates) the bins, option 2 won’t fail.

Every write needs to update the values of the other bins, but not created_timestamp (except on the very first write). So using CREATE_ONLY and then UPDATE_ONLY essentially increases my number of operations in every scenario.


#4

My use case for created_timestamp:

  1. To determine the age of a record (in any time unit)
  2. And, if the record’s age is greater than some value, to update the TTL (which can be handled by other clients)

#5

Is your true use case to update the record without changing the TTL, but if the TTL is below a certain value, update the TTL as well? (i.e., are you adding these timestamps to achieve that functionality?)


#6

No. I understand that I can get the TTL from the record metadata, but the use case is based entirely on when the record first came into the system.

Let’s take an example:

  1. The record is inserted with a TTL of 1 day (first insert).
  2. The record appears again 4 hours after its creation: don’t update the TTL. (Remaining TTL: 20 hours)
  3. The record appears again 8 hours after its creation (remaining TTL: 16 hours): update the TTL to a new value, say 24 hours.
  4. The record appears again 16 hours after its creation (measured from when it was created in step 1): don’t update the TTL. (Remaining TTL: 16 hours)

If I keyed off the TTL instead (say, TTL > 16 then update), then in the 4th case I would end up updating the TTL when it was actually not required.

For the example’s sake, please assume the record arrives at different times/hours via some API.
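One policy consistent with all four steps of the example, as a sketch (the 8-hour and 16-hour thresholds are read off the example and would be configuration in practice): refresh the TTL only when the record’s age, measured from the stored created_timestamp bin, falls inside the refresh window.

```java
// Sketch of an age-based TTL refresh decision. The thresholds come from the
// example in this post and are assumptions, not anything Aerospike-specific:
// refresh only when the age is in [8h, 16h).
class TtlPolicy {
    static final long HOUR_MS = 3600_000L;

    static boolean shouldRefreshTtl(long createdEpochMs, long nowEpochMs) {
        long ageHours = (nowEpochMs - createdEpochMs) / HOUR_MS;
        return ageHours >= 8 && ageHours < 16;
    }
}
```

A client reading created_timestamp from the record would call this before deciding whether to write back with a new TTL.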


Quick question: why can’t the record creation/insertion time be record metadata (a user-driven model, where the user marks a set/record for miscellaneous metadata tracking)? Is there any roadmap for this?


#7

So what you are saying is that any update within 16 hours of creation should add another 24 hours to the TTL, but after 16 hours from creation you don’t want updates to add any TTL. Can you start with an initial TTL of 16+24 hours and do ALL updates with TTL=-2 (don’t touch the TTL)? Then it’s really just a sizing issue: records will hang around a little longer than you wanted, but they will get cleared out.


#8

@pgupta - the small TTL value was just for the example’s sake. If records really do hang around longer because the initial TTL is set the way you suggest, it will increase resource utilisation considerably, as the record count is in the billions.

To me it looks like this isn’t possible directly. I will re-think the situation and approach.


#9

By the way, why can’t the created timestamp be in the metadata, as I suggested in my earlier reply?


#10

The only timestamp-type metadata you have is the last-update time (LUT), which is currently accessible only via UDF: it cannot be read back into the client application. And the LUT changes every time you update the record, so there is no way to get the original creation time unless you store it explicitly in a record bin. You can make a feature request on this forum, in the feature-requests section.


#11

That is not much of a record count for Aerospike; it depends on the size of each record. Using your approximate data size / number of bins, go through the capacity planning page - https://www.aerospike.com/docs/operations/plan/capacity/index.html - and do a basic sizing of your cluster.


#12

Agreed, but keeping records around past their useful life still holds each record’s 64-byte primary-key index entry in RAM. So if I have 100M+ records that I deterministically know will never be used again, I am spending an extra 6GB+ of RAM.
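A back-of-envelope check of that figure (a sketch; the 64 bytes per primary-index entry is the number quoted above):

```java
// 100M records x 64 bytes per primary-index entry = 6.4e9 bytes,
// i.e. roughly 6 GiB of RAM held by records that will never be read again.
class IndexRam {
    static long indexBytes(long records) {
        return records * 64L; // 64 bytes of primary index per record
    }
}
```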