Query records by using record's metadata (lut) and filter expressions

Hello, I’ve been looking for a way to query efficiently records and I came up with idea to build an expression that will rely on its metadata since its in the index metadata and hopefully some storage access can be saved however, I am lost in trying to build the proper filter and I need some guidance on what I am doing wrong.

I am using .NET client but I believe client doesn’t matter at this point.

Here is essential part of the code:

(My range builder)

private static readonly DateTime _unixEpoch = new(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

private static (double start, double end) CreateRange(ReportInterval interval)
{
    switch (interval)
    {
        case ReportInterval.Day:
            {
                return ((DateTime.Today - _unixEpoch).TotalNanoseconds, (DateTime.Today.AddDays(1).AddMilliseconds(-1) - _unixEpoch).TotalNanoseconds);
            }
        case ReportInterval.Month:
            {
                var startOfTheWeek = DateTime.Today.AddDays(-(int)DateTime.Today.DayOfWeek + (int)CultureInfo.CurrentCulture.DateTimeFormat.FirstDayOfWeek);
                return ((startOfTheWeek - _unixEpoch).TotalNanoseconds, (startOfTheWeek.AddDays(7) - _unixEpoch).TotalNanoseconds);
            }
        case ReportInterval.Week:
            {
                var today = DateTime.Today;
                var startOfMonth = new DateTime(today.Year, today.Month, 1);
                var endOfMonth = startOfMonth.AddMonths(1);

                return ((startOfMonth - _unixEpoch).TotalNanoseconds, (endOfMonth - _unixEpoch).TotalNanoseconds);
            }
        default:
            {
                throw new NotSupportedException($"{interval} is not supported yet.");
            }
    }
}

(My range expression builder)

private static Exp CreateLutExpression(ReportInterval interval)
{
    var (start, end) = CreateRange(interval);

    // start <= record.metadata.lastUpdateDate >= end
    return Exp.And(Exp.LE(Exp.Val((long)start), Exp.LastUpdate()), Exp.GE(Exp.Val((long)end), Exp.LastUpdate()));
}
  • One thing to notice here is that when I change start with 0 and end with long.MaxValue it returns everything like expected

(How I use it)

 var statement = new Statement()
 {
     SetName = _settings.EventsSet,
     Namespace = _settings.Namespace,
     BinNames = types.GetTypes()
                     .SelectMany(type => ReportCodes.Lookup[type])
                     .Distinct().ToArray(),
 };

 var policy = new QueryPolicy(_client.QueryPolicyDefault)
 {
     filterExp = Exp.Build(Exp.And([CreateLutExpression(interval), .. CreateReportExpression(types)])),
 };
 _client.Query(policy, listener, statement);

(My custom report expression builder that shold fallback to storage access)

private static IEnumerable<Exp> CreateReportExpression(ReportTypes types)
{
    if (types.HasFlag(ReportTypes.UniquePlayersThatActivatedChallenge))
    {
        var expression = Exp.EQ(Exp.Val((int)ActionType.ChallengeActivation), Exp.IntBin(KpiRecord.BinNames.Type));
        yield return expression;
    }
}

So my questions are:

  1. What am I doing wrong with the Lut expression because I don’t see any records being returned for day,week or month?
  2. If I am using AND expression and I put the Lut expression first would that mean that server will try to satisfy the expression left to right, effectively omitting all the storage accesses for the records that are not in the LUT expression criteria (since the LUT expression is left-most condition)?
  3. I noticed that record_metadata_last_update is documented as “nanoseconds (millisecond resolution) from Unix Epoch (1/1/1970) as an integer.” while in index metadata it was described as “Tracks the last writes to the key (Citrusleaf epoch)” and also in this example of getting LUT through UDF it also mentions that LUT is “expressed in milliseconds since the Citrusleaf epoch”. Which one of those three is most correct?
  4. Why I dont seem to see the lut value using aql with “SET RECORD_PRINT_METADATA true” option? I only see ttl and gen and no lut.

(Disclaimer) I read somewhere that even if I don’t update my records they will have their lut set to the creation date so I based my logic around this, I hope this is true.

Thanks in advance!

I have not read your problem statement in detail but I am attaching these slides that may help. Let me know if it resolves the problem, otherwise I will read the details of your situation and get back.

The problem I modeled: Get bin3 if bin1>5 & LUT> 1-1-2022 expr is true: Output: (gen:1),(exp:382148569),(bins:(bin3:v3))

About LUT:

Hi, thanks for the slides but unfortunately they don’t seem to quite answer most of my questions.

I am particularly interested in question number 2. Because if my assumption is wrong then I will create a creationDate bin make a SI of it and query first by SI and then filter by bin values and dropping record metadata optimization.

Regarding Q2: The order of using metadata vs record data values in your expression does not matter. The server will parse the entire expression and is smart enough to see if it can filter out just based on record metadata, then it will not access the record data.
Examples: (if bin1>3 && LUT > x) is same as (if LUT>x && bin1>3). The LUT condition will be checked first by the server. If for a record, LUT is <x, bin1 would not be read regardless of your order of specifying the LUT condition in above two expressions statement examples.

Likewise if you had: (if bin1>3 && LUT>x ) && (device_size > y) … both LUT and device_size are available in record metadata – unless both these conditions are true, bin1 will not be read from record data.

If you had: (bin4 > z) && (if bin1>3 && LUT>x ) and if record LUT was actually <x, then again, the record will not be read.

Re Q4: LUT was added later in the server - in ver 3.8 and we did not think any client would like to read it back. So it is not part of the retrieval data format. So none of the clients or aql can retrieve it as part of record metadata. There is still no legitimate use case for it. Hence, LUT is not retrievable via AQL in the record metadata. But you can access it via UDFs or Expressions.

In Expressions, you have to send the value in nanoseconds though the accuracy is in milliseconds (pad zeroes) from the client APIs. While internally the LUT is stored as timestamp in milliseconds from 1/1/2010 (Citrusleaf Epoch), for Expressions you must send LUT in nanoseconds from Unix Epoch. For UDFs, see How to get LUT (last update time) of a record

So, in your code, please cross check the actual LUT value you are sending for a given date computation vs what linux tells you using the date command, this is why I shared my slide examples.

Thank you for clarification on Q2, I did check the command and what my code is returning as TotalNanoseconds since UnixEpoch and it is all the same… maybe I am doing something else wrong?

Not very familiar with C# - (long)start – casting, can you create a local long value in the function and double check, and pass that? (Since you already checked zero - lastUpdate - max works, good check, rest of the code is good.)

Its very much similar to Java, casting it before that won’t help in fact probably for compiler its the same semantics. Okay then my question is, if I decide to create a secondary index by some bin in the meantime while resolving this issue, what will be the drawbacks? Me having one extra SI in memory, extra bin in the record that I might not need in future, anything else I am missing?

The bin will have to store the int value of the timestamp if you want to do a SI range query. Only int data type SI supports range query. The SI will build a btree (memory cost) but then would be more efficient as the btree entries on range query will directly point to the record PI (Primary Index) - you don’t have to scan all PIs to find candidates. So, in a way, a more performant design. But LUT expressions should work. I will try in Java, time permitting.

BTW, looking at the code - isn’t case for week and month, the code reverse?

        case ReportInterval.Month:
            {
                var startOfTheWeek = DateTime.Today.AddDays(-(int)DateTime.Today.DayOfWeek + (int)CultureInfo.CurrentCulture.DateTimeFormat.FirstDayOfWeek);
                return ((startOfTheWeek - _unixEpoch).TotalNanoseconds, (startOfTheWeek.AddDays(7) - _unixEpoch).TotalNanoseconds);
            }
        case ReportInterval.Week:
            {
                var today = DateTime.Today;
                var startOfMonth = new DateTime(today.Year, today.Month, 1);
                var endOfMonth = startOfMonth.AddMonths(1);

                return ((startOfMonth - _unixEpoch).TotalNanoseconds, (endOfMonth - _unixEpoch).TotalNanoseconds);
            }

I am not sure I understood how SI will be more performance than LUT expression? Or this is not what you’ve meant

It doesn’t matter the order of the case statements as there will always be only 1 valid condition otherwise it wouldnt be valid switch case construct for the compiler

the code in each case - case week has code for month and case month has code for week.

1 Like

Ah this you mean, thanks I’ve fixed it but unfortunately doesn’t change the semantics of the database query I still fail to get records based on this range using lut

I will try to replicate your code in Java and test.

1 Like

@kuskmen I have tested this with Java Client and it works. I did it in Jupyter Notebook. Let me share the code. I get the timestamp-0 and insert key1, get timestamp-1, insert key2, get timestamp-2, insert key3, get timestamp-3. Now, I can use any range of timestamp filters and see if I get back the appropriate keys. Code, cell by cell is below:

Basic client setup:

AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
Key key1 = new Key("test", "testset", "key1");
Key key2 = new Key("test", "testset", "key2");
Key key3 = new Key("test", "testset", "key3");
WritePolicy wPolicy = new WritePolicy();
Bin b0 = new Bin("b00", Value.get(123)); 
Bin b1 = new Bin("b01", Value.get("abc")); 

Getting date in JN requires a bit of code jugglery - per below, then I insert first key1.

String command = "date +%s%N";
 
try {
    Process process = Runtime.getRuntime().exec(command);
 
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(process.getInputStream()));
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
 
    reader.close();
 
} catch (IOException e) {
    e.printStackTrace();
}

b1 = new Bin("b01", Value.get("KEY 1"));
client.put(wPolicy, key1, b0, b1);

System.out.println("Record: "+ client.get(null, key1)); //check record got inserted

Output:

1713460155803847652
Record: (gen:1),(exp:451588155),(bins:(b00:123),(b01:KEY 1))

Likewise key2 generation cell, Output below:

1713460159028548843
Record: (gen:1),(exp:451588159),(bins:(b00:123),(b01:KEY 2))

and key3, output below:

1713460160242915186
Record: (gen:1),(exp:451588160),(bins:(b00:123),(b01:KEY 3))

final timestamp:

1713460163182180368

Rest of the code, pick the range values:

long lowLUT = 1713460159028548843L;
long highLUT = 1713460163182180368L;

Then execute:

//Needed imports
import com.aerospike.client.query.Statement;
import com.aerospike.client.query.Filter;
import com.aerospike.client.policy.QueryPolicy;
import com.aerospike.client.query.RecordSet;

//Using expressions to read record metadata in PI that has record LUT
Expression lutRangeExp = Exp.build(Exp.and( 
       Exp.lt(Exp.val(lowLUT), Exp.lastUpdate()),
       Exp.gt(Exp.val(highLUT), Exp.lastUpdate())
     )
 );

//Run SI query
Statement stmt = new Statement();
stmt.setNamespace("test");
stmt.setSetName("testset");
QueryPolicy qp = new QueryPolicy();
qp.filterExp = lutRangeExp;
RecordSet rs = client.query(qp, stmt);

while(rs.next()){
    Record r = rs.getRecord();
    Key thisKey = rs.getKey();  
    System.out.println(r);
    System.out.println(thisKey);
}

Output:

(gen:1),(exp:451588159),(bins:(b00:123),(b01:KEY 2))
test:testset:null:c10b2c764e604c439039804ec947fa2b18520d1a
(gen:1),(exp:451588160),(bins:(b00:123),(b01:KEY 3))
test:testset:null:10fd8f59adf1833152e439a2e03c19efcb12c145

Another range, output:

long lowLUT = 1713460155803847652L;
long highLUT = 1713460160242915186L;

and the output (I expect key1 and key2 now):

(gen:1),(exp:451588159),(bins:(b00:123),(b01:KEY 2))
test:testset:null:c10b2c764e604c439039804ec947fa2b18520d1a
(gen:1),(exp:451588155),(bins:(b00:123),(b01:KEY 1))
test:testset:null:bf6c1d13e7cd10c5bd022d27e7df170c0bccd6e1

Hi @kuskmen ,

I have reviewed the post and I first wanted to ensure this wasn’t an issue with the C# driver or server.

Based on my understanding of what you are trying to accomplish, I wrote up a C# program in LINQPad (using the Aerospike LINQPad driver) to emulate the behavior you are looking for… It is similar to what @pgupta has provided in his Java sample but using DateTime objects.

After running this sample, the behavior I believe you are looking for seems to work correctly.

After reviewing your provided code snippets, I noticed a possible issue.

In the method CreateRange it is using DateTime.Today (e.g., ((DateTime.Today - _unixEpoch).TotalNanoseconds). DateTime.ToDate returns the local date. To properly calculate a Unix Epoch date/time, this value should be UTC. Maybe you should be using DateTime.Today.ToUniversalTime()?

If this hasn’t resolved the issue, please feel free to provide additional details and I will try to help out…

Thanks

void Main()
{
	var client = ASClient; //use the LINQPad driver connection
	var key1 = new Key("test", "testset", "key1");
	var key2 = new Key("test", "testset", "key2");
	var key3 = new Key("test", "testset", "key3");
	var wPolicy = new WritePolicy()
	{
		sendKey = true
	};
	var b0 = new Bin("b00", Value.Get(123));
	var b1 = new Bin("b01", Value.Get("abc"));

	var epochTime = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

	var start1Epoch = ((long)DateTime.UtcNow.Subtract(epochTime).TotalNanoseconds).Dump("start Key 1 time");
	Thread.Sleep(1);
	client.Put(wPolicy, key1, b0, b1);	
	var start2Epoch = ((long)DateTime.UtcNow.Subtract(epochTime).TotalNanoseconds).Dump("start Key 2 time");
	Thread.Sleep(1);
	client.Put(wPolicy, key2, b0, b1);
	Thread.Sleep(1);
	var start3Epoch = ((long)DateTime.UtcNow.Subtract(epochTime).TotalNanoseconds).Dump("start Key 3 time");
	client.Put(wPolicy, key3, b0, b1);
	Thread.Sleep(100);
	var endEpoch = ((long)DateTime.UtcNow.Subtract(epochTime).TotalNanoseconds).Dump("end time");

	client.Get(null, key1).Dump("key1", 0);
	client.Get(null, key2).Dump("key2", 0);
	client.Get(null, key3).Dump("key3", 0);

	var lutRangeExp = Exp.Build(Exp.And(
	   								Exp.LT(Exp.Val(start2Epoch), Exp.LastUpdate()),
	   								Exp.GT(Exp.Val(endEpoch), Exp.LastUpdate()))
 						);

	//Run SI query
	var stmt = new Statement()
	{
		Namespace = "test",
		SetName = "testset"
	};

	var qp = new QueryPolicy()
	{
		filterExp = lutRangeExp
	};

	var rs = client.Query(qp, stmt);

	while (rs.Next())
	{
		rs.Record.Dump("Result record");
		rs.Key.Dump("Result Key");
	}

	//Using as a Date (UTC)
	var toDay = ((long)DateTime.Today.ToUniversalTime().Subtract(epochTime).TotalNanoseconds).Dump("ToDay  UTC");
	var tomorrow = ((long)DateTime.Today.AddDays(1).ToUniversalTime().Subtract(epochTime).TotalNanoseconds).Dump("Tomorrow UTC");
	lutRangeExp = Exp.Build(Exp.And(
   								Exp.LE(Exp.Val(toDay), Exp.LastUpdate()),
   								Exp.GE(Exp.Val(tomorrow), Exp.LastUpdate()))
					);

	qp = new QueryPolicy()
	{
		filterExp = lutRangeExp
	};

	rs = client.Query(qp, stmt);

	while (rs.Next())
	{
		rs.Record.Dump("Result record Today and Tomorrow UTC");
		rs.Key.Dump("Result Key Today and Tomorrow UTC");
	}	
}

Thank you very much for your input, I will adjust the code and let you know whenever I can.

@kuskmen Please feel free to contact me, if you have any additional questions!

Thanks

There is nothing else, thanks a lot, this was the problem. I resolved it with converting Today to universal time and went back to LUT expression rather than SI. Thanks a lot!