Handle time based queries

Thrinath_Dosapati · March 27, 2018, 3:27pm

We came across a case, where we want to retrieve data from a time series. Let say we have time based data : [“t1-t2” : {data1}, “t2-t3” : {data2}, “t3-t4”:{dat3}]

With the above kind of data, we would want to look up exact data w.r.t time. For example, for a given time t1.5, the data has to come as data1, and for t2.6 it should come as data2.

To solve the above problem, we are planning to store the data in a sorted map in aerospike as mentioned below {“t1”:{data1}, “t2”:{dat2}, “t3”: {data3}}

When a client asks for t1.5, we must return data1. To achieve this, we implemented a UDF at the server level to do a binary search for the nearest and lowest value for the given input (i.e t1.5), which will return t1 's value ,i.e data1.

Is there a better way of achieving the same, as it incurs cost at server level for every request. Even UDF to do a binary search requires loading all the data in memory, can we avoid it?

pgupta · March 27, 2018, 6:38pm

Thinking aloud… Storing t1-t2, t2-t3 is redundant on t2. Just store t1, t2 is inferred from next key:value. { t1:data, t2:data, …} - store key sorted You must know max difference between any ‘t1’ and ‘t2’ Build secondary index on MAPKEY and type numeric (this essentially does the bulk of the sort work for you upfront in the RAM) Search for records where t between t-maxdiff and t+maxdiff ==> a set of few records and pass these to your UDF. Invoke UDF on these few records subset to return the data. This will be a very simple UDF. Note: A particular UDFs invocations are efficient only up to 128 concurrent executions per node at any give time. Beyond 128 invocations of the same UDF on a given node, the performance will degrade.

sunil · March 30, 2018, 7:34am

One more suggestion…

Seems you have a fixed timeslab (t1-t2) but storing the lower end (t1) as map key. If you store the upper end (t2) as the may key, you should be able to use our getByKeyRange() operation. If you are looking for t1.5, you can use getByKeyRange() and pass t1.5 as the lower bound. You can pass a conservative upper-bound to limit the number of returned values. Otherwise you can leave it as NULL. If you pass NULL, you will get everything above t1.5 In either case, the application can consume the first value and discard the result set.

You can store lower-bound of the timeslab also but the logic will become little bit more complex. Instead of passing your desired value as lower-bound, you have to pass it as upper-bound. Similar to the above, you can pass a conservative lower-bound or NULL. When you get the results, you need to skip all values except the last value.

Topic		Replies	Views
UDF to query/visualize data over time User Defined Functions (UDF)	0	929	May 18, 2020
Accessing Forwarding Records From UDF User Defined Functions (UDF)	3	2026	January 3, 2015
Is it possible to do database queries inside UDF function? User Defined Functions (UDF)	9	3256	July 13, 2018
Querying data based on MAPKEYS index and applying filters in UDF secondary , udf , stream , map	6	2340	August 20, 2017
UDF and secondary indexes User Defined Functions (UDF)	2	1861	December 9, 2015

Handle time based queries

Related topics