Accessing Forwarding Records From UDF

kporter · December 31, 2014, 12:33am

Hi James,

Transferring your question from topic: How does Aerospike Client Find a Node.

kporter · December 31, 2014, 12:54am

customize the key hashing function such that we can have both Ka and Kb being stored on the same node

write a UDF to auto translate Kb → Ka → value

Unfortunately at this time record UDFs can only access a single record at a time.

Perhaps with additional information about your use case we could suggest a solution?

JamesWong · January 2, 2015, 3:22pm

For our use case, we have a key-value data chain like the following:

KeyA → KeyB
KeyB → value

So when the application receives KeyB, we can retrieve the value in 1 query. However, if we are given KeyA, we’ll need 2 queries to get to the value (i. KeyA → KeyB, ii. KeyB → value).

We are in a highly optimized environment that requires single digit millisecond response time. The above data retrieval is just part of the work. In our own profiling, network latency is one of the dominating factors and ideally the data retrieval can be done in one single query even if we are given KeyA. That’s why we are looking for a fast key-value store that allows some kind of server side processing.

So we are wondering whether we can build our own server-side process sitting on the same db shard node, which would query the db (1 or 2 queries depending on whether we received KeyA or KeyB). However, to do that, we’ll need to

1. know which shard node to call
2. have both the mappings of KeyA → KeyB and KeyB → value stored on the same shard node

We do have the leeway to manipulate the key pattern if that can allow us to make 2. happen. And so our questions are

1. Does Aerospike allow us to customize the partition hashing function?
2. How can the UDF help?

Other suggestions would be appreciated as well.

Thanks, – james

PS Thanks to @kporter for following up

wchu · January 3, 2015, 1:23am

A good portion of Aerospike use cases that require single-digit ms response times do a 2-trip lookup, storing the keyA->keyB translation in a data-in-memory namespace and the keyB->value mapping in a second (SSD) namespace.
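A minimal sketch of that 2-trip lookup, assuming a client object whose `get((namespace, set, key))` returns a `(key, meta, bins)` tuple, as the Aerospike Python client does. The namespace, set, and bin names below are hypothetical placeholders, not taken from any real deployment:

```python
# Hypothetical names, chosen only for illustration.
MEMORY_NS = "mem"    # data-in-memory namespace holding keyA -> keyB
SSD_NS = "ssd"       # SSD namespace holding keyB -> value
SET_NAME = "demo"    # set name used in both namespaces
KEY_B_BIN = "key_b"  # bin on the keyA record that stores keyB
VALUE_BIN = "value"  # bin on the keyB record that stores the value


def lookup_by_key_b(client, key_b):
    """One trip: read the value record directly by keyB."""
    _, _, bins = client.get((SSD_NS, SET_NAME, key_b))
    return bins[VALUE_BIN]


def lookup_by_key_a(client, key_a):
    """Two trips: translate keyA -> keyB in memory, then read keyB -> value."""
    _, _, bins = client.get((MEMORY_NS, SET_NAME, key_a))
    return lookup_by_key_b(client, bins[KEY_B_BIN])
```

Both trips are plain primary-key reads, so each one goes to a single node; the first is served from memory and only the second touches SSD.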

Another possible way is to store keyA as a bin value on the keyB record and create a secondary index on that bin. The lookup can then be done via the primary key (keyB) or via the secondary index (keyA, which gives direct access to the keyB record). The secondary index query will need to be made against all nodes in the cluster (since the record can be on any node).

I would recommend running some performance tests using your choice of hardware and network environment to get best predictability for the options above.

Additional information on which node the record for a key resides:

A key is hashed into a 20-byte digest using RIPEMD160. 12 bits of the digest are used to determine the partition. See as_partition_getid() in aerospike-server/datamodel.h at master · aerospike/aerospike-server · GitHub
For 2 records to belong to the same partition, those 12 bits have to be the same. There is no customization of the partition hashing function to guarantee this.
Record UDF calls cannot access any record other than the one they are invoked on. So even if keyA and keyB are in the same partition, there is no method to access keyB while in the context of keyA.
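To make the "12 bits" point concrete, here is a rough sketch of how a partition ID could be derived from a 20-byte digest. It assumes the 12 bits are the low bits of the first two digest bytes read as a little-endian integer; that byte order is an assumption here, and as_partition_getid() is the authoritative reference. Computing the RIPEMD160 digest itself is left out:

```python
PARTITION_BITS = 12
N_PARTITIONS = 1 << PARTITION_BITS  # 4096 partitions
DIGEST_SIZE = 20                    # RIPEMD160 produces 20 bytes


def partition_id(digest: bytes) -> int:
    """Map a 20-byte digest to a partition ID in [0, 4095].

    Assumption: the first two bytes are read as a little-endian
    integer and masked down to the low 12 bits.
    """
    if len(digest) != DIGEST_SIZE:
        raise ValueError("expected a 20-byte digest")
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


# Two made-up digests that differ only outside those 12 bits
# land in the same partition.
d1 = bytes([0x34, 0x12]) + bytes(18)
d2 = bytes([0x34, 0x12]) + bytes([0xFF] * 18)
same_partition = partition_id(d1) == partition_id(d2)
```

Since the digest is a cryptographic hash of the key, there is no practical way to pick KeyA and KeyB so that those 12 bits match, which is why manipulating the key pattern does not help here.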
