Accessing Forwarding Records From UDF


#1

Hi James,

Transferring your question from topic: How does Aerospike Client Find a Node.


#2

@JamesWong,

  • customize the key hashing function such that we can have both Ka and Kb being stored on the same node
  • write a UDF to auto translate Kb -> Ka -> value

Unfortunately, record UDFs can currently access only a single record at a time.

Perhaps with additional information about your use case we could suggest a solution?


#3

For our use case, we have a key-value data chain like the following:

  1. KeyA -> KeyB
  2. KeyB -> value

So when the application receives KeyB, we can retrieve the value in 1 query. However, if we are given KeyA, we’ll need 2 queries in order to get to the value (i. KeyA -> KeyB, ii. KeyB -> value).

We are in a highly optimized environment that requires single-digit millisecond response times. The above data retrieval is just part of the work. In our own profiling, network latency is one of the dominating factors, so ideally the data retrieval would be done in a single query even if we are given KeyA. That’s why we are looking for a fast key-value store that allows some kind of server-side processing.

So we are wondering whether we can build our own server-side process, sitting on the same db shard node, that queries the db (1 or 2 queries, depending on whether we’ve received KeyA or KeyB). However, to do that, we’ll need to

  1. know which shard node to call
  2. have both the mappings of KeyA -> KeyB and KeyB -> value be stored in the same shard node

We do have the leeway to manipulate the key pattern if that allows us to make 2. happen. So our questions are:

  • Does Aerospike allow us to customize the partition hashing function?
  • How can the UDF help?

Other suggestions would be appreciated as well.

Thanks, – james

PS Thanks to @kporter for following up


#4

A good portion of Aerospike use cases that require single-digit ms response times do a two-trip lookup: they store the keyA->keyB translation in a data-in-memory namespace and do the keyB->value lookup in a second (SSD) namespace.

Another possible way is to store keyA as a bin value on the keyB record and create a secondary index on that bin. The lookup can then be done via the primary key (keyB) or via the secondary index (keyA, which gives direct access to the keyB record). The secondary index query will need to be made against all nodes in the cluster (since the record can be on any node).
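To make the two data layouts above concrete, here is a toy sketch in plain Python. The dicts only stand in for namespaces and for a secondary index; none of this is Aerospike client API, and all names (`translation_ns`, `value_ns`, `key_a_index`, the `key_a` bin) are made up for illustration.

```python
# Toy in-memory model of the two layouts. Plain dicts stand in for
# Aerospike namespaces and indexes; this is NOT Aerospike client code.

# Option 1: two namespaces, two lookups.
translation_ns = {"keyA-1": "keyB-1"}        # data-in-memory namespace: keyA -> keyB
value_ns = {"keyB-1": {"value": "payload"}}  # second (SSD) namespace: keyB -> record


def lookup_via_key_a(key_a):
    """Two trips: resolve keyA to keyB, then fetch the record by keyB."""
    key_b = translation_ns.get(key_a)  # trip 1
    if key_b is None:
        return None
    return value_ns.get(key_b)         # trip 2


# Option 2: one record per keyB, with keyA stored as a bin and indexed.
records = {"keyB-1": {"key_a": "keyA-1", "value": "payload"}}


def build_index(recs, bin_name):
    """Map each value of bin_name to the primary keys of matching records."""
    index = {}
    for primary_key, bins in recs.items():
        index.setdefault(bins[bin_name], []).append(primary_key)
    return index


key_a_index = build_index(records, "key_a")


def get_by_key_b(key_b):
    """Primary-key read: a single lookup."""
    return records.get(key_b)


def query_by_key_a(key_a):
    """Secondary-index style query: returns every record whose key_a bin matches."""
    return [records[pk] for pk in key_a_index.get(key_a, [])]
```

In this model `query_by_key_a` is one call, but in a real cluster the equivalent query fans out to all nodes, which is why measuring both options on your own hardware matters.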

I would recommend running some performance tests on your choice of hardware and network environment to get the best predictability for the options above.

Additional information on how the node holding the record for a key is determined:

  • A key is hashed into a 20-byte digest using RIPEMD160. 12 bits of the digest are used to determine the partition. See as_partition_getid() in https://github.com/aerospike/aerospike-server/blob/master/as/include/base/datamodel.h
  • For 2 records to belong to the same partition, those 12 bits will have to be the same. There is no customization of the partition hashing function to guarantee this.
  • Record UDF calls cannot access any record other than the one they are invoked on. So even if keyA and keyB are in the same partition, there is no method to access keyB while in the context of keyA.
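As a rough illustration of the partition rule above, the sketch below derives a partition ID from a 20-byte digest. Which 12 bits are used, and in what byte order, is an assumption here (the low 12 bits of the first two digest bytes read little-endian); check as_partition_getid() for the authoritative definition. The function names and the sample digests are made up, and computing the RIPEMD160 digest itself is left out.

```python
PARTITION_BITS = 12
PARTITION_COUNT = 1 << PARTITION_BITS  # 4096 partitions
PARTITION_MASK = PARTITION_COUNT - 1   # 0x0FFF


def partition_id(digest: bytes) -> int:
    """Return the partition ID for a 20-byte digest.

    Assumption: the 12 bits are the low 12 bits of the first two
    digest bytes interpreted as a little-endian integer.
    """
    if len(digest) != 20:
        raise ValueError("expected a 20-byte digest")
    return int.from_bytes(digest[:2], "little") & PARTITION_MASK


def same_partition(digest_a: bytes, digest_b: bytes) -> bool:
    """True when both digests map to the same partition."""
    return partition_id(digest_a) == partition_id(digest_b)


# Made-up digests, not real RIPEMD160 outputs.
digest_a = bytes([0x34, 0x12]) + bytes(18)
digest_b = bytes([0x34, 0xF2]) + bytes(range(18))
digest_c = bytes([0x35, 0x12]) + bytes(18)

print(partition_id(digest_a))              # 564 (0x234)
print(same_partition(digest_a, digest_b))  # True: the low 12 bits match
print(same_partition(digest_a, digest_c))  # False
```

Because the digest is a cryptographic hash of the key, two unrelated keys land in the same partition only by chance (about 1 in 4096), and choosing a key pattern does not change that in any controllable way.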