Record UDF Performance Questions - Is this the right use case?

We’re using Aerospike to store user data; each record is keyed by user_id and holds multiple bins.

One of the bins, named Aud, is a list that stores the audience segments this user belongs to. The Aud bin holds substantial data (currently in the KBs) and could grow in future.

To serve an impression to a user, we:

  1. Fetch the user’s record, i.e. all bins for that user, to the AdServer
  2. Using the segments in the Aud bin, evaluate the Deals applicable to the user, i.e. the user_id => deals mapping
  3. Do further processing

Evaluating

user_id => deals

mapping is expensive because

  1. The Aud bin has a heavy payload, so fetching it incurs a Network Cost.
  2. The algorithm requires a lot of compute to evaluate the mapping, i.e. a Compute Cost at the AdServer.
  3. Both 1 and 2 happen in the hot path, i.e. online during the impression-serving path, making it a hotspot.

Hence we think this is a good candidate for offline evaluation: we would create a new bin called Deals, which would be “somehow” populated by evaluating the user_id => deals mapping offline and storing the result there. To ensure the mapping stays fresh, we also need a Deals TTL bin which would hold the expiry time of this mapping.

So the new schema would look like this, where Deals and Deals TTL are the new bins being populated:

    user_id (key) => { Aud: [Audience Segments], Deals: [Applicable Deals], Deals TTL: expiry time, Bin(3): Data, ..., Bin(N): Data }

We are planning that, after evaluating the mapping, the AdServer would write it to the Deals bin and set the Deals TTL bin to a future time, say 1 hour from now. The Deals TTL bin acts as an expiry time for the Deals bin. This optimizes cost 2 above (the Compute Cost).
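For concreteness, the write-back step would look roughly like this (a sketch using the Java client for readability, even though we use the C client; the bin names, helper, and 1-hour window are illustrative):

import java.util.List;

import com.aerospike.client.Bin;
import com.aerospike.client.IAerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Value;

public class DealsWriteBack {
    // After the AdServer evaluates user_id => deals, store the mapping
    // and stamp its expiry one hour ahead.
    static void writeDeals(IAerospikeClient client, Key userKey, List<String> deals) {
        long expiry = System.currentTimeMillis() + 3_600_000L; // now + 1 hour
        client.put(null, userKey,
                new Bin("Deals", Value.get(deals)),
                new Bin("Deals TTL", expiry));
    }
}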

To optimize the network cost as well, i.e. fetch the Aud bin only if the Deals mapping has expired, we are thinking of using a UDF. The UDF would implement the following logic (sketched below as a Lua record UDF, with the current time passed in by the client):

function get_record(rec, now)
    local result = map()
    if now >= rec['Deals TTL'] then
        -- Deals mapping has expired: send Aud so the AdServer can re-evaluate
        result['Aud'] = rec['Aud']
    else
        -- Deals mapping is still fresh: send it directly
        result['Deals'] = rec['Deals']
    end
    -- remaining bins are sent as-is
    result['Bin(3)'] = rec['Bin(3)']
    -- ...
    result['Bin(N)'] = rec['Bin(N)']
    return result
end

Note that either the Aud or the Deals bin is sent, conditioned on Deals TTL and the current time; the remaining bins are sent “as is”.

Questions:

  1. Will UDFs scale for this use case? It looks like the perfect tool to me. We aren’t using any UDFs currently. We use the C client library in the AdServer, and another module (not part of the AdServer) uses the Golang client library. Is getting a record as a map using a UDF more expensive than simply fetching the bins? If yes, why?

  2. Can I do this using Aerospike Operation Expressions instead of a UDF?

  3. How can we reliably measure the performance impact, if any, on a test cluster? I believe Aerospike does some level of sandboxing of these UDFs for safety; will that have an adverse effect on performance? By impact we mean the additional overhead in terms of CPU/memory and latency imposed on Aerospike by this UDF vis-a-vis the existing approach, not just the client-side latency (which we would be able to measure).

  4. Consider this alternate approach → What if we write a background UDF which scans the entire set and implements the logic to deduce the

user => deal

mapping? Is a UDF the right tool for such a job, which would be scheduled say every 1 hour? Note that we have billions of user records.

Hi Kartik,

Thanks for reaching out, and it’s a good question. UDFs are good – in some cases. They’re written in Lua, so you have a full programming language that lets you do pretty much anything you want with a record. You can iterate over all the values in a list and sum them or filter them, for example. They’re very flexible and very powerful.

However, there are drawbacks to UDFs too. These include:

  • They’re written in Lua, which is invoked from the server code (written in C). So there’s an execution context the server needs to create to be able to “shell out” to Lua. This context will JIT compile the Lua script and execute it. All this costs time.
  • Aerospike tries to optimize this by caching execution contexts. Each Lua script can have up to 128 contexts cached; after that, each call will create a new execution context and then throw it away when execution finishes.

Expressions, on the other hand, are not as flexible (there are currently no looping constructs, for example) but are executed in native C code with no need to create an execution context. Hence they are typically significantly faster than UDFs.

My typical rule of thumb is: If I can do it in an Expression, that is my first choice. If not, then I would fall back to using a UDF. Especially where speed is critical, such as in AdServing.

UDFs typically make more sense for background jobs, so your scan to derive the user => deal mapping might make sense as a UDF. However, depending on the number of deals, it might make sense as an Expression too.
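If you do go that route, kicking off a background UDF over the whole set from a client looks something like this (a sketch only – the module name deals_udf, function refresh_deals, and namespace/set names stand in for whatever Lua you register):

import com.aerospike.client.IAerospikeClient;
import com.aerospike.client.query.Statement;
import com.aerospike.client.task.ExecuteTask;

public class BackgroundRefresh {
    // Apply a registered record UDF to every record in the set, server-side.
    static void refreshAllUsers(IAerospikeClient client) {
        Statement stmt = new Statement();
        stmt.setNamespace("test");
        stmt.setSetName("users");
        // Placeholder module/function: your Lua code that derives user => deal
        ExecuteTask task = client.execute(null, stmt, "deals_udf", "refresh_deals");
        task.waitTillComplete(); // optional: block until the server-side scan finishes
    }
}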

For your AdServing part I would definitely use Expressions over UDFs; there will be substantial performance benefits to doing so. One example of doing this would be:

import java.util.Arrays;
import java.util.Date;

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.IAerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Operation;
import com.aerospike.client.Record;
import com.aerospike.client.exp.Exp;
import com.aerospike.client.exp.ExpOperation;
import com.aerospike.client.exp.ExpReadFlags;
import com.aerospike.client.exp.Expression;

public class DealsSample {

    private static Record getData(IAerospikeClient client, Key key, long time) {
        // Deals are still valid while the current time is before the stored expiry
        Exp dealsValid = Exp.lt(Exp.val(time), Exp.intBin("DealsTTL"));
        Expression exp = Exp.build(
                Exp.cond(
                        dealsValid,
                        Exp.listBin("Deals"),   // fresh: return the precomputed deals
                        Exp.listBin("Aud")      // expired: return the raw segments
                )
            );
        return client.operate(null, key,
                ExpOperation.read("segments", exp, ExpReadFlags.DEFAULT),
                ExpOperation.read("dealsValid", Exp.build(dealsValid), ExpReadFlags.DEFAULT),
                Operation.get("Bin3")
                );
    }

    public static void main(String[] args) {
        try (IAerospikeClient client = new AerospikeClient("172.17.0.2", 3100)) {
            long now = new Date().getTime();
            Key key = new Key("test", "testSet", 1);
            client.put(null, key,
                    new Bin("Aud", Arrays.asList("A", "B", "C", "D", "E")),
                    new Bin("Deals", Arrays.asList("A", "C", "D")),
                    new Bin("DealsTTL", now),
                    new Bin("Bin3", "stuff"));

            // Fetch time is before the deals' expiry, so the Deals bin should come back
            System.out.println("Record with a valid TTL");
            System.out.println(getData(client, key, now-10000L));

            // Fetch time is after the deals' expiry, so the Aud bin should come back
            System.out.println("Record with an expired TTL");
            System.out.println(getData(client, key, now+10000L));
        }
    }
}

(Sorry, this code is Java rather than C, but the same concepts will map across)

This code will return 2 pseudo-bins – segments and dealsValid. segments will be either the Deals list or the Aud list, and dealsValid will be true if segments holds the deals, or false if it holds the Aud. The output looks like:

Record with a valid TTL
(gen:5),(exp:0),(bins:(segments:[A, C, D]),(dealsValid:true),(Bin3:stuff))
Record with an expired TTL
(gen:5),(exp:0),(bins:(segments:[A, B, C, D, E]),(dealsValid:false),(Bin3:stuff))

One question though – this gives all deals the same TTL; it’s all or nothing. Surely you have different deals with different TTLs? You could store the deals in a map with the deal as the map key and its TTL as the map value, then use some of the MapOperation and MapExp methods to filter out the valid deals and throw the others away. That way you wouldn’t necessarily need to refresh every hour.
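A rough sketch of that per-deal-TTL idea (assuming the Deals bin becomes a map of deal => expiry time in epoch millis; the bin, method, and class names here are illustrative):

import com.aerospike.client.IAerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.cdt.MapReturnType;
import com.aerospike.client.exp.Exp;
import com.aerospike.client.exp.ExpOperation;
import com.aerospike.client.exp.ExpReadFlags;
import com.aerospike.client.exp.Expression;
import com.aerospike.client.exp.MapExp;

public class PerDealTtlSample {
    // Return only the deal names whose individual expiry has not yet passed.
    static Record validDeals(IAerospikeClient client, Key key, long now) {
        // Keep map entries whose value (the deal's expiry) falls in [now, Long.MAX_VALUE)
        // and return their keys, i.e. the still-valid deal names.
        Expression exp = Exp.build(
                MapExp.getByValueRange(MapReturnType.KEY,
                        Exp.val(now), Exp.val(Long.MAX_VALUE),
                        Exp.mapBin("Deals")));
        return client.operate(null, key,
                ExpOperation.read("validDeals", exp, ExpReadFlags.DEFAULT));
    }
}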


Hi @Tim_Faulkes

Can’t thank you enough for the detailed answer – it was extremely helpful. I am on vacation right now, hence the delay in seeing your reply. We already did some very simple benchmarking of UDFs and measured client-side latency. The benchmark details:

  1. Both client and server are on the same machine. Language: C. Server version is 5.6.0.7.

  2. The client fetches 100K records and compares the latency of

    A) `aerospike_key_get`
         vs
    B) `aerospike_key_apply` (which simply invokes a Lua function registered on the server)


So it is a read-only benchmark (no writes).

  3. Average record size is 4.8 KB.
  4. Both A and B fetch the entire record. With B we invoke a Lua function which fetches all the bins of the record and sends back a map. It has no if/else conditional logic in it – it just gets all bins, puts them in a map, and sends them back.
  5. The client doesn’t process the record (i.e. parse the bins etc.) and simply records how much time it takes to fetch a total of 100K records. The benchmark can also be run in a multi-threaded configuration.

The result: B takes about 2x the time of A.

What I’ll do next: use Operation Expressions instead (and change the benchmark to include some if/else conditional logic).

A few questions though (just out of curiosity):

  1. Do you think the above benchmark is “sensible” or over-simplified/biased?
  2. Does aerospike_key_get also enjoy more parallelism (since the server knows it is a read-only request) compared to aerospike_key_apply (which I believe would require a write lock on the record, since the server won’t know whether the Lua function is going to perform a write)?

Thanks for your patience

Regards

kartik