Lua bytes speed and memory efficiency

udf

#1

In one of our applications we are using a Lua bytes array to manage an array of (key, expire) pairs. Pairs can be modified and can expire.

I’m wondering if a developer can give me some insight into the best strategy here.

One thing we do is always set the TTL of the whole record to the longest expire value among our (key, expire) pairs. That way, when the last pair expires, we can let Aerospike delete the whole record for us.

But one issue is that when certain pairs are updated frequently with new expire values, some of the older pairs that have long since expired stay in the array.

One way to handle this is to have every modify operation loop through the whole array and copy only the pairs that haven’t expired yet into a new bytes array. This is the strategy we are using now. This is a simplified version of the code:

function add(r, cid, per, now)
  local found = false
  local b     = bytes(0)

  if not aerospike:exists(r) then
    aerospike:create(r)
    record.set_ttl(r, per)
  else
    local ttl = record.ttl(r)

    -- If the whole record is expired don't use it at all.
    -- The 2147483648 check works around a bug in the old Aerospike
    -- version we run, which can report TTLs larger than 2^31.
    if ttl == 0 or ttl > 2147483648 then
      record.set_ttl(r, per)
    else
      local c     = r["f"]
      local csize = bytes.size(c)

      -- Loop over the old data, copying what hasn't expired yet.
      -- Each pair is 8 bytes: an int32 cid followed by an int32 expire time.
      for i=1,csize,8 do
        local expire = bytes.get_int32_le(c, i+4)

        if expire > now then
          if bytes.get_int32_le(c, i) == cid then
            found = true

            bytes.append_int32_le(b, cid)
            bytes.append_int32_le(b, now + per)
          else
            bytes.append_bytes(b, bytes.get_bytes(c, i, 8), 8)
          end
        end
      end

      if ttl < per then
        record.set_ttl(r, per)
      end
    end
  end

  if not found then
    bytes.append_int32_le(b, cid)
    bytes.append_int32_le(b, now + per)
  end

  r["f"] = b
  aerospike:update(r)

  return 0
end

My question is: is this the optimal way to do this in Aerospike, or are further optimizations possible?

Looking at as_bytes_ensure I see that the bytes array is grown exactly as needed. I haven’t looked into the memory management code yet, but wouldn’t it be better to grow the array with a 1.5× growth factor?
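To illustrate what I mean by a 1.5× growth factor (a sketch in Lua for clarity; grown_capacity is a hypothetical helper, not anything from the Aerospike source):

```lua
-- Sketch: geometric capacity growth instead of growing to exactly the
-- size needed. Repeated appends then trigger O(log n) reallocations
-- rather than one per append.
local function grown_capacity(current, needed)
  local cap = math.max(current, 1)
  while cap < needed do
    cap = math.ceil(cap * 1.5)
  end
  return cap
end
```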

I don’t know how the bytes are actually committed to storage after the UDF runs. Would allocating a new bytes array every time cause a lot of fragmentation?

Anyone have any other suggestions on how to improve this?


#2

It is msgpacked and stored inside the bin, and of course unpacked when you read it to modify it.

It depends on the usage, but the recommendation is to overallocate at initialization. That way the system doesn’t have to attempt reallocations at many different sizes, which is what really causes fragmentation. For example, if you know the bytes you use are bounded by, say, 1000 bytes, you can always create the array that big at initialization. Allocating new bytes every time won’t cause fragmentation if you use the same allocation size every time.
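A sketch of that fixed-size approach (the 1000-byte bound is an assumption about your data, and bytes.set_int32_le is assumed to exist alongside the get_int32_le/append_int32_le calls you already use):

```lua
-- Sketch: preallocate the bin's byte array at one bounded size so every
-- write reuses the same allocation size and the array is never regrown.
local MAX_SIZE = 1000  -- assumed upper bound on your pair data

local function make_buffer()
  return bytes(MAX_SIZE)  -- allocated once at the bound
end

-- Write one (cid, expire) pair at a 1-based offset, in place,
-- instead of appending and growing.
local function put_pair(b, offset, cid, expire)
  bytes.set_int32_le(b, offset, cid)
  bytes.set_int32_le(b, offset + 4, expire)
end
```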

In general, calling back into the server multiple times has extra cost. A slightly better way to do this would be to convert the bytes into a string, do the entire manipulation and checking on the string inside Lua, and store it back as either a string or bytes when you are done. That reduces the number of calls from Lua back into server code to two: one to read the bin and one to write it back.
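A sketch of that string-based approach in plain Lua 5.1 (which has no string.pack, hence the manual little-endian helpers; all function names here are illustrative, not Aerospike APIs):

```lua
-- Read the bin once, do all pair manipulation on a plain Lua string,
-- write the result back once.

-- Encode a 32-bit integer as 4 little-endian characters.
local function int32_to_le(n)
  return string.char(n % 256,
                     math.floor(n / 256) % 256,
                     math.floor(n / 65536) % 256,
                     math.floor(n / 16777216) % 256)
end

-- Decode 4 little-endian bytes starting at 1-based index i.
local function le_to_int32(s, i)
  local b1, b2, b3, b4 = string.byte(s, i, i + 3)
  return b1 + b2 * 256 + b3 * 65536 + b4 * 16777216
end

-- Drop expired pairs from an 8-byte-per-pair string, entirely in Lua.
local function compact(s, now)
  local parts = {}
  for i = 1, #s, 8 do
    local expire = le_to_int32(s, i + 4)
    if expire > now then
      parts[#parts + 1] = string.sub(s, i, i + 7)
    end
  end
  return table.concat(parts)
end
```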

HTH

– Raj


#3

I did some benchmarking. It’s very hard to benchmark UDF code accurately with Aerospike. In the end I just called the functions a whole bunch of times inside the UDF. Of course this doesn’t really take the commit to storage into account, but that should always take the same time.
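A minimal sketch of that in-UDF timing loop (assuming the UDF sandbox exposes os.clock, which may not be the case; bench and ITERATIONS are illustrative names):

```lua
-- Sketch: time a function by running it many times inside the UDF and
-- logging the elapsed CPU time with Aerospike's info() logger.
local ITERATIONS = 100000

local function bench(name, fn)
  local start = os.clock()
  for i = 1, ITERATIONS do
    fn()
  end
  info("%s: %fs", name, os.clock() - start)
end
```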

My testing code can be found at: https://gist.github.com/erikdubbelboer/5dbecf2b8315baf82ee8

The pure Lua string version seemed to be a little slower at writing than the bytes functions:

bench_write_lua: 3.466228942s
bench_write_as: 2.863101461s

While on average the Lua read function was a bit faster than the bytes functions:

bench_read_lua: 2.816864679s
bench_read_as: 3.329490547s

I also tried using the PutUint32 function directly on a bytes object instead of a Lua table. This resulted in even slower code:

bench_write_lua_on_bytes: 9.642002433s
bench_read_lua_on_bytes: 12.717945593s

#4

I would expect it to be the other way around. I see you do many invocations of the function in a single call. How many such UDF calls did you make? What is your observation about memory utilization of the process? I am looking out for fragmentation.

– R


#5

I did only 100 UDF calls for this benchmark.

I didn’t really look at memory utilization, but since this test version preallocates the whole bytes array, and you can’t really preallocate Lua strings, I would expect the bytes version to be better in this case as well.

What method would you suggest for benchmarking CPU and memory usage of UDFs with Aerospike? Is there something better than just running the function, watching top, and seeing how many calls I can do per second?