Single bin optimization


#1

The goal is to perform three operations in one transaction on a single-bin optimized namespace.

  1. Create the record if the key doesn’t exist, setting the single bin value to 1.
  2. If the key exists, increment the single bin value atomically.
  3. Return the record with the incremented single bin value.

If the asynchronous client is used, is it safe to perform the multiple operations write, increment, and read?


#2

Use a record UDF?


#3

Is Lua the ideal solution, and is it faster than a write operation with the policy set to AS_POLICY_EXISTS_CREATE, followed by the multiple operations increment and read?


#4

Yes, it’s safe to use atomic incr operations in the async client.

void operate_callback(as_error* err, as_record* rec, void* udata, as_event_loop* event_loop);

void example()
{
  as_operations ops;
  as_operations_inita(&ops, 2);
  as_operations_add_incr(&ops, "a", 1);  // creates the record/bin if it doesn't exist
  as_operations_add_read(&ops, "a");     // read back the incremented value

  as_error e;
  aerospike_key_operate_async(&as, &e, NULL, &key, &ops, operate_callback, NULL, NULL, NULL);
  as_operations_destroy(&ops);
}

void
operate_callback(as_error* err, as_record* rec, void* udata, as_event_loop* event_loop)
{
  if (err) {
    printf("Error: %d - %s\n", err->code, err->message);
    return;
  }

  // Bin "a" holds the post-increment value; 0 is the default if the bin is missing.
  int64_t val = as_record_get_int64(rec, "a", 0);
  printf("Return val: %lld\n", (long long)val);
}

#5

@Brian If the record doesn’t already exist, does “add_incr” create it and set it to 1? And how do I set the TTL for the record at creation?


#6

Yes. If the record does not exist, the initial value is the increment value (in this case, 1).

Example with TTL:

  as_operations ops;
  as_operations_inita(&ops, 2);
  as_operations_add_incr(&ops, "a", 1); 
  as_operations_add_read(&ops, "a");
  ops.ttl = 1000000;  // record TTL in seconds, applied on this write

  as_error e;
  aerospike_key_operate_async(&as, &e, NULL, &key, &ops, operate_callback, NULL, NULL, NULL);
  as_operations_destroy(&ops);

#7

@Brian Another question, Is there an easy way to see how many times a record has been read in the past x seconds?


#8

The only way I can think of is to store a read count in another bin on the record and increment it via a client operation or server UDF. This would not work for single-bin namespaces, nor would it track an “x seconds” range.


#9

Is there a reasonable maximum number of reads that you expect in the last x seconds? What is that number? For example, if it is something like a maximum of 20 reads in the last 10 seconds, then I can think of a way to do it; the method has to fit within the record size limit. Is the record stored on persistent storage (SSD/HDD), where the maximum record size is 1 MB or the write-block-size, whichever is smaller, or in RAM, where the record can be much larger? The way I would implement it is to keep a bounded list in a bin of the timestamp (in seconds) of each read, and use Operate() to update the list and do the read.

So, another bin in the record (hence it won’t work with a single-bin namespace) holds a list of timestamps, prepended on each read and then trimmed to the maximum size allowed by the record size limit. The count of reads within the last x seconds can be obtained from the list values.


#10

@pgupta The expected number of reads is quite high. I have resorted to using a UDF. The implementation is not done, but the flow I have in mind is to:

  1. Set a TTL of x seconds on the record.
  2. Increment the single bin value when it’s read.
  3. Set the TTL back to x when it reaches 0.

I don’t think I can get the exact number of reads in the past x seconds this way, but it is close enough for my use case. Thanks.


#11

I don’t quite understand what that gets you. Plus if the number of reads is high, udf concept will not work. Each udf module gets it own state cache in memory. Each module can consume up to 128 such caches at any given time on a node. So if your concurrent reads to the cluster utilizing this UDF exceed 128 per node, udf performance will degrade once you are already using the 128 concurrent invocations per node.