ev2citrusleaf_get_many_digest with BLOB object error

Hi, we are using the Aerospike C libevent2 API in our project. Everything works fine when we request data with the ev2citrusleaf_get function, but we get corrupted data when we use the ev2citrusleaf_get_many_digest call. The problem occurs only with BLOB objects; string objects are fine.

After some investigation I found the following lines in the C libevent SDK source code:

file ev2citrusleaf.c

int
set_object(cl_msg_op *op, ev2citrusleaf_object *obj)
{
   obj->type = (ev2citrusleaf_type)op->particle_type;
   
   switch (op->particle_type) {
      case CL_PARTICLE_TYPE_NULL:
         obj->size = 0;
         obj->free = 0;
         break;
         
      case CL_PARTICLE_TYPE_INTEGER:
         obj->size = 0; // unused in integer case
         obj->free = 0;
         return( op_to_value_int(cl_msg_op_get_value_p(op), cl_msg_op_get_value_sz(op),&(obj->u.i64)) );

      // regrettably, we have to add the null. I hate null termination.
      case CL_PARTICLE_TYPE_STRING:
         obj->size = cl_msg_op_get_value_sz(op);
         obj->free = obj->u.str = (char*)malloc(obj->size+1);
         if (obj->free == 0) return(-1);
         memcpy(obj->u.str, cl_msg_op_get_value_p(op), obj->size);
         obj->u.str[obj->size] = 0;
         break;
      // 
      case CL_PARTICLE_TYPE_BLOB:
      case CL_PARTICLE_TYPE_JAVA_BLOB:
      case CL_PARTICLE_TYPE_CSHARP_BLOB:
      case CL_PARTICLE_TYPE_PYTHON_BLOB:
      case CL_PARTICLE_TYPE_RUBY_BLOB:

         obj->size = cl_msg_op_get_value_sz(op);
         obj->u.blob = cl_msg_op_get_value_p(op);
         obj->free = 0;
         break;
         
      default:
         cf_warn("parse: internal error: received unknown object type %d",op->particle_type);
         return(-1);
   }
   return(0);
}   

We can see that string and BLOB objects are handled differently: strings are copied, but blobs are not. As I understand it, when we use the ev2citrusleaf_get_many_digest function, the received objects are stored in temporary buffers. These buffers can be freed, leaving us with a pointer to freed memory. Is this a bug, or am I missing something?

Sandro,

You are absolutely correct, there is a bug in ev2citrusleaf_get_many_digest() when the record data type is BLOB. It is exactly as you described – the blob object keeps a pointer directly into a “read buffer” filled by the stream from the server, and for these batch operations this buffer is freed before the callback is made to the app.

If you want to do your own quick patch for now, you can modify the blob object handler to allocate and copy data, similar to the string object handler.
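As a rough illustration of that quick patch, here is a minimal sketch of the allocate-and-copy approach. It is not the actual library code: demo_object is a hypothetical stand-in for the blob-related fields of ev2citrusleaf_object, and demo_set_blob and demo_free_object are made-up helper names, so the snippet compiles on its own without the SDK headers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the blob-related fields of ev2citrusleaf_object. */
typedef struct {
    size_t size;              /* payload length in bytes */
    void  *free;              /* non-NULL when the object owns memory to release */
    union { void *blob; } u;  /* pointer handed to the application */
} demo_object;

/* Copy sz bytes out of the (temporary) read buffer into memory owned by obj.
 * Returns 0 on success, -1 if the allocation fails. */
int demo_set_blob(demo_object *obj, const void *src, size_t sz)
{
    obj->size = sz;
    obj->free = obj->u.blob = NULL;
    if (sz == 0) {
        return 0;             /* empty blob: nothing to copy */
    }
    void *copy = malloc(sz);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, src, sz);    /* blobs are raw bytes, so no terminator is added */
    obj->free = obj->u.blob = copy;
    return 0;
}

/* Release whatever the object owns and reset it. */
void demo_free_object(demo_object *obj)
{
    free(obj->free);
    obj->free = obj->u.blob = NULL;
    obj->size = 0;
}
```

In the real set_object() the equivalent change would go in the BLOB case, with obj->free set to the allocated pointer so that the existing cleanup path releases it, as already happens for strings.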

However we at Aerospike intend to release a new version with a more optimized fix. We will have the batch job hang on to the read buffers (one per node) until after the callback. Therefore we will avoid allocation and copying for blobs.

(By the way, note that the general rule for bin data in callbacks is that the data is only valid during the scope of the callback, and ev2citrusleaf_bins_free() must be called within the callback after consuming bin data. To keep the data beyond this scope, it must be copied. Of course you will notice it’s possible to “cheat” for strings by not calling ev2citrusleaf_bins_free() in the callback and doing so later, but that’s not recommended practice.)
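To make the rule above concrete, here is a small sketch of the "copy first, then free" pattern inside a callback. Everything in it is a stand-in invented for illustration: demo_bin, demo_saved, demo_bins_free and demo_callback are not part of the Aerospike API, and the real callback signature is different. Only the ordering matters: copy what you need, then release the bins before returning.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one bin delivered to a callback. */
typedef struct {
    const void *data;  /* only valid while the callback is running */
    size_t      size;
} demo_bin;

/* Application-owned storage that outlives the callback. */
typedef struct {
    void  *data;
    size_t size;
} demo_saved;

/* Stand-in for ev2citrusleaf_bins_free(): after this, bin data is unusable. */
void demo_bins_free(demo_bin *bins, int n_bins)
{
    for (int i = 0; i < n_bins; i++) {
        bins[i].data = NULL;
        bins[i].size = 0;
    }
}

/* Callback body: copy each bin into application memory, then free the bins.
 * Returns 0 on success, -1 if any allocation failed. */
int demo_callback(demo_bin *bins, int n_bins, demo_saved *saved)
{
    int rv = 0;
    for (int i = 0; i < n_bins; i++) {
        saved[i].data = NULL;
        saved[i].size = 0;
        if (bins[i].size == 0) {
            continue;                       /* nothing to keep */
        }
        saved[i].data = malloc(bins[i].size);
        if (saved[i].data == NULL) {
            rv = -1;                        /* record the failure, keep going */
            continue;
        }
        memcpy(saved[i].data, bins[i].data, bins[i].size);
        saved[i].size = bins[i].size;
    }
    demo_bins_free(bins, n_bins);           /* always free inside the callback */
    return rv;
}
```

The application then owns each saved[i].data and must release it with free() when it is done.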

This bug is now fixed in release 2.1.20, available on the website.