Segmentation fault in client only with more than one node in server


#1

We are evaluating Aerospike for an IoT-type application. Various clients insert data into the database; other clients see what data has changed, process it, and insert updated information. I’m evaluating whether Aerospike is suitable for our use (clients update the database continuously at 100 ms intervals).

I wrote a simple client that creates a set of data records and inserts 10 records, then creates a second set of groups for the data records. In a forever loop, it gets the container record (for testing there is a single container with a known name) and then gets the data record names assigned to the container. The names are used in a batch get, and the data records are processed within the batch get.

If I run this simple client against an Aerospike cluster with a single node in the cluster everything works OK. If I start a second Aerospike server node the cluster now recognizes that there are 2 nodes in the cluster.

My client now generates a segmentation fault on the very first batch get request. If I stop the second server node and re-run, the client executes with no problems. Again, the segmentation fault is on the first batch get so it’s not an issue of looping.

Is there something special that needs to be done by a client?

as_error err;
as_key key;
as_record *rec = NULL;
as_batch *batch = NULL;
as_key_init_str(&key, "test", "Loops", loopName);
as_record_init (&timeUpdate, 4);

// We run forever
while (true)
{
	// First we have to get the loop
	//key = as_key_new("test", "Loops", loopName);

	if ( aerospike_key_get(&aeroServer, &err, NULL, &key, &rec) != AEROSPIKE_OK )
	{
		fprintf(stderr, "error(%d) %s at [%s:%d]", err.code, err.message, err.file, err.line);
		return 2;
	}
	// Get the loop name as stored
	recordLoopName = as_record_get_str(rec, "name");

	// Get the loop times
	nextTime.tv_sec = as_record_get_int64(rec, "nextTime", 0);
	nextTime.tv_nsec = as_record_get_int64(rec, "nextTimeNsec", 0);
	lastTime.tv_sec = as_record_get_int64(rec, "lastTime", 0);
	lastTime.tv_nsec = as_record_get_int64(rec, "lastTimeNsec", 0);

	// Get the total number of blocks
	loopBlockAs = as_record_get_integer(rec, "NumberBlocks");
	totalBlocks = as_integer_get(loopBlockAs);
	// Now for the block names. It is stored as a blob
	blockBytes = as_record_get_bytes(rec, "blocks");
	rawBuffer = (char *)as_bytes_get(blockBytes);
	blockSize = as_bytes_size(blockBytes);
	// We have the internal blob buffer and the blob size, copy to properly casted array
	memcpy (gblockNames, rawBuffer, blockSize);

	// Now we need to get the blocks
	// Batch operations
	batch = as_batch_new (totalBlocks);
	// Add the keys
	for (int i=0; i < totalBlocks; i++)
	{
		//as_key_init (as_batch_keyat(&batch, i), "test", "Blocks", &gblockNames[i][0]);
		as_key_init (as_batch_keyat(batch, i), "test", "Blocks", &gblockNames[i][0]);
		//fprintf(stdout, "Adding batch key %i:%s Total:%i\n", i, &gblockNames[i][0],totalBlocks);
		//fflush (stdout);
	}

	// Now issue the get which invokes a callback
	if ( aerospike_batch_get(&aeroServer, &err, &batchPolicy, batch, getblocksCallback, NULL) != AEROSPIKE_OK )
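	{
		fprintf(stderr, "error(%d) %s at [%s:%d]", err.code, err.message, err.file, err.line);
		return 2;
	}

	// Release the batch allocated for this iteration so the loop does not leak
	as_batch_destroy(batch);
	batch = NULL;
}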
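
A side note on the blob copy in the loop above: the memcpy into gblockNames trusts that blockSize fits in the destination. This is separate from the crash reported in this thread, but a bounds-checked copy is cheap insurance. The following is only a minimal sketch, assuming the names are stored in fixed-width slots; MAX_BLOCKS, NAME_LEN, and copy_block_names are hypothetical names that do not appear in the original client.

```c
#include <stddef.h>
#include <string.h>

#define MAX_BLOCKS 64   /* hypothetical: capacity of the name table */
#define NAME_LEN   32   /* hypothetical: width of one name slot in bytes */

/*
 * Copy a blob of fixed-width names into the table without writing past it.
 * Only whole slots are copied; a trailing partial slot is ignored.
 * Returns the number of names copied.
 */
static size_t copy_block_names(char names[MAX_BLOCKS][NAME_LEN],
                               const void *blob, size_t blob_size)
{
    if (blob == NULL) {
        return 0;
    }

    size_t count = blob_size / NAME_LEN;
    if (count > MAX_BLOCKS) {
        count = MAX_BLOCKS;   /* clamp to the table capacity */
    }

    memcpy(names, blob, count * NAME_LEN);

    /* Force termination so each slot is safe to use as a C string key. */
    for (size_t i = 0; i < count; i++) {
        names[i][NAME_LEN - 1] = '\0';
    }
    return count;
}
```

In the loop, the returned count could also be compared against totalBlocks before building the batch, so that the batch never references a slot that was not filled.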

#2

This is the top of the stack:

    #1 0x00007ffff7b6ee64 in as_batch_parse_records (err=0x7fffffffa6c0, buf=, size=, task=0x7fffffffb5b0) at src/main/aerospike/aerospike_batch.c:159
    159 src/main/aerospike/aerospike_batch.c: No such file or directory.
    (gdb) up
    #2 0x00007ffff7b6f266 in as_batch_parse (err=0x7fffffffa6c0, fd=6, deadline_ms=3282148, udata=0x7fffffffb5b0) at src/main/aerospike/aerospike_batch.c:225
    225 in src/main/aerospike/aerospike_batch.c


#3

This is a recent regression in the C client, present in versions 3.1.18 and 3.1.19.

We have released a fix in 3.1.20. Please try it and confirm.

Thanks.


#4

@ncamino,

The link to download v3.1.20 of our C client is here.


#5

Thanks. I downloaded version 3.1.20 and rebuilt. The problem is fixed.