Missing 80% of GET responses


#1

Our company is planning to transition from Redis to Aerospike, but we are seeing a strange issue: most of our get requests never produce a response (going by the output below, only about 27% make it back to the callback function).

Here is the code we are testing with:

var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) 
{
	for (var i = 0; i < numCPUs; i++) 
	{
		var worker = cluster.fork();
	}
} 
else 
{

	var start = new Date().getTime();
	var requests = 0;
	var responses = 0;

	var aerospike = require('./node_modules/aerospike');
	var status = aerospike.status;

	var client = aerospike.client({
		hosts: [
			{ addr: '127.0.0.1', port: 3000 } 
		]
	});

	function connect_cb( err, client) {
		if (err.code == status.AEROSPIKE_OK) {
			console.log("Aerospike Connection Success")
		}
	}

	client.connect(connect_cb)

	setInterval(function(){
		for(var i=0; i<50; i++)
		{
			var key = aerospike.key('dexi','toys','floor_'+i);
			requests++;
			client.get(key, function(err, rec, meta) {
				responses++;
				if ( err.code == status.AEROSPIKE_OK ) 
				{
				
				}
				else 
				{
					console.error('Get Error:', err);
				}
			});


		}
	},10);
	
	setInterval(function(){
		for(var i=0; i<50; i++)
		{
			var key = aerospike.key('dexi','toys','floor_'+i);
			var rec = { 
			  uid:    1000,  // integer data stored in bin called "uid"
			  name:   "user_name", // string data stored in bin called "name"
			  dob:    { mm: 12, dd: 29, yy: 1995},  // map data stored (msgpack format) in bin called "dob" 
			  friends: [1001, 1002, 1003]
			 };
			 
			var metadata = {
				ttl: 10000,
				gen: 0
			};
			client.put(key, rec, metadata, function(err) {
				switch ( err.code ) {
					case status.AEROSPIKE_OK:
						break;
					
					default:
						console.error("Put Error: " + err.message);
						exitCode = 1;
						break;
				}
			});
		}
	},10);

	setInterval(function(){
		var timeSpent = ( new Date().getTime()) - start;
		console.log(requests, responses,timeSpent);
	},15000);
	
}

Below is the console output we are seeing:

34400 9306 15098
34150 9250 15080
35050 9330 15087
34150 9235 15092
33250 9310 15120
33950 9249 15090
34650 9298 15101
35000 9400 15102
34700 9300 15166
33150 9399 15181
34500 9300 15193
33850 9292 15207
34400 9250 15162
34100 9360 15212
34050 9250 15171
34100 9348 15159
33800 9250 15118
34300 9309 15189
34050 9300 15152
34250 9405 15181

As you can see, for roughly every 34k get requests we send, only about 9.3k (around 27%) actually come back. Our Aerospike dashboard also reflects this*: the throughput it shows matches the number of responses we are getting back.

* @Mnemaudsyne's (Community Manager) note: by 'this', the user means that the dashboard is showing about 9K calls, per a comment on Stack Overflow, where the user asked the same question: http://stackoverflow.com/questions/29905488/missing-80-of-get-responses/29908585#29908585
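
As a quick check on the figures above, the response rate implied by the console output can be worked out in a few lines of JavaScript. This is only a sketch: `samples` holds the first four rows of the output above, and `responseRate` and `summarize` are names made up for illustration.

```javascript
// Each sample is [requests, responses, elapsedMs], copied from the
// first four rows of the console output in this post.
const samples = [
  [34400, 9306, 15098],
  [34150, 9250, 15080],
  [35050, 9330, 15087],
  [34150, 9235, 15092],
];

// Fraction of requests that reached their callback.
function responseRate(requests, responses) {
  return responses / requests;
}

// Add up the rows and compute the overall rate.
function summarize(rows) {
  const totalRequests = rows.reduce((sum, row) => sum + row[0], 0);
  const totalResponses = rows.reduce((sum, row) => sum + row[1], 0);
  return {
    totalRequests,
    totalResponses,
    rate: responseRate(totalRequests, totalResponses),
  };
}

const summary = summarize(samples);
console.log((summary.rate * 100).toFixed(1) + '% of gets returned'); // 26.9% of gets returned
```

For these four rows that is 37121 responses out of 137750 requests, so roughly 73% of the gets never reach the callback.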

#2

I am trying to reproduce the issue in our in-house setup. I will give you an update as soon as we resolve it.

Thanks


#3

Hi Milonas115,

This is the output I am getting in our in-house setup.

51500 51500 15029
50750 50750 15025
51850 51849 15030
50950 50950 15034
103050 103000 30034
101250 101250 30030
103300 103250 30037
102200 102200 30045
150900 150346 45034
150600 149985 45037
148250 148132 45046
149900 149350 45045
198450 198450 60036
201950 201950 60039
202150 202100 60046
198500 198500 60057
249000 249000 75036
251350 251350 75040
252500 252459 75051
248300 248275 75062
297000 297000 90037
297850 297849 90049
300250 300250 90051
294900 294899 90064
345500 345450 105041
349200 349150 105057
346200 346199 105064
343200 343200 105079

Could you clarify the following details?

  1. How many requests timed out?

  2. Why are you using setInterval?

  3. Do you still see missing responses for GET requests when your application does not use any setInterval calls?

I ask because setInterval does not guarantee the order of the puts and gets in the code snippet you posted. Here is an article explaining the shortcomings of setInterval.
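
One possible way to answer the first question, sketched here under assumptions: register each request before it is sent so that anything that never reaches its callback can be listed later. `createPendingTracker` and its methods are hypothetical helpers, not part of the Aerospike client, and the fake clock is only there so the example runs without a server.

```javascript
// Hypothetical helper: remembers every request that has not called back yet.
// `now` is a function returning the current time in milliseconds.
function createPendingTracker(now) {
  const pending = new Map(); // request id -> { label, startedAt }
  let nextId = 0;
  return {
    // Call just before issuing a request. Returns a function that must be
    // called from the request's callback to mark it as answered.
    start(label) {
      const id = nextId++;
      pending.set(id, { label, startedAt: now() });
      return function finish() {
        pending.delete(id);
      };
    },
    // Labels of requests that have been waiting for at least `maxAgeMs`.
    overdue(maxAgeMs) {
      const cutoff = now() - maxAgeMs;
      return Array.from(pending.values())
        .filter((entry) => entry.startedAt <= cutoff)
        .map((entry) => entry.label);
    },
    // Number of requests still waiting for a callback.
    size() {
      return pending.size;
    },
  };
}

// Usage with a fake clock instead of Date.now.
let fakeTime = 0;
const tracker = createPendingTracker(() => fakeTime);
const finishA = tracker.start('get floor_0');
tracker.start('get floor_1'); // this one never calls back
fakeTime = 500;
finishA(); // 'get floor_0' answered after 500 ms
fakeTime = 2000;
console.log(tracker.overdue(1000)); // only 'get floor_1' is still waiting
```

In the test program this would mean calling `tracker.start(...)` just before `client.get` and calling the returned function first thing inside the get callback, with `Date.now` passed in as the clock.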

Please provide us with these details so that we can debug further.

Thanks

Gayathri.K


#4
  1. The thing is, they aren’t timing out; no errors are being thrown at all on either side.

  2. We are using setInterval to break the work into pieces so as not to flood Node. We aren’t trying to benchmark the performance of Aerospike; we want to make sure that, under different load scenarios, we aren’t running into timeout issues or missing responses.

  3. Yes, we have tried simple loops to see what would happen (same issue). We even tested a scenario in which we would put and then do a get from the put callback, tracking the number of put requests and the number of responses in the get callback, and they were also off by about 70%. The only time the counts matched close to 100% was when we really slowed down the number of puts/gets (2-3k every 10 seconds).
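
The put-then-get test described in point 3 might look roughly like the sketch below. The stub client is a hypothetical in-memory stand-in that only mirrors the `put(key, rec, metadata, cb)` and `get(key, cb)` callback shapes used in this thread. It calls back synchronously and talks to no server, so real numbers would only come from swapping in the actual client and key objects. `runChainedTest`, `OK` and `NOT_FOUND` are also made-up names.

```javascript
const OK = 0; // stand-in for status.AEROSPIKE_OK in this sketch
const NOT_FOUND = 2; // arbitrary non-OK code used only by the stub

// Hypothetical in-memory client; not the Aerospike client.
function makeStubClient() {
  const store = new Map();
  return {
    put(key, rec, metadata, cb) {
      store.set(key, rec);
      cb({ code: OK });
    },
    get(key, cb) {
      const rec = store.get(key);
      cb({ code: rec ? OK : NOT_FOUND }, rec, {});
    },
  };
}

// Issue `count` puts. Each get is sent only from its put callback, so a get
// can never run ahead of the write it depends on.
function runChainedTest(client, count, done) {
  const counters = { puts: 0, gets: 0, responses: 0, errors: 0 };
  let finished = 0;
  function finishOne() {
    finished++;
    if (finished === count) done(counters);
  }
  for (let i = 0; i < count; i++) {
    const key = 'floor_' + i;
    counters.puts++;
    client.put(key, { uid: 1000 }, { ttl: 10000, gen: 0 }, function (putErr) {
      if (putErr.code !== OK) {
        counters.errors++;
        return finishOne(); // no get is issued for a failed put
      }
      counters.gets++;
      client.get(key, function (getErr) {
        counters.responses++;
        if (getErr.code !== OK) counters.errors++;
        finishOne();
      });
    });
  }
}

let result = null;
runChainedTest(makeStubClient(), 50, function (counters) {
  result = counters;
});
console.log(result);
```

With the stub, `puts`, `gets` and `responses` all end at 50. Against a real cluster, a gap between `gets` and `responses` would be the number to report.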


#5

There is a way to enable logging when creating the client object.

Could you enable the log and send me the output? Also, is it possible to give us access to your environment for debugging?

Thanks

Gayathri.K