Memory consumption problem in a Node.js Aerospike client application


#1

I’m seeing weird behaviour with regard to memory consumption from the Node.js Aerospike client. I created a small dummy snippet that populates a set with a fixed number of keys that simply increment, using a for loop. The following code reproduces the problem:

for (var count = 0; count < 40000000; count++) {
    var key = aerospike.key('test', 'demo2', count);

    client.put(key, object, function (err) {
        if (err.code != aerospike.status.AEROSPIKE_OK) {
            console.log("error: %s", err.message);
        }
    });
}

The problem is that after about 3-4 minutes, the program crashes:

$ time node async-write.js
Killed

real    3m38.687s
user    4m59.115s
sys     1m42.909s

The problem is related to the memory consumption of the Node.js process (node):

Killed process 21075 (node) total-vm:48488124kB, anon-rss:47528396kB, file-rss:616kB

I ran into similar problems while running the benchmark utilities under benchmark/ in the Node.js client distribution. I also noticed there that you have a utility class that tracks memory consumption.

Is the above usage something that is not supposed to be done? Is this high memory consumption expected? If so, what is a developer of a long-lived application supposed to do to avoid it?


#2

Hi dlhero,

A single Node.js client means your app has only one thread. Aside from I/O, only one task/event is processed by Node’s event loop at any given moment. So even if you are running Node.js on a multi-core machine, you will not get any parallelism in terms of actual processing: all events are processed one at a time. Therefore, Node.js is not good for CPU-intensive tasks; it is great for I/O-bound tasks.

Having said that:

  • By default, V8 has a memory limit of 1 GB on 64-bit systems and 512 MB on 32-bit systems. You can raise this limit by setting --max-old-space-size, up to roughly 1 GB on 32-bit or ~1.7 GB on 64-bit systems, but this is not recommended. The better approach is to split the single process into several workers.

  • Instead of inlining the callback function, try passing a reference, like so:

function seedObjects() {
  var start = 1;
  var end = 10000000;

  for (var i = start; i <= end; i++) {
    var record = {uid: 'user' + i};
    var key = aerospike.key('test', 'test', i);
    client.put(key, record, cb);
  }
}

var cb = function (err, rec, meta) {
  if ( err.code === aerospike.status.AEROSPIKE_OK ) {
    // success
  } else {
    // failure
  }
};

seedObjects();

Running the above code on my Mac successfully created 10 million records in the Aerospike database running in a VM. FYI, here are my Mac specs: 2.6 GHz Intel Core i7, 16 GB of RAM. This illustrates the obvious point that how a Node.js client performs depends on the resources available to it. I do want to point out, though, that in a production environment you’d normally have more than one client generating requests.

I hope this helps.


#3
  1. I am aware that this is highly inefficient, but it is meant as a test to find the maximum number of objects Aerospike can handle in my scenario.

  2. I had the impression that the client.put() call is ‘async’, since it returns before it completes and runs the callback function when it finishes. For example, the loop above manages to reach ~32K puts/sec before crashing.

My target for this test was just to have a very fast loop that populates the namespace with dummy keys.

As you can see, I included a syslog message from my system, and it shows that the Node.js process goes way beyond that limit. The Out Of Memory killer reaped the process at total-vm:48488124kB, anon-rss:47528396kB, which is WAY beyond 1.7 GB.

I managed to create many more objects than that in my setup, but the problem is that the Node.js process kept growing in memory, even though I do not store or copy anything anywhere in my code.

I tried the async module, but the way it works is odd: in most patterns you define a function that is passed an invisible callback which you are supposed to call at the end of your function. For example, this is one of their test cases:

    async.whilst(
        function () {
            call_order.push(['test', count]);
            return (count < 5);
        },
        function (cb) {
            call_order.push(['iterator', count]);
            count++;
            cb();
        },
        function (err) {
            test.same(call_order, [
                ['test', 0],
                ['iterator', 0], ['test', 1],
                ['iterator', 1], ['test', 2],
                ['iterator', 2], ['test', 3],
                ['iterator', 3], ['test', 4],
                ['iterator', 4], ['test', 5],
            ]);
            test.equals(count, 5);
            test.done();
        }
    );

Applying the above paradigm essentially killed node again, because the second function calls cb() at the end, which calls the second function again recursively, and that leads Node.js to stack space exhaustion. Applying a remedy where cb() is called at the next tick of the event loop limits the number of puts I can do to fewer than 1,000 ops/sec, which is too slow.
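For reference, the remedy looked roughly like this (a sketch: whilst below is a minimal stand-in for async.whilst written for illustration, and the counter increment stands in for client.put). Calling cb() synchronously would recurse step → iterator → step and blow the stack for large counts; deferring it with setImmediate unwinds the stack on every iteration:

```javascript
// Minimal stand-in for async.whilst (hypothetical helper; the real
// async module behaves the same way for this purpose).
function whilst(testFn, iterFn, doneFn) {
  function step(err) {
    if (err) return doneFn(err);
    if (!testFn()) return doneFn(null);
    iterFn(step);
  }
  step(null);
}

var count = 0;

whilst(
  function () { return count < 100000; },
  function (cb) {
    count++;              // stand-in for client.put(key, object, cb)
    setImmediate(cb);     // defer to the next event-loop turn: no stack growth
  },
  function (err) {
    console.log('done:', count);
  }
);
```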


#4

Hi, I am Chris Stivers, one of the developers of the Node.js client.

The reason for the out-of-control memory usage is the tight loop with 40M iterations. This loop floods the event loop with 40M objects, which are backlogged until processed. As you may know, Node.js sits atop libuv for its event loop, which provides a predefined number of threads to process events. Aerospike runs within these threads provided by libuv, so Aerospike is limited by libuv / Node.js. The objects will be cleaned up (GC’d) once consumed by the Aerospike client.

Saturating the event loop is a common issue with any event-based system, whether it is Node.js, libevent, etc. To see better behavior, you will want to throttle. This is a good idea in most cases where you want to do huge batches such as this one.

If you take a look at the benchmark, you will see we do the throttling by defining how many operations to perform in each batch, which we call “iterations”. This prevents the event loop from being saturated with objects, which would eat up memory.

Also, I have created a simple script to illustrate this:

To run:

node index.js <ITERS> <OPS>

You can then experiment to see the memory utilization. It dumps the heap usage as seen by Node.js, but you will also want to use top or a similar program to see the consumption from the OS’s point of view.

For example, you can compare (1 iteration × 40,000,000 ops) vs (4,000 iterations × 10,000 ops).

To run 1 iteration with 40m ops:

node index.js 1 40000000

To run 4000 iterations with 10000 ops:

node index.js 4000 10000

You will see a drastic difference in memory consumption.
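To make the batching idea concrete (a sketch only: put() below is a stand-in for client.put, and the real index.js script may differ), a throttled loader along these lines starts the next batch only after every operation in the current one has completed, so at most OPS objects are queued on the event loop at any time:

```javascript
// Batch sizes taken from the command line, as in: node index.js <ITERS> <OPS>
var ITERS = parseInt(process.argv[2] || '4', 10);
var OPS = parseInt(process.argv[3] || '10000', 10);

// Stand-in for client.put(key, record, done): completes asynchronously.
function put(key, done) {
  setImmediate(done);
}

function runBatch(iter) {
  if (iter >= ITERS) {
    console.log('all batches complete');
    return;
  }
  var pending = OPS;
  for (var i = 0; i < OPS; i++) {
    put(iter * OPS + i, function () {
      // Only start the next batch once every op in this one has finished.
      if (--pending === 0) {
        console.log('batch %d done (heapUsed: %d MB)',
            iter, (process.memoryUsage().heapUsed / 1048576) | 0);
        runBatch(iter + 1);
      }
    });
  }
}

runBatch(0);
```

Because runBatch(iter + 1) is invoked from an asynchronous completion callback, the batches chain without growing the call stack.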


#5

Hi Chris,

Exactly, that was the problem. I spent the better part of the day experimenting with benchmarkjs, async, and matcha, but none of these approaches was suitable for my needs (generating the highest possible load towards Aerospike from within a single Node.js process).

My initial for(;;) approach was of course wrong for the reasons you stated.

Thanks for the script. Your approach is similar to what my investigation led me to, with one small difference.

My plan was to pump out as many requests/operations as possible, so in essence I wanted to find the sweet spot between how much time I can spend calling client.put() and letting the event loop go to the next tick.

It seems that what I want can be done with the following snippet:

setImmediate(function loop() {
    var key = aerospike.key('test', 'test', something);
    client.put(key, object, callback);
    setImmediate(loop);
});

which essentially fires as many times as possible and is much closer to what I’d like to do.

Do you think this approach is better?
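A variant of the loop above that caps the number of outstanding puts (a sketch: maxInFlight is an assumed tuning knob, and put() here is a stand-in for client.put) keeps memory bounded while still issuing operations as fast as completions allow:

```javascript
var maxInFlight = 100;   // assumed tuning knob: max outstanding puts
var total = 100000;      // total operations to issue
var inFlight = 0;
var issued = 0;
var completed = 0;

// Stand-in for client.put(key, object, done).
function put(key, done) {
  setImmediate(done);
}

function pump() {
  // Issue new puts until the in-flight cap is reached or we run out.
  while (inFlight < maxInFlight && issued < total) {
    inFlight++;
    put(issued++, function () {
      inFlight--;
      completed++;
      if (completed === total) {
        console.log('all puts complete');
      } else {
        pump();          // refill as completions come back
      }
    });
  }
}

pump();
```

Because pump() is re-entered only from asynchronous completion callbacks, the refill never grows the call stack, and at most maxInFlight objects are queued on the event loop at once.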


#6

I hardly ever use setImmediate(). From what I understand, it maintains its own queue of callbacks, which fire in the sequence they are queued, but it yields to I/O. I’m not sure whether you will saturate it and blow up memory, but your tests will tell you that for sure.