Parallel full scan with the Node.js client

Hello,

I use the Node.js client for Aerospike, and I would like to create, for example, 4 child processes and have each one query 1/4 of the database. With my own custom script I have measured that doing this speeds up the full scan by up to 2x.

Is there a way I can achieve this with the scan method?

For example, I would like to execute the following scan:

var scan = asd_client.query("namespace", "set", options);
var stream = scan.execute();

But I would run it once per child, with each child scanning different documents, so that together they complement each other and cover 100% of the set. Is this possible?

Thanks!

To be able to do this, you would need some key on all of your records that you could use to partition the whole set into 4 roughly equal parts using range query filters.
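As a rough sketch of the partitioning idea above: assuming every record carries an integer bin whose values fall in a known range (here 0 to 999, a made-up range), the parent process could compute one inclusive sub-range per child. The helper name splitRange is hypothetical and not part of the Aerospike client.

```javascript
// Hypothetical helper (not part of the Aerospike client): split the inclusive
// integer range [min, max] into `parts` contiguous, non-overlapping sub-ranges.
function splitRange(min, max, parts) {
  const total = max - min + 1;
  if (!Number.isInteger(parts) || parts < 1 || parts > total) {
    throw new RangeError('parts must be an integer between 1 and the range size');
  }
  const base = Math.floor(total / parts); // minimum size of each sub-range
  const extra = total % parts;            // the first `extra` sub-ranges get one more value
  const ranges = [];
  let start = min;
  for (let i = 0; i < parts; i++) {
    const size = base + (i < extra ? 1 : 0);
    ranges.push({ min: start, max: start + size - 1 });
    start += size;
  }
  return ranges;
}

// Assumed example: partition key values 0..999 shared between 4 child processes.
const ranges = splitRange(0, 999, 4);
console.log(ranges);
```

Each child would then use its own { min, max } pair as the bounds of a range filter on that bin. Range queries need a secondary index on the bin, and the exact filter call depends on your client version, so please check it against the client documentation.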

Hello Jan, thanks for your reply.

I have been thinking about it, but I do not see how to create a key that partitions the set evenly, so that every parallel process reads the same number of records. Do you have any idea?

Of course I could just generate a random number from 1 to 4 and store it on every record, but that does not seem like the best solution.

Regards,

Hi @fdnieves, if increasing scan performance is your main concern, have you tried increasing the libuv worker thread pool size by setting the UV_THREADPOOL_SIZE environment variable? [1] The default size is 4.
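A minimal sketch of one way to set the variable when the scan runs in child processes: pass it through the environment of each child. The helper names, the worker script path and the value 16 are placeholders for illustration, not recommendations.

```javascript
const { fork } = require('child_process');

// Hypothetical helper: copy an environment object and set UV_THREADPOOL_SIZE.
// Environment variables are strings, so the size is converted explicitly.
function envWithThreadPool(baseEnv, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  return { ...baseEnv, UV_THREADPOOL_SIZE: String(size) };
}

// Hypothetical launcher: start a worker script with the larger thread pool.
// `scriptPath` is a placeholder for whichever script runs the scan.
function startWorker(scriptPath, poolSize) {
  return fork(scriptPath, [], { env: envWithThreadPool(process.env, poolSize) });
}

// Show the value a child started this way would see (16 is an arbitrary example).
console.log(envWithThreadPool(process.env, 16).UV_THREADPOOL_SIZE);
```

For a single process, exporting the variable in the shell before starting node should have the same effect. Whether it helps depends on how much of the scan work actually goes through the libuv thread pool, so it is worth measuring.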

If you truly need to partition your record set, i.e. to split the processing of the scan results across multiple machines, then adding a separate, randomly assigned integer key that you can use for partitioning might be the best solution.
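To illustrate why a randomly assigned integer gives roughly even partitions, here is a small simulation. It assumes 4 buckets numbered 1 to 4 and 100,000 records; both numbers and the helper name randomBucket are made up for the example. In practice the value would be written to a bin when each record is stored.

```javascript
const NUM_BUCKETS = 4;  // assumed number of partitions / child processes
const TOTAL = 100000;   // assumed number of records, for the simulation only

// Hypothetical helper: pick a bucket number from 1 to numBuckets, uniformly at random.
function randomBucket(numBuckets) {
  return 1 + Math.floor(Math.random() * numBuckets);
}

// Simulate assigning a bucket to every record and count the records per bucket.
const counts = new Array(NUM_BUCKETS).fill(0);
for (let i = 0; i < TOTAL; i++) {
  counts[randomBucket(NUM_BUCKETS) - 1]++;
}

// The counts differ slightly from run to run but stay close to TOTAL / NUM_BUCKETS.
console.log(counts);
```

The buckets are not exactly equal, but with a large set the difference is small compared with the size of each partition, so each child ends up with about the same amount of work.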

[1] Thread pool work scheduling — libuv documentation

Thanks a lot for your feedback, Jan. I will try that environment variable to see if it changes the performance; otherwise I will use that random integer.

Regards,