Parallel full scan with the Node.js client


#1

Hello,

I use the Node.js client for Aerospike, and I would like to create, for example, 4 child processes and have each one query 1/4 of the database. I have verified with my own custom script that doing this makes the full scan up to 2x faster.

Is there a way to achieve this with the scan method?

For example, I would like to execute the following scan:

    var scan = asd_client.query("namespace", "set", options);
    var stream = scan.execute();

But I would like to run it once per child process, with each child scanning different records so that together they cover 100% of the set. Is this possible?

Thanks!


#2

To do this, you would need some key (a bin with a secondary index, since range filters query through an index) present on all of your records, which you could use to partition the whole set into 4 roughly equal parts with range query filters.
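A minimal sketch of the range arithmetic, assuming every record carries an integer bin (hypothetically named `part`) whose values run from a known `min` to a known `max`. `splitRange` is a hypothetical helper, not part of the Aerospike client: it divides that interval into one contiguous sub-range per worker and gives any leftover values to the first few workers, so the widths differ by at most one.

```javascript
// Split the inclusive integer interval [min, max] into `parts` contiguous
// sub-ranges of near-equal width, one per worker process.
// Assumption: records hold an integer bin (hypothetical name "part") whose
// values lie in [min, max].
function splitRange(min, max, parts) {
  if (!Number.isInteger(min) || !Number.isInteger(max) || min > max) {
    throw new RangeError('min and max must be integers with min <= max');
  }
  if (!Number.isInteger(parts) || parts < 1) {
    throw new RangeError('parts must be a positive integer');
  }
  const width = max - min + 1;
  const base = Math.floor(width / parts);
  const extra = width % parts; // the first `extra` ranges get one more value
  const ranges = [];
  let lo = min;
  for (let i = 0; i < parts; i++) {
    const size = base + (i < extra ? 1 : 0);
    if (size === 0) break; // more parts than distinct values
    ranges.push([lo, lo + size - 1]);
    lo += size;
  }
  return ranges;
}

// Worker i would then restrict its query to ranges[i], for example with the
// client's range predicate (Aerospike.filter.range(bin, lo, hi)); check your
// client version's docs for how to attach it to the query.
console.log(splitRange(0, 9, 4)); // → [[0, 2], [3, 5], [6, 7], [8, 9]]
```

Equal-width ranges only give equal record counts if the key values are spread evenly over the interval, which is why the choice of key matters.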


#3

Hello Jan, thanks for your reply.

I have been thinking about it, but I do not see how to create a partition key that splits the records evenly, so that every parallel process reads the same number of records. Do you have any ideas?

Of course, I could just generate a random number from 1 to 4 and store it on every record, but that does not seem like the best solution.
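If each record gets a uniformly random partition number in 1..4 when it is written, each worker's expected share is exactly a quarter of the set, and the imbalance shrinks relative to the set size: each count is binomial with standard deviation sqrt(total × p × (1 − p)), about 433 records for 1,000,000 records and p = 1/4, i.e. under 0.2% of a 250,000-record share. The sketch below uses only `Math.random`; the helper names are hypothetical. It simulates that assignment and tallies how many records each worker would get.

```javascript
// Simulate assigning a random partition number to each record at write time
// and count how evenly N workers would share the records.
const NUM_PARTITIONS = 4;

// Uniform random integer in [1, n].
function randomPartition(n = NUM_PARTITIONS) {
  return 1 + Math.floor(Math.random() * n);
}

// Count how many of `total` simulated records land in each partition.
function simulate(total, n = NUM_PARTITIONS) {
  const counts = new Array(n).fill(0);
  for (let i = 0; i < total; i++) {
    counts[randomPartition(n) - 1]++;
  }
  return counts;
}

const counts = simulate(1000000);
console.log(counts); // each entry is close to 250000
```

In a real write path, the value from `randomPartition()` would be stored as an extra bin on each record (the bin name is up to you), with a secondary index on it so each worker can query by its own value.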

Regards,


#4

Hi @fdnieves, if increasing scan performance is your main concern, have you tried increasing the libuv worker thread pool size by setting the UV_THREADPOOL_SIZE environment variable? [1] The default size is 4.
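Because libuv initializes its thread pool on first use, the safest place to set `UV_THREADPOOL_SIZE` is the environment the Node.js process starts with, e.g. `UV_THREADPOOL_SIZE=16 node scan.js` from a shell (`scan.js` and the value 16 are placeholders, not recommendations). The sketch below does the same from a Node.js launcher with `child_process.spawnSync`; the `-e` one-liner only echoes the variable to show the child received it and would be replaced by the path to your scan script.

```javascript
const { spawnSync } = require('child_process');

// Run `node <args...>` with UV_THREADPOOL_SIZE set in the child's environment,
// so the value is in place before libuv creates its pool (default size: 4).
function runWithThreadpool(args, size) {
  return spawnSync(process.execPath, args, {
    env: { ...process.env, UV_THREADPOOL_SIZE: String(size) },
    encoding: 'utf8',
  });
}

// Demo: the child prints the pool size it was started with.
const result = runWithThreadpool(
  ['-e', 'process.stdout.write(process.env.UV_THREADPOOL_SIZE)'],
  16,
);
console.log(result.stdout); // prints 16
```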

If you truly need to partition your record set, i.e. to split processing the scan results across multiple machines, then adding a separate, randomly assigned integer key that you can use for partitioning might be the best solution.
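Once every record carries such a key, fanning the work out means starting one process per key value. The sketch below assumes partition values 1 to 4 and uses Node's `child_process.spawn`. The inline `WORKER_SOURCE` string is a hypothetical stand-in that only reports which partition it was given; a real worker script would connect to the cluster and query the records whose partition bin equals its `PARTITION` value.

```javascript
const { spawn } = require('child_process');

const NUM_WORKERS = 4;

// Hypothetical stand-in for a real worker script: it only reports the
// partition value it received through the PARTITION environment variable.
const WORKER_SOURCE = `
  const part = Number(process.env.PARTITION);
  process.stdout.write(JSON.stringify({ part, pid: process.pid }));
`;

// Start one child process for partition `part`; resolve with its parsed output.
function runWorker(part) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', WORKER_SOURCE], {
      env: { ...process.env, PARTITION: String(part) },
    });
    let out = '';
    child.stdout.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { out += chunk; });
    child.on('error', reject);
    child.on('close', (code) => {
      if (code === 0) resolve(JSON.parse(out));
      else reject(new Error(`worker ${part} exited with code ${code}`));
    });
  });
}

// Launch all workers concurrently, one per partition value 1..n.
function runAll(n = NUM_WORKERS) {
  const parts = Array.from({ length: n }, (_, i) => i + 1);
  return Promise.all(parts.map(runWorker));
}

runAll()
  .then((results) => console.log(results))
  .catch((err) => console.error(err));
```

The same dispatch works across machines if each host is handed a subset of the partition values instead of spawning all of them locally.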

[1] http://docs.libuv.org/en/stable/threadpool.html


#5

Thanks a lot for your feedback, Jan. I will try that environment variable to see whether it improves performance; otherwise, I will use the random integer.

Regards,