How can I run a scan job in map-reduce mode?

kof02guy · July 18, 2017, 7:10am

We run a scan job across all nodes every day and extract some results from it. Now we're thinking of changing that single all-nodes scan into a map-reduce job, but I've found something that may make this attempt fail: I haven't found any simple way to split the scan job into multiple sub-tasks. For example, we have 8 nodes in one Aerospike cluster, and the only split I can imagine that distributes the data set across mappers is to scan the 8 nodes with 8 mappers, one per node. We have 4 sets on each node, so of course I could scan each set on each node with more mappers, but that would give each mapper an unbalanced share of the data.

So is there any way for me to scan one set on one node using multiple processes?

Albot · July 18, 2017, 11:24pm

I don't quite understand what you want to do, but if you want to split a scan job into chunks, that's possible. I think you can use the digest modulo feature: http://www.aerospike.com/docs/guide/predicate.html . Using digest modulo, you should be able to spawn a program and have each thread scan, say, 10% of the data set.
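A minimal sketch of the splitting idea, assuming the digest-modulo filter keeps a record when its digest, read as an integer, modulo N equals a chosen residue. Which digest bytes the server actually reads is an assumption here, and SHA-1 stands in for Aerospike's RIPEMD-160 digest only so the example runs with hashlib alone; `fake_digest`, `slice_of` and `assign` are hypothetical names. With N = 10, each of 10 workers owns one residue and therefore roughly 10% of the records, independent of which node or set they live on.

```python
import hashlib


def fake_digest(set_name: str, key: str) -> bytes:
    # Stand-in for a record digest. Aerospike hashes the set name and key
    # with RIPEMD-160; SHA-1 is used here only because it also yields
    # 20 bytes and is always available in hashlib.
    return hashlib.sha1(set_name.encode() + key.encode()).digest()


def slice_of(digest: bytes, modulus: int) -> int:
    # Assumption: the digest is read as one big-endian integer. The real
    # server-side filter may use a different subset of the digest bytes.
    return int.from_bytes(digest, "big") % modulus


def assign(keys, set_name, modulus):
    """Group keys by the residue (worker) whose scan would return them."""
    slices = {r: [] for r in range(modulus)}
    for key in keys:
        slices[slice_of(fake_digest(set_name, key), modulus)].append(key)
    return slices
```

For example, `assign([f"user:{i}" for i in range(100_000)], "users", 10)` yields ten disjoint slices of roughly 10,000 keys each. Against a real cluster, each worker would instead issue its own scan carrying the digest-modulo predicate and let the server do the filtering.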

kof02guy · July 19, 2017, 2:34am

@Albot thank you very much. What you provided is exactly what I'm looking for. Unfortunately, we're using Aerospike 3.9.0.3. Is there any workaround to get this feature in 3.9.0.3? By the way, I think I could use digest modulo to divide a scan task across multiple processes, not just threads. Is that right?
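On the processes question: a slice defined by digest modulo depends only on the pair (modulus, residue), so the tasks share no state and could in principle run as threads, separate processes or separate Hadoop mappers. The sketch below, assuming SHA-1 as a stand-in digest read as a big-endian integer, models one map task over an in-memory list of records plus a reducer that merges the partial counts. `mapper`, `reducer` and the `country` bin are hypothetical, and in a real job the filtering would happen on the server rather than in the task.

```python
import hashlib
from collections import Counter


def digest(key: str) -> bytes:
    # Hypothetical stand-in for a 20-byte record digest (not Aerospike's RIPEMD-160).
    return hashlib.sha1(key.encode()).digest()


def mapper(records, modulus: int, residue: int) -> Counter:
    """One map task: keep only records in this residue and count a bin value."""
    counts = Counter()
    for key, bins in records:
        # Assumption: digest read as a big-endian integer, as the filter would.
        if int.from_bytes(digest(key), "big") % modulus != residue:
            continue  # this record belongs to another task
        counts[bins["country"]] += 1
    return counts


def reducer(partials) -> Counter:
    """Merge the per-task counts; every record was counted exactly once."""
    total = Counter()
    for part in partials:
        total.update(part)
    return total
```

Running `mapper(records, 8, r)` for r = 0..7 and passing the results to `reducer` gives the same totals as one full pass, because every record falls into exactly one residue.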

Albot · July 20, 2017, 3:20am

Just upgrade! I don't know another way.
