We run a scan job over all nodes every day and extract information from the scan results. Now we are considering turning one round of this all-nodes scan into a map-reduce job, but I have found something that may make this attempt fail: I haven't found any simple way to split the scan job into multiple sub-tasks. For example, with 8 nodes in one Aerospike cluster, the only split pattern I can imagine that distributes the data set across mappers is to scan the 8 nodes with 8 mappers, one per node. Each node holds 4 sets, and of course I could use more mappers by scanning each set separately, but the sets differ in size, so the workload would be unevenly balanced across the mappers.
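To make the imbalance concern concrete, here is a toy calculation (the set names and record counts are invented purely for illustration) comparing a one-mapper-per-node split with a one-mapper-per-set split:

```python
# Hypothetical record counts per set on a single node (numbers invented).
set_sizes = {"setA": 1_000_000, "setB": 50_000, "setC": 400_000, "setD": 10_000}

# Split by node: a single mapper scans the whole node.
per_node = sum(set_sizes.values())

# Split by set: four mappers, one per set, with very uneven workloads.
per_set = list(set_sizes.values())

print(per_node)                      # total records the one per-node mapper handles
print(max(per_set) / min(per_set))   # imbalance ratio between the busiest and idlest mapper
```

With numbers like these, the per-set split gives one mapper 100x the work of another, which is the imbalance described above.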
So, is there any way to scan a single set on a single node using multiple processes?