Problem with multiple parallel scans

Hello. I’m using Aerospike for a project I’ve been working on, but I have a problem with scans: when I launch two scan jobs, the second one does not start processing until the first one ends.

If anyone can help me, I would really appreciate it. I’ve been struggling with this problem for quite some time and I can’t find a solution.

I’m running Aerospike server build 3.8.1 on a cluster of 3 nodes. I use the Node.js client to launch the jobs, and the two scans run on different namespaces.

The output for the show scans aql command is as follows:

| ns | mem-usage | trid | job_id | set | priority | job-type | module | status | scanned_records | job_status | recs-read | run-time | net-io-bytes | time-since-done | job-progress | job_progress(%) |
| "namespace2" | 0 | 6285645334030310410 | 6285645334030310410 | "set2" | 2 | "basic" | "scan" | "active(ok)" | 0 | "IN PROGRESS" | 0 | 210729 | 0 | 0 | 0 | 0 |
| "namespace1" | 0 | 9141189537085453602 | 9141189537085453602 | "set1" | 2 | "basic" | "scan" | "active(ok)" | 18945 | "IN PROGRESS" | 18945 | 253652 | 22038366 | 0 | 0 | 0 |

Both of them appear as active(ok), but as can be seen, the first scan in the table (the one launched last, on namespace2) has recs-read = 0. Then if I kill the other scan (the one launched first, on namespace1), the namespace2 scan is able to process and runs correctly:

| "namespace2" | 0         | 6285645334030310410  | 6285645334030310410  | "set2" | 2        | "basic"  | "scan" | "done(ok)"                | 31612           | "DONE"     | 31612     | 391055   | 6804607      | 6390            | 100          | 100             |

| "namespace1"  | 0         | 9141189537085453602  | 9141189537085453602  | "set1"     | 2        | "basic"  | "scan" | "done(abandoned-unknown)" | 31041           | "DONE"     | 31041     | 433308   | 38830042     | 7060            | 0            | 0               |

I’ve left the default config of scan-max-active = 100, but perhaps there is another configuration setting I’m not aware of. This is the output of the asinfo -v "get-config:context=service" command:

transaction-queues=8;transaction-threads-per-queue=8;transaction-duplicate-threads=0;transaction-pending-limit=20;migrate-threads=1;migrate-max-num-incoming=256;migrate-rx-lifetime-ms=60000;proto-fd-max=15000;proto-fd-idle-ms=60000;proto-slow-netio-sleep-ms=1;transaction-retry-ms=1000;transaction-max-ms=1000;transaction-repeatable-read=false;ticker-interval=10;log-local-time=false;microbenchmarks=false;storage-benchmarks=false;ldt-benchmarks=false;scan-max-active=100;scan-max-done=10;scan-max-udf-transactions=32;scan-threads=4;batch-index-threads=4;batch-threads=4;batch-max-requests=5000;batch-max-buffers-per-queue=255;batch-max-unused-buffers=256;batch-priority=200;nsup-delete-sleep=100;nsup-period=120;nsup-startup-evict=true;paxos-retransmit-period=5;paxos-single-replica-limit=2;paxos-max-cluster-size=32;paxos-protocol=v3;paxos-recovery-policy=auto-reset-master;write-duplicate-resolution-disable=false;respond-client-on-master-completion=false;replication-fire-and-forget=false;info-threads=16;allow-inline-transactions=true;use-queue-per-device=false;snub-nodes=false;prole-extra-ttl=0;max-msgs-per-type=-1;service-threads=8;fabric-workers=16;pidfile=/var/run/aerospike/asd.pid;memory-accounting=false;udf-runtime-gmax-memory=18446744073709551615;udf-runtime-max-memory=18446744073709551615;sindex-builder-threads=4;sindex-data-max-memory=ULONG_MAX;query-threads=6;query-worker-threads=15;query-priority=10;query-in-transaction-thread=0;query-req-in-query-thread=0;query-req-max-inflight=100;query-bufpool-size=256;query-batch-size=100;query-priority-sleep-us=1;query-short-q-max-size=500;query-long-q-max-size=500;query-rec-count-bound=18446744073709551615;query-threshold=10;query-untracked-time-ms=1000;query-pre-reserve-partitions=false

Thanks!

How many cores do those server nodes have? Also, can you post a code snippet of what you’re doing? I want to see the scan policies you’re using. This may be a bug, and we’re trying to track it down.

First, if you’re explicitly setting a priority, please remove it and try again. There’s a recent fix in release 3.8.2.2, labeled AER-4970, that may address your issue. Please upgrade to the latest version (currently 3.8.2.3).

Hello rbotzer.

Thank you for your quick response.

I have 4 cores with Hyper-Threading (8 virtual cores).

This is the first scan I launch:

var statement = { 'filters': [] }

var scan = asd.query("namespace1", "set1",statement);

scan.priority = aerospike.scanPriority.LOW
scan.percent = 1
scan.concurrent = true
var stream = scan.foreach();

stream.on('data', function(record) {
    [...]
});

stream.on('error', function(error) { [...] });

stream.on('end', function(end){
    asd.close()
    [...]
});

And this is the second one, which removes each record once it has been processed:

var scan = asd.query("namespace2", "set2", {
    concurrent: true,
    nobins: false,
    priority: aerospike.scanPriority.LOW
});

var stream = scan.execute();

stream.on('data', function(record)
{
    [...]
    asd.remove(aerospike.key('namespace2', 'set2', ids[0]), function(){});
});

stream.on('error', function(error)
{
    asd.close();
    [...]
});

stream.on('end', function(end)
{
    asd.close();
    [...]
});

Yes, we were declaring a priority for the scans. I’ve removed it and tried again, but the same thing happened.

Great, I will try the latest version and let you know how it goes.

Thanks.

Hi Diego

A few questions that will help with our investigation…

  1. If you run the first scan (the one you launched first) alone, without launching another scan, does it complete?

  2. If you launch both scans, as in the case for which you provided the “show scans” output, and one scan is stuck with recs-read = 0, does the other scan make any progress over time? If you run several “show scans” commands every minute or so, do the recs-read or net-io-bytes numbers change at all?

  3. How did you kill the scan? Do you use the kill_scan command in aql or some other method?

Thanks,

Brad

Hi Brad. Thank you for your interest!

  1. Yes it does.

  2. Yes it does progress. The first scan seems to work perfectly with or without the second scan. The recs-read count increases over time.

In fact, the first scan reads a set and, for each record, writes data to another set, which works like a queue. Then the second scan reads from that second set, consuming the queue and freeing its records.

The problem is that when they run at the same time, the queue is not consumed and it only increases in size.

  3. I stop the Node.js process that runs the scan, and then the scan gets terminated.

Thanks once again for taking the time!

Interesting… thanks for the info. So if both scans are running, the first one (that writes to another set) will eventually complete? And if so, does the second scan (that reads/deletes) remain stuck with recs-read equal to 0 even after the first scan has completed?

@dstallo,

You are using the client’s Query API instead of the Scan API. The priority, percent and concurrent properties are only supported by the Scan API:

var statement = { 'filters': [] }

var scan = asd.query("namespace1", "set1",statement);

scan.priority = aerospike.scanPriority.LOW
scan.percent = 1
scan.concurrent = true
var stream = scan.foreach();

Try changing var scan = asd.query(...) to var scan = asd.scan(...). You don’t need to pass any query statement in this case either.

Note that since you are not setting any filter predicate, the server will still execute your query as a scan. But the scan-specific settings (priority, percent, concurrent) will not be passed by the client and so will not be applied.
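For example, based on your first snippet, the scan could look something like this (a sketch only; the event handlers are elided just as in your code):

var scan = asd.scan("namespace1", "set1");

scan.priority = aerospike.scanPriority.LOW;
scan.percent = 1;
scan.concurrent = true;

var stream = scan.foreach();

stream.on('data', function (record) { /* ... */ });
stream.on('error', function (error) { /* ... */ });
stream.on('end', function () { asd.close(); });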

Also, are you running the second scan using the v1 client? The code sample you posted shouldn’t work with the v2 client at all; Client#query should raise an error if you try to set the scan-specific properties concurrent, nobins and priority:

var scan = asd.query("namespace2", "set2", {
    concurrent: true,
    nobins: false,
    priority: aerospike.scanPriority.LOW
});
AssertionError: Invalid query arguments: concurrent,nobins,priority. Use Client#scan instead.

The scan implementation in the v1 client is very inefficient and blocks the event queue if processing the scan results takes longer than reading them from the server. See this recent blog post for details. I strongly advise using the v2 client, especially for scan and query operations.

Cheers, Jan

Hello Jan! Great! Yes, I updated the Node.js client to v2 in both processes and replaced query with scan, and the problem was partially solved. It logged a “launching async scan” message, which it didn’t when using query, and the second scan started to process, but very slowly. Then I updated the Aerospike build to the latest stable one, and it worked like a charm. So I had three problems: using .query instead of .scan, an outdated Node.js client, and an outdated Aerospike server build.

I will let you know when I deploy this to production to confirm that this is completely solved.

Thank you all very much!

Hi Everyone,

We still weren’t able to fix this issue. On my end I’m seeing the same problem. Some details about my setup:

Computer cores: 4

Node version: 6.0.0

Aerospike Nodejs client version: 2.0.4

Aerospike version: Aerospike Community Edition build 3.8.2.3

Details:

2 scans running in different scripts (processes).

  • 1st scan: scans a user namespace set and performs calculations. For eligible users, two operations are done: it updates the user (same namespace and set as above) and writes to a different namespace/set configured as RAM-only, which we use as a queue.

  • 2nd scan: scans the above queue and deletes each record after reading it.

Problem: the 1st scan executes normally while the 2nd scan is stuck at 0 records. When the 1st scan finishes, the 2nd scan resumes its execution and finishes without problems.

I tried different setups: removing all priorities, high/low, low/low, etc., also with and without a percentage. I’m using client.scan (instead of .query, which was recommended against). The scan code is something like this:

var scan = asd.scan("ns1", "set1");

scan.concurrent = true;
// scan.priority = aerospike.scanPriority.LOW
var stream = scan.foreach();

stream.on('data', function(item)
{...

Maybe I’m missing something?

Thanks!

Hi!

Is there anyone who could perhaps help us with this problem?

Thanks!

What is the elapsed execution duration of the 2 scans you mentioned here?

For the full time window that covers both of these jobs, can you monitor the jobs’ progress against one node:

asinfo -v "jobs:module=scan" -l

Maybe a total of 20 such calls, equally spaced over the full duration that covers both jobs.

Thanks.

Hi wchu, thanks for your response.

This is the command output for just one of the scans, the one that flushes the queue:

module=scan:trid=3581612585714989428:job_id=3581612585714989428:job-type=basic:ns=nsname:set=setname:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=423:time-since-done=579286:recs-read=28993:scanned_records=28993:net-io-bytes=7516255:mem-usage=0

module=scan:trid=4063780922664705162:job_id=4063780922664705162:job-type=basic:ns=nsname:set=setname:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=456:time-since-done=1179349:recs-read=29275:scanned_records=29275:net-io-bytes=7588695:mem-usage=0

module=scan:trid=3555146715120073150:job_id=3555146715120073150:job-type=basic:ns=nsname:set=setname:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=494:time-since-done=1779262:recs-read=29099:scanned_records=29099:net-io-bytes=7554405:mem-usage=0

module=scan:trid=195178967694124580:job_id=195178967694124580:job-type=basic:ns=nsname:set=setname:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=386:time-since-done=2379657:recs-read=28397:scanned_records=28397:net-io-bytes=7384543:mem-usage=0

module=scan:trid=3301805607050729026:job_id=3301805607050729026:job-type=basic:ns=nsname:set=setname:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=358:time-since-done=2979683:recs-read=28122:scanned_records=28122:net-io-bytes=7294583:mem-usage=0

module=scan:trid=14910122126511719:job_id=14910122126511719:job-type=basic:ns=nsname:set=setname:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=492:time-since-done=3579502:recs-read=27988:scanned_records=27988:net-io-bytes=7273681:mem-usage=0

module=scan:trid=458587550394916935:job_id=458587550394916935:job-type=basic:ns=nsname:set=setname:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=423:time-since-done=4178678:recs-read=21630:scanned_records=21630:net-io-bytes=5597634:mem-usage=0

module=scan:trid=8092374457550496541:job_id=8092374457550496541:job-type=basic:ns=nsname:set=setname:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=223:time-since-done=4617222:recs-read=20889:scanned_records=20889:net-io-bytes=5401435:mem-usage=0

module=scan:trid=1278323725992388284:job_id=1278323725992388284:job-type=basic:ns=nsname:set=setname:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=56:time-since-done=5030824:recs-read=9953:scanned_records=9953:net-io-bytes=2552215:mem-usage=0

module=scan:trid=8780958047304841560:job_id=8780958047304841560:job-type=basic:ns=nsname:set=setname:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=184:time-since-done=5064030:recs-read=17283:scanned_records=17283:net-io-bytes=4449675:mem-usage=0

I’m only including the values for the flush scan because it’s the one that is currently in production and works really well. The other scan is much larger, probably hours, because it covers the whole user database. The weird thing is that even this queue-flush scan (which always runs very fast) gets stuck.

Is there anything else that we can do to test?

Thanks!

From your original post, it sounds like you have a situation with two concurrent scans where the second scan does not start until the first scan is finished. This is not expected.

In order to troubleshoot that, we need to see snapshots of both scan jobs’ progression, from the start to the end of both scans.

Thanks.

Hi wchu!

Yes, you are right. Here is the output for both scans and what happened:

module=scan:trid=9291114243897586612:job_id=9291114243897586612:job-type=basic:ns=ramnamespace:set=set_for_queue:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=220411:time-since-done=9613:recs-read=15555:scanned_records=15555:net-io-bytes=2819030:mem-usage=0

module=scan:trid=16697257330035842914:job_id=16697257330035842914:job-type=basic:ns=usernamespace:set=user_data:priority=2:status=done(abandoned-unknown):job_status=DONE:job-progress=0:job_progress(%)=0:run-time=223420:time-since-done=9650:recs-read=51957:scanned_records=51957:net-io-bytes=67154876:mem-usage=0

module=scan:trid=9183798047068417505:job_id=9183798047068417505:job-type=basic:ns=ramnamespace:set=set_for_queue:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=20:time-since-done=238836:recs-read=0:scanned_records=0:net-io-bytes=30:mem-usage=0

module=scan:trid=10554638362571426932:job_id=10554638362571426932:job-type=basic:ns=ramnamespace:set=set_for_queue:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=22:time-since-done=243420:recs-read=36:scanned_records=36:net-io-bytes=7140:mem-usage=0

module=scan:trid=13803587021835691931:job_id=13803587021835691931:job-type=basic:ns=ramnamespace:set=set_for_queue:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=22699:time-since-done=417105:recs-read=2134:scanned_records=2134:net-io-bytes=389133:mem-usage=0

We first ran the scan on the set “set_for_queue” successfully three times, with no delay. After that we started the scan on the whole user data set and immediately started the same “set_for_queue” scan. Because the user data set scan was running, the set_for_queue scan wasn’t starting. We waited a few minutes and then interrupted the user data scan. The set_for_queue scan then resumed and finished without error.

Is there anything else I can provide you with?

Thanks!!

Hi @fdnieves, the output you provided is from a single asinfo call, after all the jobs are done.

What would be helpful is multiple asinfo calls, executed throughout the duration of the 2 concurrent scans (maybe every 5 minutes), to monitor both scans as they progress.

According to your original post, I would expect your first scan to finish successfully (go from job-progress=0 to job-progress=100) before the second scan makes any progress.

However, based on your recent post, it seems the first scan is large (hours) and was terminated by the user after just a few minutes.

By default, there are 4 scan threads, each concurrently working on a slice of a job. Only when a slice is finished will a different job’s slice be picked up.

Since you only waited a few minutes while doing a big scan, it is very likely that the first 4 slices had not finished yet, so no other job’s slice would have been picked up yet.

Looking at the job progress for the big scan over its whole duration will confirm this.

A different angle on your issue: it seems you may be spending too much time on your large scan because client-side processing is slow. You could change your application logic to cache, on the client, the users you want to update. Then, after the scan is finished (it will be quicker, since there are no in-line updates), you can go back and make the needed updates, as sketched below.
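A rough sketch of that idea with the Node.js v2 client, using the same asd client and aerospike module variables as your earlier snippets (needsUpdate, buildUpdateBins and the userId bin are illustrative placeholders for your own logic):

var userIdsToUpdate = [];

var scan = asd.scan("usernamespace", "user_data");
scan.concurrent = true;

var stream = scan.foreach();

stream.on('data', function (record) {
    // during the scan, only remember which users need an update; no writes here
    if (needsUpdate(record)) {                 // placeholder for your eligibility check
        userIdsToUpdate.push(record.userId);   // 'userId' is an illustrative bin name
    }
});

stream.on('end', function () {
    // the scan has finished; now apply the cached updates outside of the scan
    userIdsToUpdate.forEach(function (userId) {
        var key = new aerospike.Key("usernamespace", "user_data", userId);
        asd.put(key, buildUpdateBins(userId), function (err) { /* handle error */ });
    });
});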

Hi wchu,

Thanks for your detailed answer. In a test environment, I managed to reduce the user set and make the big scan finish. Here is the output of the scan list command at different points in time for the two scans that were executed:

module=scan:trid=7781430301351153015:job_id=7781430301351153015:job-type=basic:ns=ramnamespace:set=set_for_queue:priority=2:status=active(ok):job_status=IN PROGRESS:job-progress=0:job_progress(%)=0:run-time=7245:time-since-done=0:recs-read=0:scanned_records=0:net-io-bytes=0:mem-usage=0
module=scan:trid=9449045071010040315:job_id=9449045071010040315:job-type=basic:ns=usernamespace:set=user_data:priority=2:status=active(ok):job_status=IN PROGRESS:job-progress=0:job_progress(%)=0:run-time=17736:time-since-done=0:recs-read=13798:scanned_records=13798:net-io-bytes=14694671:mem-usage=0

module=scan:trid=7781430301351153015:job_id=7781430301351153015:job-type=basic:ns=ramnamespace:set=set_for_queue:priority=2:status=active(ok):job_status=IN PROGRESS:job-progress=0:job_progress(%)=0:run-time=99682:time-since-done=0:recs-read=0:scanned_records=0:net-io-bytes=0:mem-usage=0
module=scan:trid=9449045071010040315:job_id=9449045071010040315:job-type=basic:ns=usernamespace:set=user_data:priority=2:status=active(ok):job_status=IN PROGRESS:job-progress=0:job_progress(%)=0:run-time=110173:time-since-done=0:recs-read=30547:scanned_records=30547:net-io-bytes=37777938:mem-usage=0

module=scan:trid=7781430301351153015:job_id=7781430301351153015:job-type=basic:ns=ramnamespace:set=set_for_queue:priority=2:status=active(ok):job_status=IN PROGRESS:job-progress=0:job_progress(%)=0:run-time=239589:time-since-done=0:recs-read=0:scanned_records=0:net-io-bytes=0:mem-usage=0
module=scan:trid=9449045071010040315:job_id=9449045071010040315:job-type=basic:ns=usernamespace:set=user_data:priority=2:status=active(ok):job_status=IN PROGRESS:job-progress=0:job_progress(%)=0:run-time=250080:time-since-done=0:recs-read=55791:scanned_records=55791:net-io-bytes=72990078:mem-usage=0

module=scan:trid=7781430301351153015:job_id=7781430301351153015:job-type=basic:ns=ramnamespace:set=set_for_queue:priority=2:status=active(ok):job_status=IN PROGRESS:job-progress=0:job_progress(%)=0:run-time=383251:time-since-done=0:recs-read=4:scanned_records=4:net-io-bytes=977:mem-usage=0
module=scan:trid=9449045071010040315:job_id=9449045071010040315:job-type=basic:ns=usernamespace:set=user_data:priority=2:status=active(ok):job_status=IN PROGRESS:job-progress=0:job_progress(%)=0:run-time=393742:time-since-done=0:recs-read=81635:scanned_records=81635:net-io-bytes=108449135:mem-usage=0

module=scan:trid=9449045071010040315:job_id=9449045071010040315:job-type=basic:ns=usernamespace:set=user_data:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=460539:time-since-done=29047:recs-read=91897:scanned_records=91897:net-io-bytes=127087612:mem-usage=0
module=scan:trid=7781430301351153015:job_id=7781430301351153015:job-type=basic:ns=ramnamespace:set=set_for_queue:priority=2:status=done(ok):job_status=DONE:job-progress=100:job_progress(%)=100:run-time=450047:time-since-done=29048:recs-read=1093:scanned_records=1093:net-io-bytes=202655:mem-usage=0

The approach you describe is a really good one, and we have already thought about it. The problem is that locally storing all those users exceeds our available RAM, and even if we could make it work now, our user database is growing all the time, so we need a solution that is not limited by RAM usage.

With the above output it is clear that as soon as the user_data scan finishes, the other one resumes and runs to completion. Another bug I see is that job_progress is stuck at 0 until the scan finishes.

What we need is simply to be able to run two parallel scans on different sets. What do you think we could try to make this work?

Thanks!

The job_progress staying at 0 may be an issue with the granularity of the progress metric. It can be that most of the scanned data is in the beginning partitions, and thus does not show up as > 0%. We are enhancing job_progress to a floating-point number so that it captures intermediate percentages.

In this case, the small job (7781430301351153015) does start before the larger job (9449045071010040315) is finished. It seems both jobs reach 100% at around the same time. The data distribution can be skewed, so the rate at which the record count increments is not even over time.

Update on the issues reported in this thread:

  • job-progress resolution has been changed to a float (AER-5069) to give the correct granularity of progress (% of the 4096 partitions processed).

  • The data reported in this thread is skewed, populated only in low-numbered partitions, because it was restored from partial backups.

  • Concurrent jobs are correctly processed at the same time. Increasing the priority of a scan (the small job in this case) will expedite it so that it finishes first.

  • Job priority is only supported via the scan() client API (see the sketch below).
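For example, with the Node.js v2 client, a higher priority for the small queue scan could be requested like this (a sketch; names follow the earlier snippets in this thread):

var scan = asd.scan("ramnamespace", "set_for_queue");

scan.concurrent = true;
scan.priority = aerospike.scanPriority.HIGH;  // expedite the small job so it finishes first

var stream = scan.foreach();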