Scan jobs stop at 79% (AER-2986) [Released] [Resolved]


#1

We have a problem with Scan operations with Aerospike Server EE 3.5.9 and Java Client 3.1.1.

  1. When running more than one scan with the same priority (low or high), only one runs. All subsequent ones apparently are queued and do not start until the first one finishes.

  2. When starting a high priority scan while a low priority scan is running, the high priority scan will run, but will always halt at 79% progress, until the low priority scan has finished.

It does not matter what kind of namespaces (in-memory or SSD; the same or different ones) or sets the scans run on. It also does not matter whether the scans are issued from the same client instance or from different ones. We tested with both a single node and multiple nodes active. Nothing related is logged in the server log.

The scan policies:

ScanPolicy sP = new ScanPolicy();
sP.includeBinData = true;
sP.scanPercent = percent;

if (isHighPriority) {
    sP.concurrentNodes = true;
    sP.priority = Priority.HIGH;
} else {
    sP.consistencyLevel = ConsistencyLevel.CONSISTENCY_ONE;
    sP.concurrentNodes = false;
    sP.maxConcurrentNodes = 1;
    sP.priority = Priority.LOW;
    sP.failOnClusterChange = true;
}

asyncClient.scanAll(sP, new RecordSequenceListener<>(), namespace, set, "");

The AsyncClient policy:

cPolicy = new AsyncClientPolicy();
cPolicy.timeout = 500;
cPolicy.asyncMaxCommandAction = MaxCommandAction.BLOCK;
cPolicy.writePolicyDefault = wPolicy;
cPolicy.asyncTaskThreadPool = ThreadPools.aeroTaskExecutor;
cPolicy.asyncSelectorThreads = 2;

Also, while a backup is running, low priority scans do not progress and high priority scans halt at 79%. If another scan is running when a backup is initiated, the backup scan does not progress (regardless of the priority of the other scan).

What can be the cause?


#2

Hi Blonkel, this is a known scan issue. When a scan arrives, we parallelise it across multiple threads by scheduling all of its work up front, so every new scan is scheduled behind the first one. Internally we have 5 threads for scheduling these scan jobs. A low priority scan job uses one of the five threads, and a high priority scan job uses all five. So if a high priority scan job is scheduled after a low priority scan job, some of its work is queued on the first thread, which is already busy with the low priority job. As a result, about 80% of the high priority scan's work finishes, and the remaining 20% waits for the low priority scan job to finish.

Backup also uses scan internally, so it has the same problem.

We are working on this to make all scan jobs run in parallel.
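
The stall near 80% can be reproduced with a toy model of the scheduling described above. This is only a sketch under stated assumptions, not the server's implementation: the class `ScanQueueModel` and its methods are hypothetical names, work is split round-robin across five FIFO queues, and each queue completes one unit of work per tick.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model (hypothetical, not Aerospike code): five FIFO work queues,
// one per scan thread, each completing one unit of work per tick.
class ScanQueueModel {
    static final int THREADS = 5; // thread count taken from the explanation above

    private final List<Deque<String>> queues = new ArrayList<>();

    ScanQueueModel() {
        for (int i = 0; i < THREADS; i++) {
            queues.add(new ArrayDeque<>());
        }
    }

    // Assumption: a low priority job puts all of its work on the first queue.
    void submitLow(String job, int units) {
        for (int u = 0; u < units; u++) {
            queues.get(0).add(job);
        }
    }

    // Assumption: a high priority job spreads its work evenly over all queues.
    void submitHigh(String job, int units) {
        for (int u = 0; u < units; u++) {
            queues.get(u % THREADS).add(job);
        }
    }

    // Advances the model by `ticks` rounds and returns how many units of
    // `job` were completed during those rounds.
    int run(int ticks, String job) {
        int done = 0;
        for (int t = 0; t < ticks; t++) {
            for (Deque<String> q : queues) {
                String head = q.poll(); // null when the queue is empty
                if (job.equals(head)) {
                    done++;
                }
            }
        }
        return done;
    }

    public static void main(String[] args) {
        ScanQueueModel model = new ScanQueueModel();
        model.submitLow("low", 1000);  // long-running low priority scan
        model.submitHigh("high", 100); // 100 units, so units == percent
        int done = model.run(500, "high");
        System.out.println("high priority progress: " + done + "%"); // prints 80%
    }
}
```

In this model the 20 units of the high priority job that landed on the first queue cannot start until all 1000 low priority units ahead of them are done, so progress sits at 80% however long the model runs before that point. The 79% reported in the first post is consistent with this picture, although the exact figure depends on how the server divides and reports the work.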


#3

Hello

Thank you for your fast answer. We badly need scan jobs that run in parallel, and not only for backups. Is there an estimated time for when this issue will be solved?


#4

Hi blonkel,

A JIRA has been filed to follow up on this; it’s AER-2986, just for reference.

Unfortunately, we do not provide time estimates for fixing issues in the Community Edition (the free version of our product).

Please stay tuned for updates on our progress.

Regards,

Maud


#5

@blonkel,

For the quickest notification once this issue is fixed, please check our server release notes page periodically. We will also update this forum topic once a new server release is out that fixes the issue.


#6

Sadly AER-2986 was not addressed with the 3.5.12 release. Any update on this issue?

We are using the Enterprise version.


#7

@blonkel,

This is a project we are actively working on. We will update you as soon as we have more information.


#8

@blonkel The scan rework is done. You will not find a fix for AER-2986 as such, but the entire scan has been reworked for better performance and feature enhancements. The new scan is available in the 3.6.0 release.

Thanks


#9

The release notes for 3.6.0 indicate that scan improvement is one of the highlights of that release, and that AER-2986 has been fixed.