Why do I get the warning "job with trid X already active" when issuing a scan?

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

Problem Description

On a server version prior to 4.9, I have a scan job failing, and the Aerospike log records messages like the following:

Feb 10 2020 09:59:23 GMT: INFO (scan): (scan.c:617) starting basic scan job 283603086331273267 {namespace:set1} priority 2 sample-pct 100 socket-timeout 30000 from 10.10.10.10:33800
Feb 10 2020 09:59:23 GMT: WARNING (job): (job_manager.c:472) job with trid 283603086331273267 already active
Feb 10 2020 09:59:23 GMT: WARNING (scan): (scan.c:621) basic scan job 283603086331273267 failed to start (4)

Explanation

This message can have either of the following two causes:

  1. A scan that timed out or failed was retried on the node before the original scan had completely terminated on the other nodes, so the new attempt conflicted with the earlier attempt of the same scan (both carry the same “trid”, or transaction ID). Error code 4, at the end of the “failed to start” line, is a generic parameter error; the offending parameter in this case is the trid.
  2. A scan (a query without a predicate filter) was issued with a taskID that matches the taskID of a scan that is still running.
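The toy model below, a stdlib-only Java sketch, makes these causes concrete. It is not Aerospike server code: the class ToyJobManager and its methods are hypothetical, and the model only mirrors what the log lines above show, namely that a node refuses to start a job whose trid is still active and reports error code 4.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of one node's table of running jobs.
// NOT Aerospike server code; it only mimics the behaviour seen in the log.
class ToyJobManager {
    static final int OK = 0;
    // Code reported in "failed to start (4)": a generic parameter error.
    static final int PARAMETER_ERROR = 4;

    private final Set<Long> activeTrids = new HashSet<>();

    // Try to start a scan job; reject it while a job with the same trid is active.
    synchronized int startScan(long trid) {
        if (!activeTrids.add(trid)) {
            return PARAMETER_ERROR; // "job with trid ... already active"
        }
        return OK;
    }

    // Called once the job has completely terminated on this node.
    synchronized void finish(long trid) {
        activeTrids.remove(trid);
    }
}
```

In the model, a second startScan with the same trid succeeds only after finish has run for the first one. A premature retry (cause 1) and a reused taskID (cause 2) both break that ordering.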

Solution

  1. We do not recommend retrying scans and queries: if the initial attempt fails while records are being received, the retry returns duplicate records. When issuing scans against server versions prior to 4.9, we therefore recommend leaving maxRetries at 0, which is also the default.
  2. Do not issue a scan or query with the same taskID as one that is currently running. To list the currently running tasks and their IDs, use the jobs info command through the Info class.
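The jobs check in step 2 can be scripted. The stdlib-only sketch below assumes you have already fetched the raw response of the jobs info command for scans (for example `jobs:module=scan` sent through the Info class) and that the response is a semicolon-separated list of jobs, each a colon-separated list of key=value fields such as `trid=...` and `status=active(ok)`. That layout is an assumption; check it against the output of your server version. ActiveScanJobs is a hypothetical helper, not part of any Aerospike client.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: extracts the trids of still-running scans from the raw
// jobs info response. ASSUMED layout: jobs separated by ';', fields by ':',
// each field written as key=value.
class ActiveScanJobs {
    static Set<Long> activeTrids(String response) {
        Set<Long> trids = new HashSet<>();
        for (String job : response.trim().split(";")) {
            if (job.isEmpty()) continue;
            String trid = null;
            String status = null;
            for (String field : job.split(":")) {
                int eq = field.indexOf('=');
                if (eq < 0) continue;
                String key = field.substring(0, eq);
                String value = field.substring(eq + 1);
                if (key.equals("trid")) trid = value;
                else if (key.equals("status")) status = value;
            }
            // Only jobs whose status starts with "active" are still running.
            if (trid != null && status != null && status.startsWith("active")) {
                // trids are unsigned 64-bit values on the server side.
                trids.add(Long.parseUnsignedLong(trid));
            }
        }
        return trids;
    }

    // True if issuing a scan with this taskId would collide with a running job.
    static boolean wouldCollide(String response, long taskId) {
        return activeTrids(response).contains(taskId);
    }
}
```

With the Java client, step 1 corresponds to leaving maxRetries at 0 on the policy object passed with the scan, rather than raising it to paper over timeouts.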

Notes

  • A common reason a scan is issued with the same taskID as a previous scan is that client code keeps a statement as a member variable and reuses it. When a statement instance is reused, reset its taskID to zero, which forces the next query to calculate a new taskID. Reusing a statement this way is a common JDBC pattern, but it must be adapted for use with Aerospike.

  • Adding the following line to retrieveRuleParameter, the client method that reuses the statement, forces the taskID to be recalculated on each query:

    statement.setTaskId(0);

  • If retrieveRuleParameter is called concurrently from multiple threads, the statement must not be shared at all; create a new statement for each query instead.
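The stdlib-only sketch below models the reuse problem described in these notes. ToyStatement is a hypothetical stand-in for a client statement, not the Aerospike Statement class: it assumes, as the notes describe, that a taskId of 0 means "generate one when the query is issued" and that the generated id stays on the object, so a reused statement keeps it. RuleQueries is a hypothetical class standing in for the code that calls retrieveRuleParameter.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-in for a client statement (NOT the Aerospike class).
// taskId 0 means "unassigned"; a generated id is stored back on the object.
class ToyStatement {
    private long taskId;

    void setTaskId(long taskId) { this.taskId = taskId; }
    long getTaskId() { return taskId; }

    // Called when a scan/query is issued.
    long prepareTaskId() {
        if (taskId == 0) {
            long id;
            do {
                id = ThreadLocalRandom.current().nextLong();
            } while (id == 0); // 0 is reserved for "unassigned"
            taskId = id;
        }
        return taskId;
    }
}

// Hypothetical caller illustrating the JDBC-style reuse pattern.
class RuleQueries {
    private final ToyStatement shared = new ToyStatement();

    // The bug: reusing the statement without a reset repeats the taskId.
    long issueWithoutReset() {
        return shared.prepareTaskId();
    }

    // Single-threaded fix: reset to 0 so the next query gets a new taskId.
    long issueWithReset() {
        shared.setTaskId(0);
        return shared.prepareTaskId();
    }

    // Multi-threaded fix: build a fresh statement per query, never share one.
    static long issueWithFreshStatement() {
        return new ToyStatement().prepareTaskId();
    }
}
```

Resetting to 0 is enough for a single caller, but two threads sharing one statement can still race between the reset and the next query, which is why the per-query statement is the safer choice under concurrency.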

Keywords

SCAN TRID JOB RETRY

Timestamp

February 2020