Why do I see a "stuck thread" error when running asbackup?

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

Problem

I see the following error when issuing a backup job:

2019-06-30 15:13:02 GMT [ERR] [ 9129] Stuck thread detected
2019-06-30 15:13:02 GMT [ERR] [ 9129] Error while joining backup thread (error 110: Connection timed out)

Solution

A backup is implemented as a scan job. For background on how scans work in general, refer to the FAQ on Scans article.

The error is expected in two situations:

  1. The backup job was interrupted manually (Ctrl+C), which causes the connection to time out. The scan job that is still active on the node(s) eventually times out on its own. If it has to stop sooner, the job may need to be killed explicitly.

For the asinfo command that kills scan jobs, refer to the Manage Scans documentation page.

  2. The error appears on a fresh execution of asbackup. This can mean that all current scan-threads are busy processing ongoing and/or higher-priority scan jobs. A potential workaround is to run asbackup at a higher priority (using the '-f' or '--priority' option of asbackup) or to increase the number of scan-threads. Refer to the Manage Scans page to understand scan priority.
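
The two remedies above can be sketched as a small Python helper that only builds the command lines, so it can be reviewed before anything is run against a cluster. This is a minimal sketch, not the documented procedure, and it rests on several assumptions that should be checked against the Manage Scans page and the asbackup help for the installed version: that scan jobs are listed with the info command jobs:module=scan and killed with cmd=kill-job;trid=<id>, that the response separates jobs with ';' and fields with ':', that scan-threads is a dynamic setting in the service context, and that asbackup priorities map to 0 (auto), 1 (low), 2 (medium) and 3 (high). The sample response, host, namespace and directory are hypothetical.

```python
import shlex

# Assumed asbackup priority values; verify with `asbackup --help`.
PRIORITIES = {"auto": 0, "low": 1, "medium": 2, "high": 3}


def parse_scan_jobs(response):
    """Parse a scan job listing into a list of dicts.

    Assumed format: jobs separated by ';', fields separated by ':',
    each field written as key=value.
    """
    jobs = []
    for chunk in response.strip().split(";"):
        if not chunk:
            continue
        fields = {}
        for pair in chunk.split(":"):
            key, sep, value = pair.partition("=")
            if sep:
                fields[key] = value
        jobs.append(fields)
    return jobs


def kill_commands(jobs, host="127.0.0.1"):
    """Build one asinfo kill command per scan job that is still active."""
    commands = []
    for job in jobs:
        if job.get("status", "").startswith("active") and "trid" in job:
            # Assumed info command syntax for killing a scan job.
            info = "jobs:module=scan;cmd=kill-job;trid=" + job["trid"]
            commands.append(["asinfo", "-h", host, "-v", info])
    return commands


def asbackup_command(namespace, directory, priority="high", host="127.0.0.1"):
    """Build an asbackup invocation that requests a given scan priority."""
    if priority not in PRIORITIES:
        raise ValueError("unknown priority: " + priority)
    return ["asbackup", "--host", host, "--namespace", namespace,
            "--directory", directory, "--priority", str(PRIORITIES[priority])]


def scan_threads_command(threads, host="127.0.0.1"):
    """Build an asinfo command that raises the scan-threads setting."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # Assumed: scan-threads is dynamic and lives in the service context.
    info = "set-config:context=service;scan-threads=" + str(threads)
    return ["asinfo", "-h", host, "-v", info]


# Hypothetical listing: one active job left behind by an interrupted backup,
# and one job that has already finished.
SAMPLE_RESPONSE = (
    "module=scan:trid=111:ns=test:priority=2:status=active(ok);"
    "module=scan:trid=222:ns=test:priority=2:status=done(ok)"
)

# Print the commands instead of executing them.
for command in kill_commands(parse_scan_jobs(SAMPLE_RESPONSE)):
    print(shlex.join(command))
print(shlex.join(asbackup_command("test", "/tmp/backup")))
print(shlex.join(scan_threads_command(8)))
```

Printing rather than executing keeps the sketch safe to try. Each printed line can be pasted into a shell once the syntax has been confirmed for the server and tools versions in use.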

Keywords

asbackup stuck thread

Timestamp

July 2019