Why do I see a "stuck thread" error when running asbackup?

Problem

I see the following error when issuing a backup job:

2019-06-30 15:13:02 GMT [ERR] [ 9129] Stuck thread detected
2019-06-30 15:13:02 GMT [ERR] [ 9129] Error while joining backup thread (error 110: Connection timed out)

Solution

asbackup reads its data through a scan, so every backup runs as a scan job on the server nodes. For background on how scans work in general, refer to the FAQ on Scans article.
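Because a backup runs as a scan, it should appear in a node's scan job list while asbackup is active. The following is a minimal sketch of checking this, not a definitive procedure. It assumes a server release that exposes the `jobs:module=scan` info command (newer releases may use a different command), and the host, port, and `run` helper are illustrative placeholders. With `DRY_RUN` left at its default the command is only printed, so it can be reviewed before being executed.

```shell
# Placeholder node address (assumption): adjust for your cluster.
HOST=127.0.0.1
PORT=3000

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# List the scan jobs known to one node. A running backup is expected to
# show up here as a scan job, identified by its trid field.
list_scan_jobs() {
  run asinfo -h "$HOST" -p "$PORT" -v "jobs:module=scan"
}

list_scan_jobs
```

Run it with `DRY_RUN=0` against each node of interest, since the job list is reported per node.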

The error is expected in two situations:

  1. The backup job was interrupted manually (Ctrl+C), which causes the connection to time out. The scan job that is still active on the node(s) will eventually time out on its own. If it has to stop sooner, the job can be killed explicitly. Refer to the asinfo command for killing scan jobs on the Manage Scans documentation page.

  2. If the error appears on a fresh execution of asbackup, all the current scan-threads may be busy processing ongoing and/or higher-priority scan jobs. In this situation, a potential workaround is to run asbackup at a higher priority (using the '-f' or '--priority' option of asbackup) or to increase the number of scan-threads. Refer to the Manage Scans page to understand scan priority.
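The remedies for both situations can be sketched as shell helpers. This is an illustration under stated assumptions, not the definitive procedure: it assumes a server release that accepts the `jobs:module=scan;cmd=kill-job` info command and allows `scan-threads` to be changed dynamically in the service context, and an asbackup version in which priority 3 means high. The host, port, namespace, directory, transaction ID, and the `run` helper are placeholders. With `DRY_RUN` left at its default, each command is printed rather than executed.

```shell
# Placeholder node address (assumption): adjust for your cluster.
HOST=127.0.0.1
PORT=3000

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Situation 1: kill a leftover scan job by its transaction ID (trid),
# as reported in the node's scan job list.
kill_scan_job() {
  run asinfo -h "$HOST" -p "$PORT" -v "jobs:module=scan;cmd=kill-job;trid=$1"
}

# Situation 2, option A: rerun the backup at a higher scan priority.
# Arguments: namespace, output directory. Priority 3 is assumed to be "high".
backup_high_priority() {
  run asbackup --host "$HOST" --port "$PORT" \
    --namespace "$1" --directory "$2" --priority 3
}

# Situation 2, option B: raise the number of scan threads on one node.
set_scan_threads() {
  run asinfo -h "$HOST" -p "$PORT" -v "set-config:context=service;scan-threads=$1"
}

# Example invocations (placeholder values), left commented out:
# kill_scan_job 123456
# backup_high_priority test /tmp/backup
# set_scan_threads 8
```

The asinfo commands act on a single node, so they would need to be repeated for every node that still runs the job or needs more scan threads. A dynamic `set-config` change does not survive a restart unless the configuration file is updated as well.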

Keywords

asbackup stuck thread

Timestamp

July 2019