FAQ - Common asbackup and asrestore questions
The asbackup and asrestore tools were comprehensively overhauled for Aerospike 3.6.1. This article discusses frequently asked questions about the new version.
How many threads are used by the asrestore process?
When restoring a backup with asrestore, the number of threads can be specified with the --threads option. This value is a ceiling: asrestore never runs more threads than there are backup files to restore. Therefore, the number of files produced by the backup can affect the performance of the restore, even when the same --threads value is specified.
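The ceiling rule can be expressed as a minimal Python sketch. The function name is hypothetical and not part of asrestore; it only models the rule stated above, not how asrestore schedules its threads internally.

```python
def effective_restore_threads(requested_threads: int, backup_files: int) -> int:
    """Number of restore threads that can actually run.

    Models the rule described above: --threads is only a ceiling, and
    asrestore never runs more threads than there are backup files.
    """
    if requested_threads < 1:
        raise ValueError("requested_threads must be at least 1")
    return min(requested_threads, max(backup_files, 0))


# The same --threads value gives different parallelism for different backups.
for files in (4, 20, 200):
    threads = effective_restore_threads(20, files)
    print(f"--threads 20, {files} backup files -> {threads} threads")
```

With only 4 backup files, a --threads value of 20 still yields just 4 working threads.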
What is the size of an asbackup file?
The size of an asbackup file depends on the amount of data being backed up. By default, asbackup splits its output into files of 250 MiB (configurable with the --file-limit option): once a backup file reaches that size, a new file is started. Previously, file size depended on the size of the records, because the old asbackup started a new file at 200,001 records regardless of how large they were. There should be no limit on the size of a backup other than those imposed by the OS or ulimit.
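A rough estimate of how many files a backup to a directory produces can be sketched in Python (the function name is hypothetical). It assumes files roll over exactly at the limit and ignores how asbackup distributes data across files. Since asrestore never runs more threads than there are backup files, raising --file-limit also reduces how many restore threads can be used.

```python
import math

DEFAULT_FILE_LIMIT_MIB = 250  # asbackup default, changeable with --file-limit


def estimate_backup_files(total_backup_mib: float,
                          file_limit_mib: int = DEFAULT_FILE_LIMIT_MIB) -> int:
    """Approximate file count when a new file is started each time the
    current one reaches file_limit_mib (a simplification, not exact)."""
    if total_backup_mib <= 0:
        return 0
    return math.ceil(total_backup_mib / file_limit_mib)


# A 10 GiB (10240 MiB) backup with the default limit vs. a 1024 MiB limit.
print(estimate_backup_files(10240))        # 41
print(estimate_backup_files(10240, 1024))  # 10
```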
How does asrestore use RAM?
During a restore, asrestore reads backup files into memory in chunks of 16 MiB. The older version of asrestore read the complete backup file into RAM before restoring from it.
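The memory behaviour described above, reading fixed-size chunks instead of the whole file, can be sketched in Python as follows. This is not asrestore's code: it only illustrates bounded-memory reading and does not parse records, which in a real reader may straddle chunk boundaries.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 16 * 1024 * 1024  # 16 MiB, the chunk size quoted above


def read_in_chunks(f: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks so that only chunk_size bytes of the file are
    read per step, instead of loading the whole file up front."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Tiny demo with a 10-byte "file" and 4-byte chunks.
sizes = [len(c) for c in read_in_chunks(io.BytesIO(b"0123456789"), chunk_size=4)]
print(sizes)  # [4, 4, 2]
```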
Does the disk type used to store the asbackup file have an impact on performance?
Yes. The asrestore process is highly parallelised, with multiple threads working at any one time. This access pattern suits SSDs well, and the restore is very efficient on them. Rotational drives, which depend on the drive head being at a particular physical location, cope far less well with highly parallelised applications and therefore perform worse with asrestore.
Does asbackup impact ongoing transactions?
asbackup is essentially a scan job, so on a properly sized cluster it should have minimal impact on ongoing transactions. Note, however, that:
- Increasing ‘scan-threads’ may affect regular transactions (see the scan thread pool note under How to tune asbackup).
- Increasing the number of tree sprigs may help when partitions hold many records (typically 1 million records per partition or more).
How to tune asbackup
The priority option sets the level of scan concurrency on each node. Higher values mean faster backups (relative to other scan jobs running on the system). When using higher priorities, monitor the read/write performance of your application to make sure the impact on cluster performance is acceptable. Allowed values are 0 (auto), 1 (low), 2 (medium) and 3 (high); the default is 0. Any value higher than 3 is treated as medium priority. The priority of a scan that is already running can be changed with asinfo, using the scan job's transaction ID:
asinfo -v 'jobs:module=scan;cmd=set-priority;trid=<jobid>;value=3'
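To make the priority rules concrete, here is a minimal Python model of how a priority value maps to the priority a scan runs with, plus a helper that formats the asinfo command shown above. Both function names are hypothetical and not part of any Aerospike tool, and the job ID in the example is a placeholder.

```python
PRIORITY_NAMES = {0: "auto", 1: "low", 2: "medium", 3: "high"}


def effective_scan_priority(value: int) -> str:
    """Map a priority value to the priority the scan runs with: 0-3 as
    listed above, and anything higher than 3 is treated as medium."""
    if value < 0:
        raise ValueError("priority must be non-negative")
    return PRIORITY_NAMES.get(value, "medium")


def set_priority_command(job_id: int, value: int) -> str:
    """Format the asinfo command for changing a running scan's priority."""
    return f"asinfo -v 'jobs:module=scan;cmd=set-priority;trid={job_id};value={value}'"


print(effective_scan_priority(5))  # medium
print(set_priority_command(123456789, 3))
```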
The scan-threads configuration parameter sets the size of the scan thread pool. It can be increased or decreased dynamically.
Warning: the typical recommended value is the number of cores on the host. Increasing it may impact the performance of regular transactions.
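A minimal sketch of the recommendation above, assuming it runs on the server host itself (os.cpu_count() reports the cores of the machine running the script). The command builder assumes the asinfo set-config syntax for the service-context scan-threads parameter on server versions that still have it; check the configuration reference for your version before using it. Both function names are hypothetical.

```python
import os
from typing import Optional


def suggested_scan_threads(cores: Optional[int] = None) -> int:
    """One scan thread per core, the typical recommendation above."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, cores)


def set_scan_threads_command(threads: int) -> str:
    # Assumed syntax for changing scan-threads dynamically; verify it
    # against the server version you run.
    return f"asinfo -v 'set-config:context=service;scan-threads={threads}'"


print(set_scan_threads_command(suggested_scan_threads()))
```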
How to tune asrestore
The --threads option controls the number of threads spawned to write to the cluster. Its upper limit is the number of backup files. How fast asrestore processes the files depends on disk I/O, CPU load, network bandwidth between the asrestore client and the server, and even the inter-node network.
What happens if asrestore fails to insert a record?
If the initial insertion of a record fails, asrestore retries it up to 10 times. Between tries there is a 1-second pause, and the error is written to the log at the debug severity level. If the record still has not been written after 10 tries, the restore is aborted with the error message “Too many errors, giving up”. The errors that are not retried are “record exists” (when the --unique option is used), “generation mismatch” (unless --no-generation is used) and “invalid username or password”.
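The retry policy above can be modeled as a minimal Python sketch. put is a hypothetical callback that attempts one write and returns None on success or an error string on failure. The error strings, treating MAX_TRIES as the total number of attempts, and returning non-retryable errors straight to the caller are all assumptions, since the FAQ does not say how asrestore proceeds after such an error.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("restore-sketch")

MAX_TRIES = 10        # assumed total number of attempts, per the text above
RETRY_PAUSE_S = 1.0   # pause between attempts, in seconds


class RestoreAborted(Exception):
    """Raised when a record still cannot be written after MAX_TRIES attempts."""


def is_non_retryable(error: str, unique: bool, no_generation: bool) -> bool:
    """Errors that are reported immediately instead of being retried."""
    if error == "record exists" and unique:
        return True
    if error == "generation mismatch" and not no_generation:
        return True
    return error == "invalid username or password"


def write_with_retry(put: Callable[[], Optional[str]],
                     unique: bool = False,
                     no_generation: bool = False,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call put() until it succeeds or a limit is hit.

    Returns None on success, or the error string for a non-retryable error.
    Raises RestoreAborted once MAX_TRIES attempts have failed.
    """
    for attempt in range(1, MAX_TRIES + 1):
        error = put()
        if error is None:
            return None
        if is_non_retryable(error, unique, no_generation):
            return error
        log.debug("write attempt %d failed: %s", attempt, error)
        if attempt < MAX_TRIES:
            sleep(RETRY_PAUSE_S)
    raise RestoreAborted("Too many errors, giving up")
```

The sleep parameter lets tests replace the real one-second pause; by default the sketch waits one second between attempts.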
What permission does a user need to back up a namespace in a security-enabled cluster?
To use asbackup on a namespace, a user only needs the ‘read’ permission. https://www.aerospike.com/docs/guide/security/access-control.html#permissions
Can you do daily or hourly backups?
Incremental backups are possible as of version 3.12. Refer to the incremental backup article for further details.
Is there a configuration parameter to compress the backup data?
There is no direct parameter to compress the data, but the output can be piped to any compression tool:
asbackup --output-file - [...] | gzip -1 [...] >backup.asb.gz
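If the gzip command-line tool is unavailable, the same streaming compression can be done in Python. This is a minimal sketch: compress_stream is a hypothetical helper name, and the sample bytes are placeholders rather than real asbackup output.

```python
import gzip
import io
import shutil
from typing import BinaryIO


def compress_stream(src: BinaryIO, dst: BinaryIO, level: int = 1) -> None:
    """Gzip-compress src into dst, copying 1 MiB at a time so the whole backup
    never has to fit in memory. level=1 corresponds to `gzip -1` (fastest)."""
    with gzip.GzipFile(fileobj=dst, mode="wb", compresslevel=level) as gz:
        shutil.copyfileobj(src, gz, 1024 * 1024)


# In-memory stand-ins for the asbackup output stream and the .gz file.
sample = b"example backup line\n" * 1000
out = io.BytesIO()
compress_stream(io.BytesIO(sample), out)
print(len(sample), "->", len(out.getvalue()), "bytes")
```

In the pipeline, a small script would call compress_stream(sys.stdin.buffer, f) with f opened via open("backup.asb.gz", "wb").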
Why is my backup file size different from my disk usage?
Disk usage cannot be compared directly with the size of a backup file (or with the combined size of all backup files).
The first, obvious difference is the replication factor: the cluster stores one or more copies of each record, depending on the replication factor, whereas the backup contains only the master records.
The other main difference, which mostly affects small records, is the overhead of storing data in an Aerospike database. Refer to the capacity planning guide for details, but in short, Aerospike stores data in blocks of 128 bytes and adds overhead per record and per bin. For example, each record carries 64 bytes of overhead and each bin adds another 28 bytes, so once this overhead is added, a record holding 141 bytes of data can occupy a higher multiple of 128 bytes than its data alone suggests. Finally, binary (blob) values are base64-encoded in the backup file by default, which also affects its size.
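The arithmetic for the 141-byte example can be sketched as follows. This is a simplified model that uses only the figures quoted above (128-byte blocks, 64 bytes per record, 28 bytes per bin) and treats the 141 bytes as the data of a single bin; the capacity planning guide accounts for further overheads that vary by version and data type. The replication factor of 2 is an assumed example value, and the function names are hypothetical.

```python
import math
from typing import Sequence

BLOCK = 128            # storage block size quoted above, in bytes
RECORD_OVERHEAD = 64   # per-record overhead quoted above, in bytes
BIN_OVERHEAD = 28      # per-bin overhead quoted above, in bytes


def stored_size(bin_data_sizes: Sequence[int]) -> int:
    """Approximate on-disk size of one copy of a record, rounded up to 128 bytes."""
    raw = RECORD_OVERHEAD + sum(BIN_OVERHEAD + size for size in bin_data_sizes)
    return math.ceil(raw / BLOCK) * BLOCK


def base64_size(n: int) -> int:
    """Length of n bytes after base64 encoding (with padding)."""
    return 4 * math.ceil(n / 3)


per_copy = stored_size([141])         # 64 + 28 + 141 = 233, rounded up to 256
replication_factor = 2                # assumed example value
print(per_copy, per_copy * replication_factor, base64_size(141))  # 256 512 188
```

So 141 bytes of data take 512 bytes across two replicas on disk, while the same bytes as a blob take 188 bytes in the backup (before the backup format's own text framing).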
Does asbackup back up anything other than records by default?
UDFs and secondary index definitions are backed up along with the records by default. If needed, data selection options can exclude UDFs, secondary indexes, or the records themselves (backing up only the metadata). https://www.aerospike.com/docs/tools/backup/asbackup.html#data-selection-options
However, tombstones, truncate metadata, and any defined users and roles are not backed up. Where needed, these have to be re-created on the cluster that the backup is restored to.
- Detailed information on asbackup
- Detailed information on asrestore