FAQ How to speed up asrestore
How to increase the speed of restoring data with asrestore.
One of the main options for the asrestore process is the number of threads to spawn for writing to the cluster:
-t <threads> or --threads <threads> # 20 is the default
A higher number of threads usually increases the speed of the restore process but may impact the cluster’s overall performance (for client applications).
This option behaves differently depending on whether the restore reads from multiple files or from a single file.
- When restoring multiple files
The restore tool spins up the configured number of threads capped by the number of files. So, the number of threads is whichever is smaller: configured number of threads or number of files.
The reason is that each file is handled by one thread. So we cannot use more threads than files. In this scenario, handling of a file is completely local to one thread. File accesses don’t have to be coordinated between threads.
- When restoring a single file
The restore tool spins up the configured number of threads. These threads all read from the same, single file. They use a mutex to make sure that only one thread at a time reads a record from the file.
Setting the number of threads too high can lead to mutex contention, so performance is unlikely to scale linearly with the number of threads.
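The thread cap for multi-file restores can be sketched as a small shell helper. This is only an illustration, not part of asrestore: the function names effective_threads and count_backup_files are hypothetical, and the file count assumes that backup files carry the .asb extension seen in this article's example file name.

```shell
#!/bin/sh
# Hypothetical helper: estimate how many threads a multi-file restore
# will use, i.e. the smaller of the configured threads and the file count.
effective_threads() {
    configured=$1   # value passed to --threads
    files=$2        # number of backup files to restore
    if [ "$files" -lt "$configured" ]; then
        echo "$files"
    else
        echo "$configured"
    fi
}

# Hypothetical helper: count the backup files directly inside a directory.
# Assumes the files end in .asb.
count_backup_files() {
    find "$1" -maxdepth 1 -type f -name '*.asb' | wc -l | tr -d ' '
}
```

For example, `effective_threads 32 "$(count_backup_files backup_node1)"` prints the thread count to expect for that directory. The single-file case is not covered, since there all configured threads read from the same file.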
If you have multiple backup files in the backup directory, you could also run multiple
asrestore processes in parallel, each restoring a different file, using the following option:
-i <path> or --input-file <path>
# The single file from which to read the backup.
# - means stdin. Mandatory, unless --directory is given.

asrestore -h 10.0.100.199 --input-file BB9375FFD005452_00000.asb --threads 100
. . .
2018-06-29 21:44:38 GMT [INF]  0 UDF file(s), 0 secondary index(es), 248779 record(s) (22296 KiB/s, 248779 rec/s, 91 B/rec, backed off: 0)
2018-06-29 21:44:38 GMT [INF]  Expired 248779 : skipped 0 : inserted 0 : failed 0 (existed 0, fresher 0)
2018-06-29 21:44:38 GMT [INF]  100% complete, ~0s remaining
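A minimal sketch of running one asrestore process per backup file in parallel. It rests on stated assumptions: restore_files_in_parallel and the RESTORE_CMD override are hypothetical names introduced here for illustration, and only the -h, --input-file and --threads options already shown in this article are used.

```shell
#!/bin/sh
# RESTORE_CMD is a hypothetical override hook so the loop can be dry-run
# (for example RESTORE_CMD=echo). By default it invokes asrestore.
RESTORE_CMD=${RESTORE_CMD:-asrestore}

# Usage: restore_files_in_parallel <host> <threads> <file>...
# Starts one restore process per file in the background, then waits for
# all of them. Returns non-zero if any process failed.
restore_files_in_parallel() {
    host=$1
    threads=$2
    shift 2
    pids=""
    for f in "$@"; do
        $RESTORE_CMD -h "$host" --input-file "$f" --threads "$threads" &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}
```

A call such as `restore_files_in_parallel 10.0.100.199 8 backup/*.asb` would start one process per matching file. Each process spawns its own threads, so the total load on the cluster is the sum over all processes.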
Back up a 3-node cluster into 3 directories like this:
asbackup --node-list node1_ip:3000 -d backup_node1
asbackup --node-list node2_ip:3000 -d backup_node2
asbackup --node-list node3_ip:3000 -d backup_node3
Restore them like this (assuming more than 32 files in each directory):
asrestore -h node1_ip -d backup_node1 --threads 32 ...
asrestore -h node2_ip -d backup_node2 --threads 32 ...
asrestore -h node3_ip -d backup_node3 --threads 32 ...
Similarly, we can restore in parallel from S3 (Refer to direct backup to AWS S3):
s3cmd get s3://BUCKET/OBJECT1 - | asrestore --input-file - ...
s3cmd get s3://BUCKET/OBJECT2 - | asrestore --input-file - ...
s3cmd get s3://BUCKET/OBJECT3 - | asrestore --input-file - ...
- Tools version 22.214.171.124 restores more slowly by default because its /etc/aerospike/astools.conf contains the following entries:
#----------------Performance and throttling----------------
nice-list = "1,10000"
threads = 32
As a result, the restore is throttled to 10000 rec/s, as this output shows:
2018-06-29 21:49:02 GMT [INF]  0 UDF file(s), 0 secondary index(es), 209980 record(s) (896 KiB/s, 10000 rec/s, 91 B/rec, backed off: 0)
2018-06-29 21:49:02 GMT [INF]  Expired 209980 : skipped 0 : inserted 0 : failed 0 (existed 0, fresher 0)
2018-06-29 21:49:02 GMT [INF]  84% complete, ~3s remaining
This issue was addressed in the subsequent release (tracked under TOOLS-1135). Upgrade to the latest tools, or use the --no-config-file option to work around it.
- Refer to the Backup FAQ for additional hints.
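To check whether a tools configuration file carries a throttling entry like the nice-list setting described in the note on throttled restores, a small helper along these lines may be useful. It is only a sketch: throttle_setting is a hypothetical name, and the default path is the /etc/aerospike/astools.conf mentioned in that note.

```shell
#!/bin/sh
# Hypothetical helper: print any active (uncommented) nice-list entry
# found in a tools configuration file.
# Exit status: 0 if an entry is found, 1 if none is found, and greater
# than 1 if the file cannot be read (these are grep's exit codes).
throttle_setting() {
    conf=${1:-/etc/aerospike/astools.conf}
    grep -E '^[[:space:]]*nice-list[[:space:]]*=' "$conf"
}
```

Running `throttle_setting` with no argument inspects the default path. Any line it prints indicates that restores using this configuration file will be rate-limited.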