FAQ - Can Aerospike backups be stored in Amazon S3?

When running an Aerospike cluster on Amazon Web Services (AWS), is it possible to take backups via Aerospike Tools and store them on Amazon S3?

Answer

Yes, it is possible to store Aerospike backups on Amazon S3; however, experience has shown that this is best confined to smaller datasets or development instances. Production instances, where datasets and backups tend to be larger, take longer to transfer to S3 and are therefore more vulnerable to network interruptions that cause the process to fail.

The process for taking the backups is simple:

  1. Take a normal backup via asbackup.
  2. Use a utility or API to upload the backup file to S3.
  • Example utilities/APIs: s3cmd, the AWS CLI (aws)

These steps can be combined into a single command:

asbackup [...] --output-file - | gzip -9 | s3cmd put - s3://BUCKET/FILE
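
The same pipeline should also work with the AWS CLI in place of s3cmd, since aws s3 cp accepts - as its source to read from standard input (BUCKET and FILE are placeholders, as above):

asbackup [...] --output-file - | gzip -9 | aws s3 cp - s3://BUCKET/FILE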

Restoring the files is also a simple process.

  1. Download the backup file from S3 onto a system with asrestore.
  2. Use asrestore to restore the data back into a cluster.

gunzip -c BACKUP.GZ | asrestore [...] --input-file -
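
The download and restore can likewise be combined into a single streaming pipeline, since aws s3 cp can write the object to standard output when - is given as the destination:

aws s3 cp s3://BUCKET/FILE - | gunzip | asrestore [...] --input-file -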

Notes

It is recommended to set IAM roles on the instances that perform the backups so that S3 credentials do not need to be stored on them. Aerospike credentials are still required if authentication is enabled.
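
As a minimal sketch, the instance role can be scoped to just the backup bucket. This assumes a role is already attached to the instance via an instance profile; the role name, policy name and bucket below are hypothetical, and the exact actions needed depend on how the backups are moved (s3:PutObject for uploads, s3:GetObject for restores):

aws iam put-role-policy \
  --role-name aerospike-backup-role \
  --policy-name aerospike-backup-s3 \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::BUCKET/*"
    }]
  }'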

Keywords

BACKUP RESTORE AWS S3 S3CMD

Timestamp

January 21st 2019

Is it possible to pipe this into the AWS-provided CLI (awscli) instead of s3cmd? What is the behavior if the S3 upload moves more slowly than asbackup: will asbackup abort, or will it throttle itself?

Yes, that should be possible (the article mentions this as well). And yes, if the upload slows down enough, the default timeout may kick in and fail the process. The timeout would need to be adjusted accordingly, but for larger datasets that may increase the total backup time significantly and, as the article points out, make the process more prone to network interruptions, depending on where the backup is run from.
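
One way to avoid that backpressure, sketched below with placeholder file and bucket names, is to let asbackup complete against local storage first and upload the compressed file afterwards:

asbackup [...] --output-file backup.asb                 # backup runs at local disk speed
gzip -9 backup.asb                                      # produces backup.asb.gz
aws s3 cp backup.asb.gz s3://BUCKET/backup.asb.gz       # upload can be retried independently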