I manage a few Linux web servers. Some are for personal use, others for my employer; some I have physical access to, others I don’t. Despite wishes and even prayers, hardware can and will fail. Data centers can have service outages. It is important to keep an up-to-date, external copy of all your data in a secure place for that day when something really bad happens. Below I describe how I use Amazon S3 and s3cmd to keep my servers backed up.
Imagine the data infrastructure obstacles Amazon had to overcome to build the largest online shopping center. Most of us do not have the resources to build and maintain a system as reliable and scalable as the one Amazon developed for its own use. Thankfully, for a relatively small fee, Amazon makes that infrastructure available under the name Amazon Web Services:
“[With Amazon Web Services] you can take advantage of Amazon.com’s global computing infrastructure, that is the backbone of Amazon.com’s multi-billion retail business and transactional enterprise whose scalable, reliable, and secure distributed computing infrastructure has been honed for over a decade.
Using Amazon Web Services, an e-commerce web site can weather unforeseen demand with ease; a pharmaceutical company can “rent” computing power to execute large-scale simulations; a media company can serve unlimited videos, music, and more; and an enterprise can deploy bandwidth-consuming services and training to its mobile workforce.” – taken from http://aws.amazon.com/what-is-aws/
The Amazon Web Services product we will use as the location of our backup storage is S3 (Simple Storage Service). Create an account, then use the Management Console to create a bucket (a container for your storage). You can create as many buckets as you like, but bucket names must be globally unique. I use a specific prefix for all my buckets (such as my initials) and create a bucket for each piece of hardware I want to back up.
The important information you will need to configure your backup tool is your Access Key ID and Secret Access Key, found under Account > Security Credentials. Do not let anyone else have these two keys: with them, anyone can access your data stored in S3 and run up a large bill.
s3cmd is a free Linux command line tool for uploading and downloading data to and from your Amazon S3 account.
Download and install s3cmd manually, or do what I did and add the s3tools package repository to your package manager for a much easier install.
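The exact commands depend on your distribution; s3cmd is available in the standard repositories of many distributions, and s3tools.org provides repositories for others. A rough sketch (run as root, or prefix with sudo):

```shell
# Debian/Ubuntu: s3cmd is available in the standard repositories on many releases
apt-get install s3cmd

# RPM-based systems: install from the s3tools repository after adding it
yum install s3cmd
```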
After installing s3cmd, configure it by running the following command:
# s3cmd --configure
Enter your Access Key ID and Secret Access Key discussed earlier and use the default settings for the rest of the options unless you know otherwise.
If you haven’t already created a bucket you can do that now with s3cmd:
# s3cmd mb s3://unique-bucket-name
List your current buckets to make sure you successfully created one:
# s3cmd ls
2010-10-30 02:15 s3://your-bucket-name
You can now upload, list, and download content:
# s3cmd put somefile.txt s3://your-bucket-name/somefile.txt
somefile.txt -> s3://your-bucket-name/somefile.txt [1 of 1]
17835 of 17835 100% in 0s 35.79 kB/s done
# s3cmd ls s3://your-bucket-name
2010-10-30 02:20 17835 s3://your-bucket-name/somefile.txt
# s3cmd get s3://your-bucket-name/somefile.txt somefile-2.txt
s3://your-bucket-name/somefile.txt -> somefile-2.txt [1 of 1]
17835 of 17835 100% in 0s 39.77 kB/s done
A better, more flexible method of backing up your data is to use ‘sync’ instead of ‘put’ or ‘get’. Read more about how I use sync in the next section.
Automate backup with a shell script and cron job
Below is a sample of the shell script I wrote to backup one of my servers:
#!/bin/sh

# Synchronize /root with S3
s3cmd sync --recursive /root/ s3://my-bucket-name/root/

# Synchronize /home with S3
s3cmd sync --recursive /home/ s3://my-bucket-name/home/

# Synchronize crontabs with S3
s3cmd sync /var/spool/cron/ s3://my-bucket-name/cron/

# Synchronize /var/www/vhosts with S3
s3cmd sync --exclude 'mydomain.com/some-directory/*.jpg' --recursive /var/www/vhosts/ s3://my-bucket-name/vhosts/

# Dump all MySQL databases, upload the dump, then remove the local copy
mysqldump -u root --password=mysqlpassword --all-databases --result-file=/root/all-databases.sql
s3cmd put /root/all-databases.sql s3://my-bucket-name/mysql/
rm -f /root/all-databases.sql
I use ‘s3cmd sync --recursive /root/ s3://my-bucket-name/root/’ and ‘s3cmd sync --recursive /home/ s3://my-bucket-name/home/’ to synchronize all data in the local /root and /home directories, including their subdirectories, with S3. I use ‘sync’ instead of ‘put’ because I do not always know exactly what files are stored in these directories; I want everything backed up, including any new files created in the future.
With ‘s3cmd sync /var/spool/cron/ s3://my-bucket-name/cron/’ I omit ‘--recursive’ because I do not care about any subdirectories (there aren’t any).
With “s3cmd sync --exclude ‘mydomain.com/some-directory/*.jpg’ --recursive /var/www/vhosts/ s3://my-bucket-name/vhosts/” I synchronize /var/www/vhosts but exclude all jpg files inside a particular directory because they are replaced very frequently by new versions and are unimportant to me once they are a few minutes old.
Using mysqldump I export all databases to a single text file that can easily be used to recreate them if needed. I upload the newly created file with ‘s3cmd put /root/all-databases.sql s3://my-bucket-name/mysql/’ and then delete the local copy.
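If a server is ever lost, the dump can be pulled back down and fed to mysql to recreate every database. A sketch, assuming the bucket and file names from the script above:

```shell
# Download the most recent database dump from S3
s3cmd get s3://my-bucket-name/mysql/all-databases.sql /root/all-databases.sql

# Recreate all databases from the dump (you will be prompted for the MySQL root password)
mysql -u root -p < /root/all-databases.sql
```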
To read more about sync and its options, such as ‘--dry-run’, ‘--skip-existing’, and ‘--delete-removed’, see http://s3tools.org/s3cmd-sync.
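Two of those options are worth trying before you trust a new script: a dry run reports what sync would transfer without uploading anything, and ‘--delete-removed’ keeps the bucket from accumulating files you have long since deleted locally. A sketch using the same bucket name as above:

```shell
# Preview what sync would transfer, without actually uploading anything
s3cmd sync --dry-run --recursive /home/ s3://my-bucket-name/home/

# Mirror deletions: remove objects from S3 that no longer exist locally
s3cmd sync --delete-removed --recursive /home/ s3://my-bucket-name/home/
```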
Create a cron job to execute your shell script as often as you like. Now you can worry a little less about losing all your important data.
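A sample crontab entry, assuming the script is saved at /root/backup-to-s3.sh (the path is hypothetical; edit your crontab with ‘crontab -e’):

```shell
# Run the backup script every night at 2:30 AM
30 2 * * * /bin/sh /root/backup-to-s3.sh
```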