Automated Backups: Why Manual Tarballs Will Get You Fired
There are two types of system administrators: those who have lost data, and those who are about to. If your disaster recovery strategy relies on you remembering to run a script every Friday afternoon, you have already failed. In the hosting world, uptime is money, but data integrity is survival.
I recently audited a setup for a media agency in Oslo. They had a beautiful RAID 5 array. They thought they were safe. When the controller card failed and corrupted the parity data across all disks, they realized that RAID is not a backup. It took them three weeks and 40,000 NOK in forensic data recovery to get their client data back.
Here is how to professionalize your backup workflow using tools available today, ensuring you sleep through the night even when the hardware melts.
The 3-2-1 Rule: The Golden Standard
Before we touch a single line of code, understand the philosophy. The 3-2-1 rule is non-negotiable for professional environments:
- 3 copies of your data (Production + 2 backups).
- 2 different media types (e.g., Disk on server + Tape or Remote VDS).
- 1 copy kept off-site.
If your server is in a datacenter in Oslo, your off-site backup should be in Bergen, Stockholm, or Frankfurt. If a fire overwhelms the suppression systems in the primary DC, your local backups burn right alongside the server.
The Toolchain: Rsync and Cron
Forget expensive enterprise bloatware. The most robust tools are already installed on your Linux distribution, whether you run CentOS 5 or Debian Lenny. We use rsync because it handles differential transfers efficiently, saving bandwidth and time.
1. The Script
Create a script at /usr/local/sbin/daily_backup.sh. This isn't just about copying files; it's about rotation.
#!/bin/bash
# Daily incremental backup using rsync hard-link snapshots
SOURCE="/var/www/html/"
DEST="/mnt/backup_drive"
DATE=$(date +%F)
# Create a new snapshot directory for today
mkdir -p "$DEST/$DATE"
# Hard-link unchanged files against the previous snapshot to save space.
# On the very first run 'current' does not exist yet; rsync warns and
# simply performs a full copy, which is exactly what we want.
rsync -a --link-dest="$DEST/current" "$SOURCE" "$DEST/$DATE"
# Repoint the 'current' symlink at today's snapshot
# (-n stops ln from creating the link inside the old target directory)
ln -sfn "$DEST/$DATE" "$DEST/current"
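With the script in place, automation is a crontab entry away. The schedule and the 14-day retention below are illustrative values; tune them to your change rate and disk budget:

```
# /etc/cron.d/backup -- run the snapshot at 02:30 every night as root
30 2 * * * root /usr/local/sbin/daily_backup.sh >> /var/log/daily_backup.log 2>&1
# Prune snapshots older than 14 days at 04:00
0 4 * * * root find /mnt/backup_drive -mindepth 1 -maxdepth 1 -type d -mtime +14 -exec rm -rf {} \;
```

Redirecting output to a log file matters: cron jobs fail silently, and the log is often the only evidence that your backups stopped running weeks ago.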
Pro Tip: Always use SSH keys for authentication. Hardcoding passwords in scripts is a security vulnerability waiting to be exploited. Generate a key pair with ssh-keygen -t rsa and push it to your backup server.
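A minimal sketch of that key setup follows. The /tmp path and the backup.example.net hostname are placeholders; in production, keep the key under /root/.ssh/ with mode 600:

```shell
# Generate a dedicated, passwordless RSA key for the backup job.
# /tmp/backup_key is a stand-in path for this demonstration.
rm -f /tmp/backup_key /tmp/backup_key.pub
ssh-keygen -q -t rsa -b 2048 -N '' -f /tmp/backup_key
# Push the public key to the backup host (hostname is a placeholder):
#   ssh-copy-id -i /tmp/backup_key.pub root@backup.example.net
# rsync can then authenticate without prompting:
#   rsync -a -e "ssh -i /tmp/backup_key" /var/www/html/ root@backup.example.net:/srv/backups/
```

A dedicated key (rather than reusing your personal one) means you can revoke the backup job's access without locking yourself out.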
2. The Database Dilemma
You cannot simply copy /var/lib/mysql while the database is running. You will end up with corrupted tables. You need a consistent dump.
For MySQL 5.0/5.1 (InnoDB tables), use the --single-transaction flag to avoid locking your tables during the backup. This prevents your website from freezing while the backup runs.
One caveat: never pass the password on the command line, where any local user can read it from the process list. Put the credentials in /root/.my.cnf (chmod 600) and let mysqldump pick them up automatically:

mysqldump --all-databases --single-transaction | gzip > /backup/mysql_dump_$(date +%F).sql.gz
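A backup you have never restored is a hypothesis, not a backup. Before trusting a dump, at minimum verify the gzip stream is intact, since a truncated transfer passes unnoticed otherwise. The file below is a stand-in created for demonstration:

```shell
# /tmp/demo_dump.sql.gz stands in for a real nightly dump.
echo "CREATE TABLE t (id INT);" | gzip > /tmp/demo_dump.sql.gz
# gzip -t walks the whole stream and fails on truncation or corruption.
gzip -t /tmp/demo_dump.sql.gz && echo "dump archive OK"
# Full restore drill (commented; needs a scratch MySQL instance, never production):
#   gunzip < /backup/mysql_dump_2010-01-01.sql.gz | mysql -u root restore_test
```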
Off-Site Replication and Latency
Once your data is secured locally, it must move. This is where network performance becomes critical. Transferring gigabytes of data over a congested public link can kill your web server's response time.
When choosing a provider, look for unmetered internal networks or secondary network interfaces. At CoolVDS, we provision instances with high-performance 15k SAS RAID 10 storage and dedicated bandwidth. This means your nightly rsync job to a secondary storage VPS doesn't choke your Apache processes serving customers.
Legal Compliance in Norway
Operating in the Nordics adds a layer of responsibility. Under the Personopplysningsloven (Personal Data Act) and the EU Data Protection Directive (95/46/EC), you are responsible for the security of personal data.
If you are backing up customer data to an external server, you must ensure:
- The data remains encrypted during transit (SSH/SCP covers this).
- The destination server is within the EEA (European Economic Area) to satisfy Datatilsynet requirements without complex Safe Harbor agreements.
This is why pragmatic CTOs choose hosting partners with clear data sovereignty. A budget US-based host might be cheaper, but is it worth the legal exposure?
The "CoolVDS" Reference Architecture
We built CoolVDS to solve these exact headaches. We don't use OpenVZ containers that oversell RAM and rely on a shared kernel. We use Xen virtualization which provides true isolation. If a neighbor's backup process spikes CPU usage, your dedicated resources remain untouched.
Furthermore, standardizing on high-speed SAS storage means the I/O bottleneck—the most common cause of backup failure—is virtually eliminated. You can run heavy compression operations (gzip -9) without bringing your site to a crawl.
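Even on fast storage, it costs nothing to be polite to your web server. Wrapping the compression step in nice keeps it from competing with Apache for CPU (the file path below is a stand-in):

```shell
# Run heavy compression at minimum CPU priority so it yields to Apache workers.
# On CFQ-based kernels you can also prepend "ionice -c3" for the idle I/O class.
echo "sample dump data" > /tmp/big_dump.sql
nice -n 19 gzip -9 -f /tmp/big_dump.sql
```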
Conclusion
Automation is the difference between a minor hiccup and a business-ending event. Do not trust manual processes. Do not trust single hard drives.
Set up your cron jobs today. Test your restore procedure tomorrow. And if your current host creates I/O wait times that make backups impossible, it’s time to move.
Need a sandbox to test your rsync scripts? Deploy a high-performance Xen instance on CoolVDS in minutes.