Disaster Recovery in 2013: Why Your Nightly Tarball Is Not A Strategy
Let’s be honest. Most sysadmins in Oslo are sleeping soundly tonight because they have a cron job running tar -czf somewhere in /var/backups/. If that is you, wake up. You do not have a Disaster Recovery (DR) plan; you have a file archiving habit. And when the RAID controller on your physical server melts or the data center experiences a total power loss, that tarball sitting on the same local disk won't save your job.
I’ve seen it happen. A mid-sized e-commerce shop running Magento on a single dedicated box. The drive array degraded, the file system went read-only, and their "backup" was a corrupted dump from 24 hours ago. They lost a full day of orders and spent 48 hours rebuilding. In the current hosting market, downtime is not just an inconvenience; it is a business killer.
Real DR is about RTO (Recovery Time Objective) and RPO (Recovery Point Objective). How much data can you afford to lose (that's RPO)? How fast can you be back online (that's RTO)? If the answer is "zero" and "instantly," you need more than backup scripts. You need active replication and a failover strategy.
The Database: Asynchronous Replication is Mandatory
For most dynamic applications, the database is the single point of failure. In 2013, if you aren't running at least a Master-Slave setup, you are playing Russian Roulette. We rely heavily on MySQL 5.5 (or Percona Server if you want real performance metrics).
Here is the reality of configuring replication: it's not just about turning it on, it's about durability settings that keep the binary log in step with committed transactions, so your slave doesn't silently drift away from the master after a crash.
On your Master server (e.g., hosted on a high-performance CoolVDS SSD instance), your /etc/mysql/my.cnf needs to be bulletproof:
[mysqld]
server-id = 1
log-bin = /var/log/mysql/mysql-bin.log
binlog_format = mixed
# Safety first. This kills write speed slightly but saves your data.
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
# Networking
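# 0.0.0.0 listens on every interface. Firewall port 3306 (or bind to the
# replication interface only) so that only the slave can actually reach it.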
bind-address = 0.0.0.0
On the Slave (your DR site, perhaps in a secondary availability zone):
[mysqld]
server-id = 2
relay-log = /var/log/mysql/mysql-relay-bin.log
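# Keeps normal users from writing to the standby (note: does not restrict accounts with SUPER)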
read_only = 1
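The config files only prepare the ground; you still have to create a replication user on the master, grab the binlog coordinates, and point the slave at them. A rough sketch of that wiring, assuming a hypothetical repl user, a master at 10.0.0.10, a slave at 10.0.0.20, and placeholder coordinates you would read from SHOW MASTER STATUS:

# On the master: a dedicated replication account, restricted to the slave's IP
mysql -u root -p -e "GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.0.20' IDENTIFIED BY 'use-a-real-password';"
# Note the File and Position columns
mysql -u root -p -e "SHOW MASTER STATUS;"

# On the slave: point it at the master using those coordinates, then start the threads
mysql -u root -p -e "CHANGE MASTER TO MASTER_HOST='10.0.0.10', MASTER_USER='repl', MASTER_PASSWORD='use-a-real-password', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=107;"
mysql -u root -p -e "START SLAVE;"
# Slave_IO_Running and Slave_SQL_Running should both say Yes
mysql -u root -p -e "SHOW SLAVE STATUS\G"

In practice you seed the slave from a consistent dump first (mysqldump --master-data hands you the coordinates for free); don't just point an empty slave at a busy master.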
Pro Tip: Never rely on the legacy MyISAM storage engine for critical data. It lacks crash recovery. Force default-storage-engine = InnoDB (the default since MySQL 5.5) and convert anything older installs left behind. InnoDB handles power outages; MyISAM handles them by corrupting your tables.
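If the application has been around a while, audit what is actually in the schema before you trust it to replication. A quick check from the shell (the shop.orders table in the conversion example is hypothetical; substitute your own):

# Any MyISAM tables hiding outside the system schemas?
mysql -u root -p -e "SELECT table_schema, table_name FROM information_schema.tables WHERE engine = 'MyISAM' AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');"
# Convert offenders during a maintenance window; ALTER TABLE locks the table while it copies
mysql -u root -p -e "ALTER TABLE shop.orders ENGINE = InnoDB;"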
Filesystem Synchronization: Beyond SCP
Database replication is useless if your user-uploaded images or configuration files aren't on the standby server. While tools like GlusterFS are gaining traction, they add complexity and latency overhead that most setups don't need. The industry standard tool remains rsync.
Don't just run it manually. Automate it, but be smart about bandwidth. We use a wrapper script triggered by cron every 5 minutes; for near real-time sync, look at lsyncd (Live Syncing Daemon), which hooks into the kernel's inotify events to trigger rsync on change.
Here is a robust rsync command pattern that preserves permissions, ownership, and timestamps, and handles symlinks correctly:
#!/bin/bash
# /usr/local/bin/sync-dr.sh
SOURCE_DIR="/var/www/html/"
REMOTE_HOST="dr-user@192.168.10.50"
REMOTE_DIR="/var/www/html/"
# -a: archive mode (recursive, preserves permissions/times/groups)
# -v: verbose
# -z: compress file data during the transfer
# --delete: delete extraneous files from dest dirs
rsync -avz --delete -e "ssh -p 22" "$SOURCE_DIR" "$REMOTE_HOST:$REMOTE_DIR" >> /var/log/dr-sync.log 2>&1
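The scheduling itself is one line; the real trap is overlapping runs when a big batch of uploads makes a pass take longer than five minutes. A cron entry wrapped in flock handles that (paths match the script above; the lock file location is arbitrary):

# /etc/cron.d/dr-sync
*/5 * * * * root /usr/bin/flock -n /var/lock/dr-sync.lock /usr/local/bin/sync-dr.sh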
Traffic Failover: The Switch
Having the data there is half the battle. Pointing users to it is the other half. If your primary server goes dark, you cannot wait for DNS propagation (TTL). That can take hours.
The solution is a Floating IP (Virtual IP) managed by Keepalived using the VRRP protocol. This requires your provider to support layer 2 access or specific API routing, but on a standard setup, you can often use a reverse proxy load balancer like HAProxy or Nginx in front.
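If you take the proxy route, the fallback logic is almost trivial. A rough Nginx upstream sketch, assuming 10.0.0.11 is your primary application server and 10.0.0.12 the standby:

upstream app_backend {
    server 10.0.0.11:80 max_fails=3 fail_timeout=10s;
    # Only receives traffic when the primary is marked down
    server 10.0.0.12:80 backup;
}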
A basic keepalived.conf for failover looks like this:
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass s3cr3t
    }
    virtual_ipaddress {
        10.0.0.100
    }
}
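The standby node gets a near-identical config; only the state and priority change (virtual_router_id, the password, and the VIP must match the master's):

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 50
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass s3cr3t
    }
    virtual_ipaddress {
        10.0.0.100
    }
}

By default the recovered master will preempt and take the address back when it returns; keepalived's nopreempt option is there if you would rather fail back manually.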
When the Master dies, the Slave stops seeing VRRP advertisements and claims 10.0.0.100 itself, typically within a few seconds at advert_int 1, with no DNS change required.
The Norwegian Context: Compliance & Latency
Operating in Norway brings specific challenges. Under the Personal Data Act (Personopplysningsloven), you are responsible for the security of personal data. Storing backups on an unencrypted FTP server in a non-EU jurisdiction is a compliance nightmare waiting to happen (remember the Safe Harbor framework is only a framework, not a magic shield).
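The cheap insurance is to encrypt before anything leaves the box. A minimal sketch using GnuPG symmetric encryption; the passphrase file and paths are placeholders for whatever key management you actually use:

#!/bin/bash
# Dump, compress and encrypt in one pipeline so no plaintext copy ever touches disk
# (mysqldump credentials read from /root/.my.cnf)
mysqldump --single-transaction --all-databases \
  | gzip \
  | gpg --batch --symmetric --cipher-algo AES256 --passphrase-file /root/.backup-pass \
        -o /var/backups/db-$(date +%F).sql.gz.gpg

Ship the resulting .gpg file offsite with the same rsync pattern as above; the unencrypted dump never exists anywhere.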
Furthermore, latency matters. If your primary audience is in Oslo or Bergen, hosting your DR site in Texas is technically valid but practically painful due to the 140ms+ latency. You want your failover site to be geographically separate (to avoid the same power grid failure) but topologically close.
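Before you commit to a location, measure it from the primary; a couple of minutes with ping and mtr (the hostname below is a placeholder) tells you more than any datasheet:

# Average round-trip time over 20 packets to the candidate DR site
ping -c 20 dr-site.example.net | tail -n 2
# Where does the latency actually accumulate, hop by hop?
mtr --report --report-cycles 20 dr-site.example.net

The platform underneath matters as much as the geography, which is where the hosting tiers start to separate: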
| Feature | Shared Hosting | Standard VPS | CoolVDS KVM |
|---|---|---|---|
| Isolation | None (Security Risk) | Software (OpenVZ) | Hardware (KVM) |
| Disk I/O | Slow SATA | Standard HDD | SSD RAID-10 |
| Kernel Control | No | Shared | Dedicated |
Why Architecture Matters
This setup works. I’ve deployed it for high-traffic portals that cannot tolerate more than 60 seconds of downtime. But software configuration can only go so far. If the underlying virtualization platform is stealing your CPU cycles or the "noisy neighbor" on your host is thrashing the disk I/O, your replication lag will skyrocket.
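Replication lag is the canary here, and it is trivial to watch. A one-liner worth wiring into your monitoring (alert thresholds are your call):

# Seconds_Behind_Master should sit at 0; a steadily climbing value means the slave can't keep up
mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep -E 'Seconds_Behind_Master|Slave_(IO|SQL)_Running'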
This is where CoolVDS becomes the reference implementation for this architecture. Unlike budget providers squeezing users onto OpenVZ containers, CoolVDS uses KVM virtualization. This means you get a dedicated kernel and reserved resources. When you run rsync or a MySQL dump, you get the full I/O throughput of the underlying SSD RAID-10 array, not just whatever is left over.
Disaster Recovery is not a product you buy; it is a process you build. But you can't build a fortress on a swamp. Start with solid infrastructure.
Ready to test your DR plan? Spin up a secondary KVM instance on CoolVDS today and see how fast replication can really be over our low-latency network.