Disaster Recovery Protocols: Why Your "Backups" Are Not Enough (And How to Fix It)
Let’s be brutally honest: if your disaster recovery (DR) plan consists solely of a nightly cron job dumping a tarball to the same physical disk as your production OS, you do not have a DR plan. You have a suicide note.
I recently audited a setup for a mid-sized e-commerce retailer here in Oslo. They claimed to have "redundancy." Their redundancy was a RAID 1 array. When the controller card failed and corrupted both drives simultaneously, their Mean Time To Recovery (MTTR) wasn't minutes. It was four days. In the high-velocity world of 2017, four days of downtime isn't an outage; it's bankruptcy.
With the rise of aggressive ransomware variants like Locky this year, and the ever-present threat of hardware failure, we need to stop treating DR as a checkbox and start treating it as warfare. Here is how to architect a survival strategy using Linux primitives, MySQL replication, and geographically distinct infrastructure.
The RTO/RPO Reality Check
Before we touch a single config file, you need to define two variables. If you can't answer these, stop reading and go ask your CTO.
- RPO (Recovery Point Objective): How much data are you willing to lose? An hour? A transaction?
- RTO (Recovery Time Objective): How fast must the service return?
If your boss says "zero data loss, zero downtime" but gives you a budget for a single shared hosting account, they are hallucinating. For a realistic, robust setup on a Virtual Private Server (VPS), we aim for an RPO of < 5 minutes and an RTO of < 1 hour.
Step 1: The Off-Site Imperative (Geography Matters)
Data residing in a single datacenter is vulnerable to localized physical disasters. However, simply pushing data to Amazon S3 in the US lands us in a legal minefield. With the Safe Harbor agreement invalidated and its replacement, the Privacy Shield framework, still facing scrutiny from European courts, keeping data within the EEA (and specifically within Norway for local entities) is the only safe legal harbor.
You need a secondary location. If your primary node is in Oslo, your DR node should be in a different facility or at least a different availability zone. At CoolVDS, we see smart admins spinning up "warm standby" nodes. These are minimal instances that sit idle, receiving data replication, ready to scale up vertically the moment the primary node dies.
Step 2: Automating the "Heartbeat" Backup
Snapshots are great, but they are heavy. For granular recovery, you need file-level control. We are going to use rsync and SSH keys. Do not use FTP. It is 2017; if you are transferring data in cleartext, you are negligent.
First, establish a password-less SSH trust between your Production server and your CoolVDS DR instance:
# On Production Server
ssh-keygen -t rsa -b 4096
ssh-copy-id user@dr-node.coolvds.net
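If you want to tighten this further, you can pin the key on the DR node to the production server's source address in ~/.ssh/authorized_keys. A small sketch; 203.0.113.10 is a placeholder, substitute your primary's public IP and your actual public key:
# On the DR node, in ~/.ssh/authorized_keys
from="203.0.113.10" ssh-rsa AAAA...your-public-key... user@production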
Now, implement a rotation script. This isn't just copying files; it maintains a history of daily snapshots using hard links, so unchanged files don't consume extra space.
#!/bin/bash
# /usr/local/bin/dr_push.sh
set -euo pipefail

SOURCE_DIR="/var/www/html/"
DEST_HOST="user@dr-node.coolvds.net"
DEST_DIR="/backup"
DATE=$(date +%F)

# Ensure we have a directory for today
ssh "$DEST_HOST" "mkdir -p $DEST_DIR/$DATE"

# Sync with hard links to the previous backup to save space.
# On the very first run 'current' does not exist yet; rsync warns and does a full copy.
rsync -avz --delete --link-dest="$DEST_DIR/current" \
    "$SOURCE_DIR" "$DEST_HOST:$DEST_DIR/$DATE/"

# Update the 'current' symlink on the remote side
ssh "$DEST_HOST" "rm -f $DEST_DIR/current && ln -s $DEST_DIR/$DATE $DEST_DIR/current"
Add this to your crontab. The script leverages hard links on the remote filesystem to minimize storage usage on your VPS while keeping daily snapshots instantly accessible.
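A minimal crontab entry for a nightly push at 02:00 might look like this (the log path is just a suggestion):
# crontab -e on the production server
0 2 * * * /usr/local/bin/dr_push.sh >> /var/log/dr_push.log 2>&1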
Step 3: Database Replication (The Real Pain Point)
Files are easy. Databases are hard. Restoring a 50GB MySQL dump takes time—time you don't have during a disaster. The solution is Master-Slave replication.
In MySQL 5.7 (which you should be using over 5.5), setting up a slave on your DR node ensures that your data is already there when the fire alarm rings. You don't restore; you just promote the slave.
Primary Config (/etc/mysql/my.cnf):
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_do_db = production_db
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
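After restarting mysqld on the primary, you still need a replication user and the current binlog coordinates. A minimal sketch using the mysql client; the 'repl' user and its password are placeholders, and you should restrict the host pattern to your DR node rather than '%':
# On the Primary, after restarting MySQL
mysql -u root -p -e "CREATE USER 'repl'@'%' IDENTIFIED BY 'CHANGE_ME';"
mysql -u root -p -e "GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';"
# Note the File and Position values; the slave will need them
mysql -u root -p -e "SHOW MASTER STATUS;"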
Slave Config (DR Node):
[mysqld]
server-id = 2
relay-log = /var/log/mysql/mysql-relay-bin.log
read_only = 1
Pro Tip: Set read_only = 1 on your DR node. This prevents accidental writes from breaking consistency until you explicitly promote it to Master during a failover event.
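To wire the slave up, seed the DR node with a consistent dump first (mysqldump --master-data=2 is fine for modest datasets), then point it at the primary. The host name, log file and position below are examples; substitute the values from SHOW MASTER STATUS on your primary:
# On the DR node, after importing the initial dump
mysql -u root -p -e "CHANGE MASTER TO MASTER_HOST='primary.example.com', MASTER_USER='repl', MASTER_PASSWORD='CHANGE_ME', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=154;"
mysql -u root -p -e "START SLAVE;"
# Both Slave_IO_Running and Slave_SQL_Running should report 'Yes'
mysql -u root -p -e "SHOW SLAVE STATUS\G"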
The Storage Bottleneck: Why HDD Kills RTO
Here is the variable most people miss: Disk I/O. When you are restoring a backup or catching up on replication logs, your disk is hammered with write operations.
On traditional spinning rust (HDD), a restore of a 100GB dataset can take 4-6 hours due to random I/O latency. On NVMe storage, this drops to minutes. In 2017, many hosting providers still treat NVMe as a premium luxury and charge exorbitant fees for it. CoolVDS standardizes on NVMe for this exact reason. If your DR plan relies on cheap, rotational storage, your RTO calculations are likely off by a factor of ten.
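Don't take anyone's word for it; measure. A crude but quick sanity check of write throughput on the DR node (this assumes /backup sits on the volume you actually care about, and bypasses the page cache with direct I/O):
# Write 1 GB with direct I/O and check the reported throughput
dd if=/dev/zero of=/backup/ddtest bs=1M count=1024 oflag=direct
rm -f /backup/ddtest
For a realistic random-write profile, fio is the better tool, but dd gives you a first-order answer in seconds.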
Testing: The "Schrödinger's Backup"
A backup that hasn't been restored is neither dead nor alive—it is theoretical. You must test your DR plan.
Run a drill every quarter:
- Sever the network connection to the primary node.
- Run your failover script to promote the CoolVDS slave database (a minimal promotion sketch follows this list).
- Update your DNS (low TTL is your friend here).
- Measure the time. Did you beat your RTO?
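The promotion itself is mechanically simple in MySQL 5.7. A minimal sketch; remember to also flip read_only in my.cnf so the change survives a restart, and to repoint your application at the new master:
# On the DR node, during the drill
mysql -u root -p -e "STOP SLAVE;"
mysql -u root -p -e "RESET SLAVE ALL;"
mysql -u root -p -e "SET GLOBAL read_only = 0;"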
Conclusion
The threat landscape in Norway and Europe is shifting. We are seeing more sophisticated automated attacks and stricter data governance requirements from Datatilsynet. Your infrastructure needs to be resilient.
Don't wait for a kernel panic to realize your backup script failed three months ago. Build redundancy today.
Need a high-performance failover node? Deploy a low-latency NVMe VPS on CoolVDS in under 55 seconds and secure your data sovereignty.