When the Lights Go Out in Oslo: A Realist's Guide to Disaster Recovery
It is 3:00 AM. Your phone buzzes. Nagios is screaming. Your primary database node just vanished from the network. If your stomach just dropped, it’s because you know your Disaster Recovery (DR) plan is a document collecting dust in a drawer, not a tested reality. I have seen seasoned sysadmins cry when they realize a SAN failure corrupted both their production data and the local snapshots they foolishly called "backups."
In the Nordic hosting market, we face unique challenges. We have excellent connectivity via NIX (Norwegian Internet Exchange), but we also deal with strict compliance requirements from Datatilsynet. With the recent invalidation of the Safe Harbor agreement last year, relying on US-based cloud buckets for disaster recovery is a legal minefield. You need your data on European soil, preferably right here in Norway.
Here is how to build a DR strategy that actually works, using tools available today in 2016 like MySQL 5.7, standard Bash scripting, and KVM virtualization.
The RPO/RTO Reality Check
Stop talking about "uptime." Talk about RPO (Recovery Point Objective) and RTO (Recovery Time Objective).
- RPO: How much data are you willing to lose? (Minutes? Hours?)
- RTO: How long can you stay offline before your CEO fires you?
If you are running a high-traffic Magento store or a SaaS platform, your RPO needs to be near-zero. That means nightly tarballs are useless. You need real-time replication.
Step 1: Real-Time Database Replication
Forget mysqldump for DR. If your database is 50GB, restoring a dump takes too long. You need a hot standby. We will use standard MySQL Master-Slave replication. It is robust, free, and works perfectly on CentOS 7.
Configuring the Master
On your primary server (let's say, running on a CoolVDS NVMe instance for high IOPS), edit your /etc/my.cnf. You must enable binary logging and set a unique server ID.
[mysqld]
server-id = 1
log_bin = /var/lib/mysql/mysql-bin.log
binlog_format = ROW
expire_logs_days = 7
max_binlog_size = 100M
# Reliability settings for InnoDB
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
Pro Tip: Always use binlog_format = ROW in 2016. Statement-based replication can break with non-deterministic queries (like those using UUID() or NOW()). The overhead is worth the data integrity.
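Once mysqld is restarted with those settings, the master needs a dedicated replication account, and you need to record the binlog coordinates the slave will start from. Here is a minimal sketch, assuming the mysql client reads its credentials from /root/.my.cnf; the repl user, the password, and the DR node address 10.0.0.2 are placeholders for this example:
#!/bin/bash
# Run on the MASTER after restarting mysqld with the new my.cnf
# 'repl', the password and 10.0.0.2 are placeholders -- substitute your own values
mysql <<'SQL'
CREATE USER 'repl'@'10.0.0.2' IDENTIFIED BY 'choose-a-strong-password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.0.2';
-- Write down the File and Position columns; the slave needs them later
SHOW MASTER STATUS;
SQL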
Configuring the Slave (DR Node)
Your DR node should be in a separate physical location. At CoolVDS, we ensure physical separation of host nodes for this exact reason. On the secondary server:
[mysqld]
server-id = 2
relay-log = /var/lib/mysql/mysql-relay-bin.log
read_only = 1
Setting read_only = 1 is critical. It prevents your application from accidentally writing to the backup database while it is in slave mode, which would break the replication chain immediately.
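The config files alone do not start replication. You first have to seed the DR node with a consistent copy of the data (Percona XtraBackup or mysqldump --master-data both do the job), then point it at the master. A minimal sketch, assuming the hypothetical repl user from above, a master at 10.0.0.1, and placeholder binlog coordinates taken from your own SHOW MASTER STATUS output:
#!/bin/bash
# Run on the SLAVE (DR node) once the initial data copy has been restored
# Host, user, password, log file and position below are placeholders
mysql <<'SQL'
CHANGE MASTER TO
  MASTER_HOST='10.0.0.1',
  MASTER_USER='repl',
  MASTER_PASSWORD='choose-a-strong-password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=154;
START SLAVE;
SQL

# Health check: Slave_IO_Running and Slave_SQL_Running should both say Yes,
# and Seconds_Behind_Master is the number to feed your Nagios check
mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'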
Step 2: File Synchronization with rsync
Databases are half the battle. What about user uploads? `rsync` is still the king here. Do not overcomplicate this with heavy distributed file systems like GlusterFS unless you really need the complexity. For 99% of setups, a scheduled rsync over SSH is sufficient.
Use this script structure to sync /var/www/html to your DR host:
#!/bin/bash
# /root/scripts/sync_dr.sh
SRC="/var/www/html/"
DEST="user@dr-host.coolvds.net:/var/www/html/"
EXCLUDE="--exclude 'cache/' --exclude 'logs/'"
# Bandwidth limit is polite if sharing pipes, but internal networks can go unlimited
# -a: archive mode, -v: verbose, -z: compress, -e: specify ssh
rsync -avz --delete $EXCLUDE -e "ssh -i /root/.ssh/id_rsa_dr" $SRC $DEST
if [ $? -eq 0 ]; then
echo "$(date) - Sync Successful" >> /var/log/dr_sync.log
else
echo "$(date) - Sync FAILED" >> /var/log/dr_sync.log
# Send alert via mail or Nagios
fi
Add this to your crontab. If you need near real-time, look into lsyncd (Live Syncing Daemon), which watches the kernel's inotify events and triggers rsync instantly. It works beautifully on our KVM slices.
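For the cron route, an entry like the sketch below is all it takes. The 15-minute interval and the lock file path are just examples; pick a schedule that matches your RPO, and let flock stop a slow sync from piling up behind the next one:
# /etc/cron.d/dr_sync -- example schedule, tune the interval to your RPO
# flock -n skips the run if the previous sync is still holding the lock
*/15 * * * * root flock -n /var/run/dr_sync.lock /root/scripts/sync_dr.sh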
Step 3: The Recovery Script
The worst time to figure out how to promote a slave to master is during an outage. Script it. Document it. Print it out on paper.
Here is a snippet of what a promotion sequence looks like for MySQL:
-- On the SLAVE server (the new Master)
STOP SLAVE;
-- Forget the old master entirely so this node stops acting as a slave
RESET SLAVE ALL;
-- Disable read-only mode so apps can write
SET GLOBAL read_only = OFF;
-- Verify
SHOW VARIABLES LIKE 'read_only';
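Better yet, wrap the whole sequence in a script that lives on the DR node. This is a minimal sketch, assuming the mysql client authenticates via /root/.my.cnf and that read_only = 1 sits in /etc/my.cnf as configured above; the script name and log path are made up for this example:
#!/bin/bash
# /root/scripts/promote_dr.sh -- hypothetical promotion wrapper for the DR node
# Assumes the mysql client reads credentials from /root/.my.cnf
set -euo pipefail

mysql <<'SQL'
STOP SLAVE;
RESET SLAVE ALL;
SET GLOBAL read_only = OFF;
SQL

# SET GLOBAL does not survive a restart: comment out read_only in my.cnf as well
sed -i 's/^read_only[[:space:]]*=.*/# read_only disabled during DR promotion/' /etc/my.cnf

echo "$(date) - DR node promoted to master" >> /var/log/dr_promote.log
# Next manual step: repoint your application config or DNS at this node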
Why Infrastructure Choice Matters
Software configuration means nothing if the underlying virtualization is unstable. In the VPS market, we see a lot of providers over-selling OpenVZ containers. The problem? Resource contention.
In a disaster scenario, you are likely restoring huge amounts of data. This is I/O intensive. If you are on an oversold OpenVZ node, your "neighbors" will strangle your disk I/O, causing your RTO to skyrocket from 30 minutes to 6 hours.
This is why CoolVDS uses KVM (Kernel-based Virtual Machine). KVM provides full hardware virtualization. Your RAM is yours. Your NVMe I/O is protected. When you hit the "Restore" button, the resources are actually there to execute the command.
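A crude way to test that claim before disaster strikes: run a direct (cache-bypassing) write test on both the primary and the DR node and compare. This is a rough sketch, not a proper benchmark; the file paths and sizes are arbitrary:
#!/bin/bash
# Rough sequential write check; oflag=direct bypasses the page cache so the
# number reflects the disk, not RAM. Writes and removes a 1 GB test file.
dd if=/dev/zero of=/root/io_test bs=1M count=1024 oflag=direct
rm -f /root/io_test

# More realistic random-write numbers if fio is available (yum install fio)
fio --name=dr-check --rw=randwrite --bs=4k --size=1G --direct=1 --ioengine=libaio --unlink=1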
The Legal Angle (Datatilsynet)
With the current regulatory climate in Europe, keeping your DR site within Norway or at least the EEA is not just a technical preference; it is becoming a compliance requirement. Moving data to a US server for backup might violate the rights of your Norwegian users under current interpretations of the Personal Data Act.
Final Thoughts
Disaster Recovery is expensive, boring, and invisible—until it saves your company from bankruptcy. Do not rely on luck. Do not rely on "cloud magic."
- Use MySQL Replication for data.
- Use rsync/lsyncd for files.
- Host on KVM infrastructure that guarantees I/O performance.
If you need a secondary node to act as your failover target, do not put it in the same rack as your primary. Deploy a CoolVDS KVM instance in our secondary zone today. It takes 55 seconds to spin up, which is less time than it takes to explain to your boss why the data is gone.