Disaster Recovery in 2014: Why "Backups" Are Not Enough for Norwegian Enterprises
Let's be honest with ourselves. If your primary RAID array evaporated right now, how long would it take you to be back online? Not just "I have the data," but actual, serving-traffic-to-customers online. If your answer involves finding a tape drive or downloading 500GB over a congested FTP link, you don't have a Disaster Recovery (DR) plan. You have a prayer.
We are living in the immediate aftermath of the Heartbleed bug and the Snowden revelations. The trust model of the internet has shifted. For those of us managing infrastructure in Norway, the implementation of the Data Protection Directive (95/46/EC) and the specifics of the Personopplysningsloven (Norway's Personal Data Act) mean we can't just dump encrypted blobs into an Amazon S3 bucket in Virginia and call it a day. We need data sovereignty, we need low latency, and we need raw, predictable I/O performance that doesn't vanish when a "noisy neighbor" decides to mine Bitcoins on the same physical host.
The "3-2-1" Rule is Just the Beginning
Most sysadmins know the drill: 3 copies of data, 2 different media, 1 offsite. But in a high-availability environment, restoring from a cold backup is a resume-generating event. It takes too long. In 2014, your DR strategy needs to shift from "Backup & Restore" to "Replication & Failover."
We need a Hot or Warm Standby. And for that, we need reliable infrastructure. This is where the choice of virtualization matters. At CoolVDS, we strictly use KVM (Kernel-based Virtual Machine). Unlike OpenVZ, where resources are often oversold and the kernel is shared, KVM provides true hardware virtualization. If my kernel panics, yours keeps humming. This isolation is non-negotiable for a DR site.
The Database: MySQL 5.6 and GTIDs
If you are still running MySQL 5.1 or 5.5, stop reading and upgrade. MySQL 5.6 introduced Global Transaction Identifiers (GTIDs), which make failover far less painful than the old binary log file/position method. With the old way, if a master died, figuring out exactly where the slave left off was a precarious game of log parsing.
Here is a battle-hardened configuration for a Master-Slave setup using GTIDs on Ubuntu 14.04 LTS (Trusty Tahr). This assumes you have two CoolVDS instances connected via a private network or secured tunnel.
1. Master Configuration (my.cnf)
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
log_slave_updates = 1
expire_logs_days = 7
# GTID Settings for 5.6
gtid_mode = ON
enforce_gtid_consistency = true
master_info_repository = TABLE
relay_log_info_repository = TABLE
# Safety for durability
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
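After restarting MySQL (service mysql restart on Trusty), it's worth confirming the GTID settings actually took effect before you touch the slave:
mysql> SHOW VARIABLES LIKE 'gtid_mode';
mysql> SHOW VARIABLES LIKE 'enforce_gtid_consistency';
Both should report ON.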
2. The Replication User
mysql> CREATE USER 'repl_user'@'10.%.%.%' IDENTIFIED BY 'StrongPassword_2014!';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'10.%.%.%';
mysql> FLUSH PRIVILEGES;
Note the IP restriction. Never open port 3306 to the world. If you don't have a private VLAN, use iptables to whitelist only your DR node's IP.
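For example, a pair of rules like these on the master (10.0.0.2 stands in for your DR node's private address; adapt them to your existing ruleset and persist them) lets only the standby reach MySQL:
iptables -A INPUT -p tcp -s 10.0.0.2 --dport 3306 -j ACCEPT
iptables -A INPUT -p tcp --dport 3306 -j DROP
3. The DR Node (Slave)
On the DR instance, mirror the GTID settings, give it a unique server-id, and keep it read-only until you deliberately promote it. A minimal sketch of the standby's my.cnf, assuming the same paths as the master:
[mysqld]
server-id = 2
log_bin = /var/log/mysql/mysql-bin.log
log_slave_updates = 1
gtid_mode = ON
enforce_gtid_consistency = true
master_info_repository = TABLE
relay_log_info_repository = TABLE
# Keep the standby read-only until failover
read_only = 1
Then point it at the master. With GTIDs and MASTER_AUTO_POSITION there is no binlog file/position bookkeeping; 10.0.0.1 is a placeholder for the master's private IP:
mysql> CHANGE MASTER TO
    ->   MASTER_HOST = '10.0.0.1',
    ->   MASTER_USER = 'repl_user',
    ->   MASTER_PASSWORD = 'StrongPassword_2014!',
    ->   MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G
Check that Slave_IO_Running and Slave_SQL_Running both say Yes, and keep an eye on Seconds_Behind_Master; that number is your potential data-loss window.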
Pro Tip: On a CoolVDS SSD node, you can be aggressive with innodb_io_capacity. Set it to 2000 or higher depending on your benchmarks. Standard spinning rust (HDDs) usually chokes above 200.
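As a starting point in my.cnf (benchmark before committing; these numbers are illustrative, not gospel):
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000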
Filesystem Synchronization: Lsyncd over Rsync
Database replication handles the structured data, but what about user uploads, configuration files, and code? A cron job running rsync every hour means you can lose up to 59 minutes of data. That is unacceptable.
Enter Lsyncd (Live Syncing Daemon). It watches your local directory trees through inotify and spawns rsync processes to synchronize changes in near real-time. It's lightweight and robust.
Install it on your primary node:
apt-get install lsyncd
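Lsyncd pushes changes over SSH, so the primary needs key-based access to the DR node. The config below assumes a dedicated key at /root/.ssh/id_rsa_dr; generate it and copy it across first (the DR hostname is the same example used below):
ssh-keygen -t rsa -b 4096 -f /root/.ssh/id_rsa_dr -N ""
ssh-copy-id -i /root/.ssh/id_rsa_dr.pub root@dr-node.coolvds.net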
Configure /etc/lsyncd/lsyncd.conf.lua to push data to your CoolVDS DR instance:
settings {
    logfile = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd-status.log",
    statusInterval = 20
}

sync {
    default.rsync,
    source = "/var/www/html/",
    target = "root@dr-node.coolvds.net:/var/www/html/",
    delay = 1,
    rsync = {
        compress = true,
        archive = true,
        verbose = true,
        rsh = "/usr/bin/ssh -i /root/.ssh/id_rsa_dr -o StrictHostKeyChecking=no"
    }
}
This setup ensures that the moment a customer uploads a file in Oslo, it's being pushed to your failover node. The delay = 1 aggregates events slightly to prevent an rsync storm if thousands of files change instantly.
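Trust, but verify: the log and status file configured above show what Lsyncd has queued and whether it is keeping up with the change rate:
tail -n 20 /var/log/lsyncd/lsyncd.log
cat /var/log/lsyncd/lsyncd-status.log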
The "Fire Drill": Testing Failover
A DR plan that hasn't been tested is just a hypothesis. You need to simulate a failure. Here is a simple checklist for your next maintenance window:
- Stop the webserver on the Primary Node.
- Promote the MySQL Slave to Master.
- Switch DNS (set TTL to 300 seconds beforehand!).
- Verify application connectivity.
To promote the slave in MySQL 5.6:
mysql> STOP SLAVE;
mysql> RESET SLAVE ALL;
mysql> SET GLOBAL read_only = OFF;
# Update application config to point to localhost or the new IP
Avoid RESET MASTER here: it purges the binary logs and the GTID history you will want if you later re-attach the old master as a slave.
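For the DNS switch, drop the TTL on the records you intend to move well before the drill. A record like this (203.0.113.10 is a documentation-range placeholder for your DR node's public IP) ages out of resolver caches within five minutes:
www    300    IN    A    203.0.113.10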
The Infrastructure Factor: Latency and Jurisdiction
Technical configuration is only half the battle. The physical location of your bits matters. Writing to a slave server in the US from Norway introduces roughly 100ms-140ms of latency. For synchronous replication, that kills performance. For async, it increases the "replication lag" window, meaning more data loss during a crash.
Keeping your DR site within Norway (or at least the Nordics) via the NIX (Norwegian Internet Exchange) ensures latency stays under 10-15ms. Furthermore, keeping data within the EEA satisfies the stringent requirements of Datatilsynet (the Norwegian Data Protection Authority). With the Patriot Act allowing US authorities to subpoena data from US-owned servers regardless of location, hosting on a sovereign Norwegian provider like CoolVDS offers a layer of legal insulation that your legal department will appreciate.
Performance Comparison: SSD vs HDD Restore
When disaster strikes, RTO (Recovery Time Objective) is king. Restoring a 50GB database dump on a traditional 7200RPM SATA drive can take hours due to IOPS bottlenecks. On CoolVDS Pure SSD arrays, we consistently see restore times slashed by 70-80%.
| Metric (50GB MySQL Import) | Traditional VPS (HDD) | CoolVDS (Pure SSD) |
|---|---|---|
| Sequential Read | ~120 MB/s | ~500 MB/s |
| Random Write (IOPS) | ~75-100 IOPS | ~10,000+ IOPS |
| Est. Restore Time | 45+ Minutes | < 12 Minutes |
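Don't take anyone's throughput numbers on faith, ours included. A quick direct-I/O write test gives you a baseline before you depend on it; this is a rough sequential check, not a substitute for a proper fio run:
dd if=/dev/zero of=/root/ddtest bs=1M count=1024 oflag=direct
rm /root/ddtest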
Final Thoughts
Disaster recovery isn't about pessimism; it's about professionalism. Hardware fails. Software has bugs (we're looking at you, OpenSSL). The difference between a minor outage and a business-ending catastrophe is the work you put in today.
Don't rely on "best effort" hosting. Secure your data with KVM isolation and high-speed local replication. If you need a sandbox to test your new GTID replication setup, spin up a CoolVDS instance today. Your future self will thank you when the pager goes off at 3 AM.