You are one `rm -rf` away from unemployment.
It’s 3:00 AM. Your phone buzzes. It’s not a text from a friend; it’s Nagios screaming that your primary database node is unreachable. You try to SSH in. Connection timed out. You check the hosting provider's status page. "Investigating connectivity issues in Oslo DC1."
If your stomach just dropped reading that, your Disaster Recovery (DR) plan is weak. If you shrugged and thought, "I'll just flip the DNS to the failover node," you can stop reading. For everyone else, we need to talk.
In the Nordic hosting market, we have a unique set of constraints. We have excellent connectivity via NIX (the Norwegian Internet Exchange), but we also face strict compliance requirements from Datatilsynet, the Norwegian Data Protection Authority. With Safe Harbor struck down and its replacement, Privacy Shield, already on shaky ground, keeping data inside Norwegian borders isn't just about latency; it's about legal survival.
Here is how to architect a DR plan that survives hardware failure, human stupidity, and regulatory audits, using the standard stack available today in late 2016.
1. The Architecture: Active-Passive with a Warm Standby
For 99% of businesses running on a VPS, Active-Active setups (like multi-master MySQL clusters) are over-engineered suicide. They introduce write conflicts and latency issues that most CMS platforms (Magento, WordPress, Drupal) handle poorly.
Instead, we build a robust Active-Passive setup.
- Node A (Primary): Your high-performance CoolVDS NVMe instance. Handles all traffic.
- Node B (Standby): A smaller instance, perhaps in a different availability zone or datacenter (e.g., Oslo vs. Trondheim), continuously replicating data.
- Node C (Cold Storage): An encrypted BorgBackup repository for historical archives.
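For Node C, the only moving part is shipping periodic, encrypted snapshots off the hot pair. A minimal sketch of a nightly Borg run from Node B might look like this (the repository host `borg.example.no`, the repo path, and the dump directory `/var/backups/mysql` are placeholders, not part of the setup above):

# one-time: create an encrypted repository on the backup host
borg init --encryption=repokey backup@borg.example.no:/srv/borg/shop

# nightly via cron: archive uploads and database dumps, then prune old archives
borg create --compression lz4 backup@borg.example.no:/srv/borg/shop::nightly-$(date +%F) \
    /var/www/html/uploads /var/backups/mysql
borg prune --keep-daily=7 --keep-weekly=4 --keep-monthly=6 backup@borg.example.no:/srv/borg/shop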
2. Database Replication: Moving to GTID
If you are still using classic binlog-file-and-position replication in MySQL, stop. It’s fragile: if the master crashes, hunting down the exact log file and position to resume from is a nightmare. MySQL 5.6 introduced Global Transaction Identifiers (GTIDs), and in MySQL 5.7 (which you should be running on Ubuntu 16.04), they are stable and production-ready.
GTID ensures that every transaction has a unique ID. If Node A dies, Node B knows exactly what it missed without you calculating log offsets.
Configuration for `my.cnf` (add to both servers):
[mysqld]
server-id = 1                        # use 2 on the standby (Node B)
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW                  # row-based events replicate deterministically
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON               # the standby logs replicated events too, so it can be promoted cleanly
expire_logs_days = 7                 # keep a week of binlogs
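One thing the config file alone doesn't give you: MySQL has to be restarted to pick up the GTID settings, and the `repl_user` account referenced in the next snippet has to exist on the master first. Roughly, on Node A (the `10.0.0.%` subnet is an assumption matching the private addresses used in this example):

# apply the new binlog/GTID settings, then create the replication account
systemctl restart mysql
mysql -e "CREATE USER 'repl_user'@'10.0.0.%' IDENTIFIED BY 'StrongPassword123!';"
mysql -e "GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'10.0.0.%';"
mysql -e "SHOW VARIABLES LIKE 'gtid_mode';"   # should print ON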
On the Slave (Node B), setting up the link is now trivial:
CHANGE MASTER TO
    MASTER_HOST = '10.0.0.5',
    MASTER_USER = 'repl_user',
    MASTER_PASSWORD = 'StrongPassword123!',
    MASTER_AUTO_POSITION = 1;
START SLAVE;
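Before you trust the standby with your job, confirm it is actually applying transactions. A quick check on Node B:

# both threads should say Yes, and Seconds_Behind_Master should hover at 0
mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Executed_Gtid_Set'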
Pro Tip: Don't replicate over the public internet. It’s slow and insecure. Use a private backend network if your provider supports it, or set up an OpenVPN tunnel between your CoolVDS instances. Latency between our nodes is typically under 1 ms, so slave lag stays effectively at zero.
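If you do use a private backend network, make MySQL listen only on that interface and block port 3306 on the public one. A rough example on Node A (the config path is the MySQL 5.7 default on Ubuntu 16.04; `eth0` as the public interface is an assumption):

# bind MySQL to the private address only, then restart
sed -i 's/^bind-address.*/bind-address = 10.0.0.5/' /etc/mysql/mysql.conf.d/mysqld.cnf
systemctl restart mysql

# drop any MySQL traffic that still arrives on the public interface
iptables -A INPUT -i eth0 -p tcp --dport 3306 -j DROP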
3. Filesystem Sync: Lsyncd Is Your Friend
Replicating the database is useless if your user-uploaded images are missing. `rsync` on a cron job leaves a gap where data loss can occur (the time between cron runs). NFS is a single point of failure.
The solution is Lsyncd (Live Syncing Daemon). It watches the filesystem for changes (using `inotify`) and triggers `rsync` immediately when a file is modified.
Install it on Node A:
apt-get update && apt-get install lsyncd
Create `/etc/lsyncd/lsyncd.conf.lua`:
settings {
    logfile    = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status"
}

sync {
    default.rsync,                   -- plain rsync over ssh is enough for one-way sync
    source = "/var/www/html/uploads",
    target = "root@10.0.0.6:/var/www/html/uploads",
    rsync  = {
        compress = true,
        archive  = true,
        verbose  = true,
        rsh      = "/usr/bin/ssh -i /root/.ssh/id_rsa_backup"
    }
}
This ensures that the moment a user uploads a PDF to Node A, it exists on Node B within seconds. Since CoolVDS uses pure SSD/NVMe storage, the extra I/O from these near-continuous syncs is negligible.
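The config above quietly assumes two things: Node A can SSH to Node B non-interactively with that `id_rsa_backup` key, and the log directory exists. Setting that up and confirming files actually flow looks roughly like this (exact service handling may differ slightly depending on how your distribution packages lsyncd):

# on Node A: dedicated key for the sync, pushed to the standby
ssh-keygen -t rsa -b 4096 -N "" -f /root/.ssh/id_rsa_backup
ssh-copy-id -i /root/.ssh/id_rsa_backup.pub root@10.0.0.6

# make sure the log directory exists, then (re)start the daemon and watch it work
mkdir -p /var/log/lsyncd
systemctl restart lsyncd
tail -f /var/log/lsyncd/lsyncd.log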