Disaster Recovery Protocols: Surviving the Meltdown in a Pre-GDPR World
It is 3:14 AM. Your phone buzzes. It’s PagerDuty. The primary database node in Oslo just went dark. No ping, no SSH. Just silence.
Do you panic? Or do you pour a coffee and execute ./failover.sh?
In the world of systems administration, hope is not a strategy. With the General Data Protection Regulation (GDPR) looming over us (May 25th is coming faster than you think), losing data isn't just an operational embarrassment anymore—it is an existential legal threat. If you are hosting customer data in Norway or serving EU clients, the days of "we'll just restore from last week's tape" are dead.
I have spent the last decade cleaning up after rm -rf / accidents and RAID controller failures. Today, we are going to build a Disaster Recovery (DR) plan that actually works in 2018, utilizing the raw I/O power of modern NVMe VPS infrastructure like CoolVDS to minimize your Recovery Time Objective (RTO).
The 3-2-1 Rule is Still King
Before we touch a single config file, let’s reiterate the golden rule of data persistence. If you don't follow this, stop reading and go fix it.
- 3 copies of your data.
- 2 different media types (e.g., NVMe block storage and cold Object Storage).
- 1 copy off-site (if your primary is in Oslo, your backup should be in Bergen or Frankfurt; a sketch of the off-site push follows this list).
Many VPS providers in the Nordic market still run on ageing spinning SAS or SATA arrays. They are cheap, but restoring 500GB of data from rotating disks in 2018 takes hours. When every minute of downtime costs you money, disk throughput is the bottleneck. This is why we deploy on CoolVDS: their standard NVMe storage cuts restoration time by roughly a factor of ten compared to traditional spinning-disk hosting.
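In practice, the second medium and the off-site copy can be as simple as mirroring your nightly backup directory to an S3-compatible bucket in another region. A minimal sketch (the bucket name and local path are placeholders, and s3cmd needs to be configured with your provider's endpoint and keys first):
# Push last night's dumps to an S3-compatible bucket outside your primary DC.
# Run s3cmd --configure once beforehand to store credentials.
s3cmd sync --preserve /backup/nightly/ s3://dr-backups-frankfurt/nightly/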
Step 1: Database Replication (MySQL 5.7)
The days of relying solely on nightly mysqldump are over. You need real-time replication. In MySQL 5.7 (the current stable standard), we use GTID (Global Transaction Identifier) based replication. It’s far more robust than the old binary log position method, which broke if you looked at it wrong.
Here is a production-ready my.cnf configuration for a Master node running on Ubuntu 16.04 LTS:
[mysqld]
# Basic Settings
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
expire_logs_days = 7
max_binlog_size = 100M
# GTID Replication Settings (Crucial for auto-failover)
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON
# Crash Safety
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
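After restarting mysqld with that file, confirm the settings actually took effect before you build anything on top of them:
-- Run on the master:
SELECT @@server_id, @@gtid_mode, @@enforce_gtid_consistency;
SHOW MASTER STATUS;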
On your Slave node (which should be on a separate physical host, preferably in a different zone), the config looks similar, but with server-id = 2 and read_only = 1.
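For reference, the slave-side overrides might look like this (super_read_only is optional in 5.7 and additionally blocks writes from accounts holding the SUPER privilege):
[mysqld]
server-id = 2
read_only = 1
# super_read_only = 1
relay_log = /var/log/mysql/mysql-relay-bin.log
log_bin = /var/log/mysql/mysql-bin.log
log_slave_updates = ON
gtid_mode = ON
enforce_gtid_consistency = ON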
To initialize the replication user securely:
CREATE USER 'repl_user'@'10.8.%' IDENTIFIED BY 'StrongPassword2018!';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'10.8.%';
FLUSH PRIVILEGES;
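Once the slave holds a consistent copy of the data (mysqldump with --set-gtid-purged=ON or Percona XtraBackup both work), GTID auto-positioning makes the hook-up a single statement on the slave. The master address below (10.8.0.4) is a placeholder for your private network IP:
CHANGE MASTER TO
  MASTER_HOST = '10.8.0.4',
  MASTER_USER = 'repl_user',
  MASTER_PASSWORD = 'StrongPassword2018!',
  MASTER_AUTO_POSITION = 1;
START SLAVE;
-- Slave_IO_Running and Slave_SQL_Running should both report "Yes":
SHOW SLAVE STATUS\G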
Pro Tip: Never replicate over the public internet without encryption. Use the private network interfaces provided by your host (CoolVDS offers isolated private VLANs), or route the traffic through an SSH tunnel or OpenVPN. Sending raw SQL traffic over a public IP is a security suicide mission.
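If you truly have no private network and must cross the public internet, one workable sketch is a persistent SSH tunnel kept alive by autossh (an extra package, not part of the stack above; the master's public IP below is a placeholder):
# On the slave: forward local port 3307 to the master's MySQL port, auto-reconnecting.
apt-get install -y autossh
autossh -M 0 -f -N \
  -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" \
  -L 3307:127.0.0.1:3306 tunnel@203.0.113.10
# Then use MASTER_HOST='127.0.0.1', MASTER_PORT=3307 in CHANGE MASTER TO.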
Step 2: File Synchronization with lsyncd
Databases are half the battle. What about user uploads? If you are running a Magento shop or a WordPress cluster, you need those /wp-content/uploads synced instantly.
While rsync via cron is fine for nightly dumps, it leaves a Recovery Point Objective (RPO) gap: if the cron runs hourly, you stand to lose up to 59 minutes of changes.
Instead, use lsyncd (Live Syncing Daemon). It watches the filesystem for changes using kernel inotify and triggers rsync instantly.
-- /etc/lsyncd/lsyncd.conf.lua
settings {
logfile = "/var/log/lsyncd/lsyncd.log",
statusFile = "/var/log/lsyncd/lsyncd.status",
nodaemon = false,
}
sync {
default.rsyncssh,
source = "/var/www/html/uploads",
host = "10.8.0.5", -- The Private IP of your failover node
targetdir = "/var/www/html/uploads",
rsync = {
archive = true,
compress = true,
_extra = { "--bwlimit=5000" } -- Cap at roughly 5 MB/s so the sync never saturates the link
}
}
This setup ensures that a file uploaded at 14:00:01 is on your backup server by 14:00:03.
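One prerequisite the config glosses over: default.rsyncssh runs rsync over SSH non-interactively, so the account running lsyncd (root, by default) needs key-based access to the failover node. A quick sketch using the IP from the config above:
# Create a key for root on the web node and push it to the failover node.
ssh-keygen -t ed25519 -f /root/.ssh/id_ed25519 -N ""
ssh-copy-id root@10.8.0.5
# Sanity check: this must complete without a password prompt.
rsync -az /var/www/html/uploads/ root@10.8.0.5:/var/www/html/uploads/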
Step 3: The Virtualization Layer Matters
Not all VPSs are created equal. In the budget hosting market, you will find a lot of OpenVZ (container-based) virtualization. For a serious DR plan, OpenVZ is a liability.
Why? Kernel dependency. If the host kernel panics, your container dies with it. You also cannot load kernel modules or tune the kernel parameters needed for advanced filesystem snapshots or specialized VPN configurations.
This is why KVM (Kernel-based Virtual Machine) is the non-negotiable standard for professional infrastructure. CoolVDS uses KVM exclusively. This allows us to treat the VPS like a dedicated server. We can take block-level snapshots of the entire machine state.
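Not sure what you are running on today? On any systemd-based distro you can check from inside the guest:
# Prints the detected virtualization technology, e.g. "kvm", "openvz" or "lxc".
systemd-detect-virt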
Scripting Automated Backups with BorgBackup
If you haven't switched to BorgBackup yet, you are working too hard. It offers deduplication, compression, and authenticated encryption. It is vastly superior to a simple tarball.
Here is a script I use to push encrypted backups to a remote storage server:
#!/bin/bash
# /usr/local/bin/backup-borg.sh
export BORG_PASSPHRASE='CorrectHorseBatteryStaple'
REPOSITORY=ssh://borg@backup.coolvds.net/./repo
# Backup everything except temporary files
borg create -v --stats \
$REPOSITORY::'{hostname}-{now:%Y-%m-%d_%H:%M}' \
/etc /var/www /home /root \
--exclude '*.tmp' \
--exclude '/var/log'
# Prune old backups (Keep 7 daily, 4 weekly, 6 monthly)
borg prune -v $REPOSITORY \
--prefix '{hostname}-' \
--keep-daily=7 \
--keep-weekly=4 \
--keep-monthly=6
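Two pieces of glue make the script useful: the repository must exist before the first run, and something has to execute the script on a schedule. A minimal sketch, assuming the same repository URL as above:
# One-time: create the encrypted repository on the backup host.
borg init --encryption=repokey ssh://borg@backup.coolvds.net/./repo
# Nightly at 02:30 via /etc/cron.d/borg-backup (note the user field).
30 2 * * * root /usr/local/bin/backup-borg.sh >> /var/log/backup-borg.log 2>&1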
The Norwegian Context: Latency and Jurisdiction
Location is a feature. If your customers are in Oslo, hosting in a German datacenter adds 20-30ms of latency to every request. That might sound trivial, but in the age of Google’s "Speed Update" (announced just this month, January 2018), page load speed is a mobile ranking factor.
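Don't take latency figures on faith; measure from a machine close to your users (the hostname below is a placeholder, and mtr ships as mtr-tiny on Ubuntu):
# Average round-trip time over ten probes.
ping -c 10 db1.example.no
# Hop-by-hop latency and packet loss summary.
mtr --report --report-cycles 10 db1.example.no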
Furthermore, Datatilsynet (the Norwegian Data Protection Authority) is clear about data sovereignty. While data can flow within the EEA, keeping both your primary data and your first-stage recovery copy inside Norway simplifies your compliance posture significantly. It removes any ambiguity around third-country transit.
CoolVDS infrastructure is physically located in Oslo. This guarantees sub-5ms latency to local ISPs like Telenor and Telia, ensuring that even during a failover event, your user experience remains snappy.
Validating the Plan
A backup is not a backup until you have restored it. I recommend a "Game Day" once a quarter.
- Spin up a fresh CoolVDS instance (takes about 55 seconds).
- Run your Ansible playbooks or restoration scripts.
- Verify data integrity (check MySQL checksums; see the sketch after this list).
- Measure the time. Did it take 4 hours? Why? Optimize.
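For the data integrity check, a rough sketch is to compare table checksums between the restored instance and the live replica (hosts, schema, and credentials are placeholders; for large production datasets, Percona's pt-table-checksum is the more rigorous tool):
# Compare checksums of critical tables on the restored box vs. the live replica.
for host in 10.8.0.99 10.8.0.5; do
  echo "== $host =="
  mysql -h "$host" -u audit -p -N -e "CHECKSUM TABLE shop.orders, shop.customers;"
done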
Conclusion
The hardware you run on dictates the ceiling of your recovery speed. You can have the best scripts in the world, but if you are waiting on a shared 7200RPM SATA drive to re-hydrate 100GB of data, you are going to be offline for hours.
In 2018, NVMe storage is not a luxury; it is a requirement for high-availability setups. Don't let your infrastructure become the bottleneck in your disaster recovery plan.
Ready to harden your stack? Deploy a KVM-based, NVMe-powered instance on CoolVDS today and sleep through the next 3 AM alarm.