Disaster Recovery Strategies for 2018: Surviving the Outage in a GDPR World
It has been two months since May 25th. The panic emails about privacy policy updates have finally stopped clogging our inboxes. But while everyone in the Nordic tech scene was obsessing over Datatilsynet compliance and consent forms, many forgot a fundamental truth of systems engineering: Compliance is not availability.
You can be fully GDPR compliant and still go out of business in four hours if your primary database corrupts and your recovery time (RTO) is "sometime tomorrow." I have seen seasoned CTOs weep because their "backup strategy" was a nightly cron job whose output nobody had actually test-restored since 2016.
Today, we aren't talking about ticking boxes. We are talking about survival. We are going to architect a disaster recovery (DR) plan using tools available right now on Ubuntu 18.04 LTS and CentOS 7, focusing on keeping your data strictly within Norwegian borders to satisfy both latency demands and sovereignty laws.
The RPO/RTO Reality Check
Before we touch a single config file, you need to define two numbers. If you don't know them, you are guessing, not engineering.
- RPO (Recovery Point Objective): How much data are you willing to lose? One hour? One second?
- RTO (Recovery Time Objective): How long can you be offline?
For a static brochure site, a nightly tarball is fine. For a high-traffic Magento store or a SaaS platform running on a VPS in Norway, losing 24 hours of orders is unacceptable. We need near-real-time replication.
The Database: GTID Replication with MySQL 5.7
Forget the old way of tracking binary log file names and positions. If you are running serious infrastructure in 2018, you should be using Global Transaction Identifiers (GTIDs). They make failover significantly less painful because you no longer have to work out log offsets by hand when pointing a slave at a new master.
Here is a battle-tested configuration for a Master node. This assumes you are running on dedicated resources—like a CoolVDS NVMe instance—where disk I/O won't become your bottleneck during heavy writes.
1. Master Configuration (/etc/mysql/mysql.conf.d/mysqld.cnf)
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
expire_logs_days = 7
max_binlog_size = 100M
# GTID Configuration for robust failover
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON
# Safety first
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
Pro Tip: Setting sync_binlog = 1 and innodb_flush_log_at_trx_commit = 1 is the safest ACID-compliant combination, but it hits disk I/O hard. This is why we exclusively use NVMe storage at CoolVDS. On spinning rust (HDD), these settings will kill your write throughput. On our infrastructure, you won't feel it.
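If you want to put a number on that claim before committing, a quick fsync-heavy fio run is a reasonable proxy for what sync_binlog = 1 does to your disks. This is just a sketch; the target directory and sizes are placeholders, so point it at a scratch directory on the same volume as your MySQL datadir:
# Ubuntu 18.04: sudo apt-get install fio
# Small sequential writes with an fsync after every write,
# roughly the pattern a fully durable binlog produces.
fio --name=binlog-fsync --directory=/var/lib/mysql-scratch \
    --rw=write --bs=16k --size=256m --fsync=1
Compare the reported IOPS and completion latency between an HDD box and an NVMe instance and the paragraph above stops being marketing and starts being arithmetic.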
2. Creating the Replication User
Don't use root. Create a specific user with strictly limited privileges allowing only replication from your secondary IP.
mysql> CREATE USER 'repl_user'@'10.8.0.5' IDENTIFIED BY 'SuperSecurePassword2018!';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'10.8.0.5';
mysql> FLUSH PRIVILEGES;
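3. Pointing the Secondary at the Master
The master is only half the story, so here is a sketch of the secondary side. Treat the specifics as placeholders: 10.8.0.4 stands in for the master's private IP, and the secondary needs its own server-id (e.g. 2) plus the same gtid_mode and enforce_gtid_consistency settings shown above before you run this.
mysql> CHANGE MASTER TO
    ->   MASTER_HOST = '10.8.0.4',
    ->   MASTER_USER = 'repl_user',
    ->   MASTER_PASSWORD = 'SuperSecurePassword2018!',
    ->   MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G
Check that Slave_IO_Running and Slave_SQL_Running both say Yes, and keep an eye on Seconds_Behind_Master; on NVMe-backed instances it should sit at or near zero.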
Filesystem Strategy: BorgBackup
rsync is great, but for point-in-time recovery, it lacks versioning unless you mess around with hard links. In 2018, the tool of choice for serious admins is BorgBackup. It offers deduplication, compression, and authenticated encryption.
Why Borg? Because if your server gets hit with ransomware (an increasing threat this year), a standard rsync mirror just replicates the encrypted files, destroying your backup. Borg repositories can be append-only.
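There is more than one way to enforce append-only; a common approach is to restrict the SSH key your production server uses so it can only reach a constrained borg serve. A sketch, with the repo path matching the one initialized below and a deliberately truncated public key:
# On the backup host, in the backup user's ~/.ssh/authorized_keys:
# this key may only run borg serve, in append-only mode, against this single repo.
command="borg serve --append-only --restrict-to-path /var/backups/repo.borg",restrict ssh-ed25519 AAAA... backup@web01
Even if an attacker owns the web server and its SSH key, the worst they can do is add data to the repository, not rewrite or delete your history.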
Installing and Initializing Borg
# On Ubuntu 18.04
sudo apt-get update && sudo apt-get install borgbackup
# On CentOS 7 (borgbackup lives in the EPEL repository)
sudo yum install epel-release && sudo yum install borgbackup
# Initialize the repo on your backup storage (Secondary CoolVDS instance)
# You will be prompted to choose a passphrase; do not lose it.
borg init --encryption=repokey user@backup-server:/var/backups/repo.borg
The Daily Snapshot Script
Create a script at /usr/local/bin/backup.sh:
#!/bin/bash
set -euo pipefail

REPOSITORY="user@backup-server:/var/backups/repo.borg"

# The repo uses repokey encryption, so non-interactive runs (cron) need the
# passphrase. Replace the placeholder, or read it from a root-only file.
export BORG_PASSPHRASE='CHANGE-ME'

# Backup everything in /var/www/html, excluding logs and cache
borg create --stats --progress \
    --exclude '*.log' \
    --exclude '/var/www/html/cache' \
    "$REPOSITORY::{hostname}-{now:%Y-%m-%d}" \
    /var/www/html

# Prune old backups: keep 7 dailies, 4 weeklies, 6 monthlies
borg prune -v "$REPOSITORY" \
    --keep-daily=7 \
    --keep-weekly=4 \
    --keep-monthly=6
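Wire the script into cron and, more importantly, prove you can get data back out. The schedule and archive name below are illustrative:
sudo chmod +x /usr/local/bin/backup.sh
# Run nightly at 02:30 via /etc/cron.d (note the user field in this format)
echo '30 2 * * * root /usr/local/bin/backup.sh >> /var/log/borg-backup.log 2>&1' | sudo tee /etc/cron.d/borg-backup
# Verify: list the archives, then test-restore one into a scratch directory
borg list user@backup-server:/var/backups/repo.borg
mkdir -p /tmp/restore-test && cd /tmp/restore-test
borg extract user@backup-server:/var/backups/repo.borg::web01-2018-07-24
A backup you have never restored is a hope, not a backup.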
Network Level Failover
Having the data ready on a secondary server is useless if users are still hitting the dead IP address. In a complex setup, you might use BGP Anycast, but for most mid-sized deployments, DNS failover with a low TTL (Time To Live) is the pragmatic choice.
Set your A record TTL to 60 seconds. In the event of a catastrophe in the primary Oslo zone, you update the DNS record to point to your hot standby. Yes, there is propagation delay, but it beats the hours required to restore from cold storage.
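Don't take the TTL on faith; check what resolvers actually hand out (example.com below stands in for your own zone):
# The second field of the answer line is the remaining TTL in seconds
dig +noall +answer example.com A
# Query your authoritative nameserver directly to confirm the configured value
dig +noall +answer example.com A @ns1.example.com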
Alternatively, if you are running a load balancer (like HAProxy or Nginx) in front of your app servers, you can configure a backup upstream.
Nginx Upstream Failover Config
upstream backend_cluster {
    server 10.0.0.10:80 weight=5;
    server 10.0.0.11:80 weight=5;
    # The backup server only receives traffic if the primaries fail
    server 10.0.0.20:80 backup;
}

server {
    location / {
        proxy_pass http://backend_cluster;
        proxy_connect_timeout 2s;
    }
}
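Before you trust it, reload and force a failure. This is a sketch against the example IPs above; run it on the load balancer itself and remove the rules afterwards:
# Validate and apply the configuration
sudo nginx -t && sudo systemctl reload nginx
# Temporarily block the primaries to simulate an outage
sudo iptables -A OUTPUT -d 10.0.0.10 -p tcp --dport 80 -j DROP
sudo iptables -A OUTPUT -d 10.0.0.11 -p tcp --dport 80 -j DROP
# After the 2s connect timeouts, requests should land on the backup (10.0.0.20)
curl -I http://localhost/
# Clean up
sudo iptables -D OUTPUT -d 10.0.0.10 -p tcp --dport 80 -j DROP
sudo iptables -D OUTPUT -d 10.0.0.11 -p tcp --dport 80 -j DROP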
The Sovereignty Factor: Why Location Matters
Under GDPR Article 32, you are required to implement "the ability to restore the availability and access to personal data in a timely manner." Furthermore, Norwegian businesses are increasingly wary of the US Cloud Act.
Hosting your primary and disaster recovery nodes within Norway (or the EEA) simplifies your legal posture immensely. If the data never leaves the region, there are no cross-border transfer mechanisms (model clauses, Privacy Shield questions) to document and defend.
This is where CoolVDS fits the architectural requirement. We don't just offer "cloud." We offer distinct, isolated KVM instances sitting on hardware physically located here. You get the raw performance of NVMe—crucial for that database replication lag—without the noisy neighbor issues typical of container-based hosting.
The Fire Drill
A Disaster Recovery plan that hasn't been tested is just a theoretical document. It will fail you. Schedule a "Game Day." Pick a random Tuesday. Sever the network connection to your primary database. Watch your monitoring dashboards. See if the slave promotes correctly. See if your application handles the switch.
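With GTIDs the promotion itself is mercifully short. A minimal manual sequence on the secondary looks something like this, assuming your DNS or load balancer switch then points writes at it:
mysql> STOP SLAVE;
mysql> RESET SLAVE ALL;              -- forget the old replication configuration
mysql> SET GLOBAL read_only = OFF;   -- only if you (sensibly) ran the secondary read-only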
It will likely break the first time. That is good. Better it breaks when you are watching than at 3 AM on a Saturday.
Don't let legacy infrastructure or slow I/O be the reason your recovery fails. Performance is a safety feature. If you need a sandbox to test this replication setup, spin up a high-performance instance on CoolVDS today and see the difference NVMe makes to your recovery times.