Disaster Recovery in 2019: Architecture for When the Fjord Freezes Over
Let's be honest: your current backup strategy is probably garbage. I say this not to be cruel, but because I have stood in a freezing server room at 3:00 AM trying to restore a MySQL dump that turned out to be truncated. Most sysadmins in Norway confuse "backups" with "Disaster Recovery" (DR). They are not the same beast.
A backup is a copy. DR is a plan and an architecture that keeps your business alive when your primary data center in Oslo goes dark. Whether it's a fiber cut near the Opera House or a kernel panic that cascades through your cluster, downtime costs kroner.
In this guide, we are ignoring expensive proprietary appliances. We are going to build a functional, low-latency DR site using standard Linux tools available right now in 2019. We will use PostgreSQL 11 Streaming Replication and Lsyncd for file synchronization, running on high-performance KVM slices.
The Latency Lie: Why Geography Matters
If your primary stack is hosted in Oslo, dumping your backups to a server in the same rack is suicide. Dumping them to a server in San Francisco is latency murder.
For a hot-standby DR site, you need to be far enough away to avoid the same physical disaster, but close enough to keep replication lag under control. For Norwegian businesses, hosting on CoolVDS infrastructure offers a distinct advantage here: local peering via NIX (Norwegian Internet Exchange). We want ping times under 15ms between primary and secondary sites to ensure synchronous or near-synchronous replication doesn't stall the application.
ping -c 4 dr-site.coolvds.net
If you see triple digits here, stop. You need a better network topology.
Phase 1: The Database (PostgreSQL 11)
Database consistency is the hardest part of DR. We aren't using dumps; we are using WAL (Write Ahead Log) streaming. This creates an exact binary copy of your database in real-time.
Note: We are using PostgreSQL 11. If you are still on 9.6, upgrade. Replication slots have existed since 9.4, but the replication machinery in 10 and 11 is far more robust, and slots are practically mandatory for a setup like this.
1. Primary Configuration
On your primary CoolVDS instance (Master), edit your postgresql.conf. We need to tell Postgres to listen for connections and allow replication traffic.
# /etc/postgresql/11/main/postgresql.conf
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
wal_keep_segments = 64 # keep extra WAL around in case the standby falls behind or the network jitters
hot_standby = on
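One step many tutorials skip: listen_addresses and max_wal_senders only take effect after a full restart, not a reload, so bounce the service before moving on:
systemctl restart postgresql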
Next, create a replication user. Do not use the `postgres` superuser for this.
CREATE USER replicator REPLICATION LOGIN ENCRYPTED PASSWORD 'CorrectHorseBatteryStaple';
Allow the DR IP in pg_hba.conf:
host replication replicator 10.10.20.5/32 md5
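While you are on the primary, put the replication-slot praise above into practice. A slot guarantees the master keeps WAL around until the standby has consumed it, even if wal_keep_segments runs out. A minimal sketch from psql ('dr_slot' is just the name used in this guide):
SELECT pg_reload_conf();  -- pick up the new pg_hba.conf rule
SELECT pg_create_physical_replication_slot('dr_slot');
Once the standby is attached later on, SELECT client_addr, state, replay_lsn FROM pg_stat_replication; on the primary will show you who is connected and how far behind they are.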
2. The Standby (DR) Configuration
On the secondary server, stop the postgres service. We need to wipe the data directory because we are about to pull a base backup.
systemctl stop postgresql
rm -rf /var/lib/postgresql/11/main/*
Now, we use pg_basebackup to clone the master. This is where NVMe storage shines. On a standard HDD VPS, this step takes forever. On CoolVDS NVMe instances, I've saturated 1Gbps links writing to disk without I/O wait.
pg_basebackup -h primary_ip -D /var/lib/postgresql/11/main -U replicator -P -v --wal-method=stream
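pg_basebackup writes files as whoever ran it. If that was root rather than the postgres user, fix ownership and lock down the directory before going any further, or the service will refuse to start:
chown -R postgres:postgres /var/lib/postgresql/11/main
chmod 700 /var/lib/postgresql/11/main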
Now, the most critical part for 2019-era Postgres: the recovery.conf file. This file tells Postgres, "I am not a master, I am a replica."
Pro Tip: Many tutorials forget `trigger_file`. This is a simple file path that, if created, tells the replica to promote itself to Master. It's the "Break Glass in Case of Emergency" button.
# /var/lib/postgresql/11/main/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=primary_ip port=5432 user=replicator password=CorrectHorseBatteryStaple'
trigger_file = '/tmp/postgresql.trigger'
restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p' # optional: only does anything if the primary actually archives WAL to this path via archive_command
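If you created the replication slot sketched earlier, pin the standby to it with one extra line here, so the primary never recycles WAL this replica still needs (slot name as assumed above):
primary_slot_name = 'dr_slot'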
Start the service on the secondary:
systemctl start postgresql
Check your logs. You should see "started streaming WAL from primary". You now have a live replica.
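Log lines are good; numbers are better. Two quick sanity checks from psql on the standby (both are stock PostgreSQL functions):
SELECT pg_is_in_recovery();  -- must return 't' on the replica
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;  -- how far behind the primary you are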
Phase 2: Filesystem Synchronization
Databases are only half the battle. What about user uploads, config files, or static assets? `rsync` is great, but running it from a cron job leaves a window in which changes can be lost. We use lsyncd (Live Syncing Daemon) instead: it watches for filesystem events via the kernel's inotify interface and triggers rsync only when files actually change.
Install it:
apt-get install lsyncd
Configure it to watch your web root. This configuration ensures that if a user uploads a PDF in Oslo, it appears on the DR server seconds later.
-- /etc/lsyncd/lsyncd.conf.lua
settings {
    logfile    = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status"
}

sync {
    default.rsyncssh,
    source    = "/var/www/html",
    host      = "dr-site.coolvds.net",
    targetdir = "/var/www/html",
    delay     = 5, -- wait 5 seconds to bundle changes
    rsync     = {
        archive  = true,
        compress = true,
        _extra   = { "--omit-dir-times" }
    }
}
Ensure you have SSH keys set up between the hosts so `lsyncd` can connect without a password.
ssh-copy-id -i ~/.ssh/id_rsa.pub root@dr-site.coolvds.net
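The Debian/Ubuntu package does not necessarily create the log directory for you, and nothing syncs until the daemon is running. A quick start-and-smoke-test, using the paths from the config above (the service name may differ on other distributions):
mkdir -p /var/log/lsyncd
systemctl restart lsyncd
touch /var/www/html/lsyncd-canary.txt
sleep 10
ssh root@dr-site.coolvds.net ls -l /var/www/html/lsyncd-canary.txt
If the canary file shows up on the DR host, the pipeline works; delete it and move on.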
Phase 3: The Legal Shield (GDPR & Data Sovereignty)
We cannot discuss infrastructure in 2019 without addressing the elephant in the room: GDPR. Since May 2018, the rules have been clear. If you are processing data on Norwegian citizens, you are accountable for where that data lives.
Using US-based cloud giants for your DR site introduces legal complexity under the Privacy Shield framework, which is currently facing heavy scrutiny in European courts. By keeping your DR site on CoolVDS, you ensure data residency remains within the EEA (European Economic Area). This satisfies Datatilsynet requirements and simplifies your compliance documentation significantly. Low latency and legal safety often go hand-in-hand.
The Failover Procedure
Technology fails. When it does, your process must succeed. Here is the manual failover sequence if the primary site vanishes (a scripted sketch of the promotion step follows the list):
- Verify the outage: Confirm it's not just a route flap.
- Promote the DB: SSH into the DR unit and touch the trigger file.
touch /tmp/postgresql.trigger
- Switch DNS: Update your A-records to point to the DR IP.
- Stop Lsyncd: Keep lsyncd on the old primary disabled so that, if that box ever limps back online, it does not overwrite the newer files now living on the DR site.
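Because fingers shake at 3:00 AM, it helps to have the promotion wrapped in a script you have already rehearsed. A minimal sketch for the DR host, using the trigger path assumed throughout this guide:
#!/usr/bin/env bash
# failover.sh -- run on the DR host once you are sure the primary is really gone
set -euo pipefail

# Promote the replica: Postgres sees the trigger file and leaves recovery mode
touch /tmp/postgresql.trigger

# Wait until Postgres confirms it is no longer a standby
until [ "$(sudo -u postgres psql -tAc 'SELECT pg_is_in_recovery();')" = "f" ]; do
    sleep 2
done

echo "Standby promoted."
echo "Next: update your DNS A-records, and keep lsyncd on the old primary disabled"
echo "so it cannot overwrite the files now being served from here."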
This setup gives you an RPO (Recovery Point Objective) of near-zero and an RTO (Recovery Time Objective) of however long it takes your DNS to propagate.
Summary
Disaster recovery isn't about buying more hardware; it's about smart architecture. By leveraging KVM virtualization for isolation, NVMe for write-intensive replication, and open-source tools like Postgres and Lsyncd, you build a fortress.
Don't wait for a hardware failure to test this. Deploy a secondary instance on CoolVDS today, configure the replication, and sleep better knowing your data is safe on Viking soil.