Disaster Recovery in 2019: Architecture for When the Fjord Freezes Over
Let's be honest: your current backup strategy is probably garbage. I say this not to be cruel, but because I have stood in a freezing server room at 3:00 AM trying to restore a MySQL dump that turned out to be truncated. Most sysadmins in Norway confuse "backups" with "Disaster Recovery" (DR). They are not the same beast.
A backup is a copy. DR is a plan and an architecture that keeps your business alive when your primary data center in Oslo goes dark. Whether it's a fiber cut near the Opera House or a kernel panic that cascades through your cluster, downtime costs kroner.
In this guide, we are ignoring expensive proprietary appliances. We are going to build a functional, low-latency DR site using standard Linux tools available right now in 2019. We will use PostgreSQL 11 Streaming Replication and Lsyncd for file synchronization, running on high-performance KVM slices.
The Latency Lie: Why Geography Matters
If your primary stack is hosted in Oslo, dumping your backups to a server in the same rack is suicide. Dumping them to a server in San Francisco is latency murder.
For a hot-standby DR site, you need to be far enough away to avoid the same physical disaster, but close enough to keep replication lag under control. For Norwegian businesses, hosting on CoolVDS infrastructure offers a distinct advantage here: local peering via NIX (Norwegian Internet Exchange). We want ping times under 15ms between primary and secondary sites to ensure synchronous or near-synchronous replication doesn't stall the application.
ping -c 4 dr-site.coolvds.net
If you see triple digits here, stop. You need a better network topology.
Phase 1: The Database (PostgreSQL 11)
Database consistency is the hardest part of DR. We aren't using dumps; we are using WAL (Write Ahead Log) streaming. This creates an exact binary copy of your database in real-time.
Note: We are using PostgreSQL 11. If you are still on 9.6, upgrade. Replication slots have existed since 9.4, but the replication machinery in 10 and 11 is far more robust, and slots are practically mandatory for a setup like this.
1. Primary Configuration
On your primary CoolVDS instance (Master), edit your postgresql.conf. We need to tell Postgres to listen for connections and allow replication traffic.
# /etc/postgresql/11/main/postgresql.conf
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
wal_keep_segments = 64 # keep extra WAL around in case the standby falls behind or the network jitters
hot_standby = on
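One step many tutorials skip: listen_addresses and max_wal_senders only take effect after a full restart, not a reload, so bounce the service before moving on:
systemctl restart postgresql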
Next, create a replication user. Do not use the `postgres` superuser for this.
CREATE USER replicator REPLICATION LOGIN ENCRYPTED PASSWORD 'CorrectHorseBatteryStaple';
Allow the DR IP in pg_hba.conf:
host replication replicator 10.10.20.5/32 md5
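While you are on the primary, put the replication-slot praise above into practice. A slot guarantees the master keeps WAL around until the standby has consumed it, even if wal_keep_segments runs out. A minimal sketch from psql ('dr_slot' is just the name used in this guide):
SELECT pg_reload_conf();  -- pick up the new pg_hba.conf rule
SELECT pg_create_physical_replication_slot('dr_slot');
Once the standby is attached later on, SELECT client_addr, state, replay_lsn FROM pg_stat_replication; on the primary will show you who is connected and how far behind they are.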
2. The Standby (DR) Configuration
On the secondary server, stop the postgres service. We need to wipe the data directory because we are about to pull a base backup.
systemctl stop postgresql
rm -rf /var/lib/postgresql/11/main/*
Now, we use pg_basebackup to clone the master. This is where NVMe storage shines. On a standard HDD VPS, this step takes forever. On CoolVDS NVMe instances, I've saturated 1Gbps links writing to disk without I/O wait.
pg_basebackup -h primary_ip -D /var/lib/postgresql/11/main -U replicator -P -v --wal-method=stream
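pg_basebackup writes files as whoever ran it. If that was root rather than the postgres user, fix ownership and lock down the directory before going any further, or the service will refuse to start:
chown -R postgres:postgres /var/lib/postgresql/11/main
chmod 700 /var/lib/postgresql/11/main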
Now, the most critical part for 2019-era Postgres: the recovery.conf file. This file tells Postgres, "I am not a master, I am a replica."
Pro Tip: Many tutorials forget `trigger_file`. This is a simple file path that, if created, tells the replica to promote itself to Master. It's the "Break Glass in Case of Emergency" button.
# /var/lib/postgresql/11/main/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=primary_ip port=5432 user=replicator password=CorrectHorseBatteryStaple'
trigger_file = '/tmp/postgresql.trigger'
restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p' # optional: only does anything if the primary actually archives WAL to this path via archive_command
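If you created the replication slot sketched earlier, pin the standby to it with one extra line here, so the primary never recycles WAL this replica still needs (slot name as assumed above):
primary_slot_name = 'dr_slot'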
Start the service on the secondary:
systemctl start postgresql
Check your logs. You should see "started streaming WAL from primary". You now have a live replica.
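Log lines are good; numbers are better. Two quick sanity checks from psql on the standby (both are stock PostgreSQL functions):
SELECT pg_is_in_recovery();  -- must return 't' on the replica
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;  -- how far behind the primary you are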
Phase 2: Filesystem Synchronization
Databases are only half the battle. What about user uploads, config files, or static assets? `rsync` is great, but running it from a cron job leaves a window in which changes can be lost. We use lsyncd (Live Syncing Daemon) instead: it watches for filesystem events via the kernel's inotify interface and triggers rsync only when files actually change.
Install it:
apt-get install lsyncd
Configure it to watch your web root. This configuration ensures that if a user uploads a PDF in Oslo, it appears on the DR server seconds later.
-- /etc/lsyncd/lsyncd.conf.lua
settings {
    logfile    = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status"
}

sync {
    default.rsyncssh,
    source    = "/var/www/html",
    host      = "dr-site.coolvds.net",
    targetdir = "/var/www/html",
    delay     = 5, -- wait 5 seconds to bundle changes
    rsync     = {
        archive  = true,
        compress = true,
        _extra   = { "--omit-dir-times" }
    }
}
Ensure you have SSH keys set up between the hosts so `lsyncd` can connect without a password.
ssh-copy-id -i ~/.ssh/id_rsa.pub root@dr-site.coolvds.net
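The Debian/Ubuntu package does not necessarily create the log directory for you, and nothing syncs until the daemon is running. A quick start-and-smoke-test, using the paths from the config above (the service name may differ on other distributions):
mkdir -p /var/log/lsyncd
systemctl restart lsyncd
touch /var/www/html/lsyncd-canary.txt
sleep 10
ssh root@dr-site.coolvds.net ls -l /var/www/html/lsyncd-canary.txt
If the canary file shows up on the DR host, the pipeline works; delete it and move on.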
Phase 3: The Legal Shield (GDPR & Data Sovereignty)
We cannot discuss infrastructure in 2019 without addressing the elephant in the room: GDPR. Since May 2018, the rules have been clear. If you are processing data on Norwegian citizens, you are accountable for where that data lives.
Using US-based cloud giants for your DR site introduces legal complexity under the Privacy Shield framework, which is currently facing heavy scrutiny in European courts. By keeping your DR site on CoolVDS, you ensure data residency remains within the EEA (European Economic Area). This satisfies Datatilsynet requirements and simplifies your compliance documentation significantly. Low latency and legal safety often go hand-in-hand.
The Failover Procedure
Technology fails. When it does, your process must succeed. Here is the manual failover sequence if the primary site vanishes (a scripted sketch of the promotion step follows the list):
- Verify the outage: Confirm it's not just a route flap.
- Promote the DB: SSH into the DR unit and touch the trigger file.
touch /tmp/postgresql.trigger
- Switch DNS: Update your A-records to point to the DR IP.
- Stop Lsyncd: Keep lsyncd on the old primary disabled so that, if that box ever limps back online, it does not overwrite the newer files now living on the DR site.
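Because fingers shake at 3:00 AM, it helps to have the promotion wrapped in a script you have already rehearsed. A minimal sketch for the DR host, using the trigger path assumed throughout this guide:
#!/usr/bin/env bash
# failover.sh -- run on the DR host once you are sure the primary is really gone
set -euo pipefail

# Promote the replica: Postgres sees the trigger file and leaves recovery mode
touch /tmp/postgresql.trigger

# Wait until Postgres confirms it is no longer a standby
until [ "$(sudo -u postgres psql -tAc 'SELECT pg_is_in_recovery();')" = "f" ]; do
    sleep 2
done

echo "Standby promoted."
echo "Next: update your DNS A-records, and keep lsyncd on the old primary disabled"
echo "so it cannot overwrite the files now being served from here."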
This setup gives you an RPO (Recovery Point Objective) of near-zero and an RTO (Recovery Time Objective) of however long it takes your DNS to propagate.
Summary
Disaster recovery isn't about buying more hardware; it's about smart architecture. By leveraging KVM virtualization for isolation, NVMe for write-intensive replication, and open-source tools like Postgres and Lsyncd, you build a fortress.
Don't wait for a hardware failure to test this. Deploy a secondary instance on CoolVDS today, configure the replication, and sleep better knowing your data is safe on Viking soil.