Disaster Recovery Architectures for Norwegian Enterprises: Beyond Basic Backups

Disaster Recovery for the Pragmatic CTO: Sovereign Data & Low RTO

Hope is not a strategy. If your entire continuity plan rests on a nightly tarball stored in an S3 bucket in Frankfurt, you aren't preparing for disaster; you are preparing for a résumé update. In the wake of Schrems II and the tightening grip of Datatilsynet (The Norwegian Data Protection Authority), the old playbook of "just push it to US-owned cloud storage" is legally radioactive.

We are operating in a reality where fiber cuts, hardware failure, and ransomware are statistical certainties. For Norwegian businesses, the challenge is twofold: maintain low-latency access via NIX (Norwegian Internet Exchange) while ensuring data never crosses borders it shouldn't.

This is not a high-level overview. This is a technical blueprint for architecting a Hot-Warm disaster recovery site using standard tools available in 2024: WireGuard, PostgreSQL 16 Streaming Replication, and ZFS. We will implement this assuming a primary site in Oslo and a failover node on a high-performance CoolVDS instance.

The Legal & Latency Imperative

Latency matters. Light travels fast, but routing protocols are slow. If your primary audience is in Scandinavia, failing over to a server in Virginia (us-east-1) introduces unacceptable lag, often exceeding 100ms. More importantly, it introduces compliance headaches.

Pro Tip: Under GDPR Article 32, you must demonstrate the ability to restore availability and access to personal data in a timely manner. "Timely" does not mean downloading 4TB of data at 50MB/s from a cold archive. It means failover.

By keeping your DR site on a Norwegian VPS provider like CoolVDS, you keep traffic local. Ping times between major Norwegian data centers often stay below 5ms. That is the difference between a noticeable outage and a blip.
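If you want to verify that number for your own route before committing, a quick round-trip measurement from the primary site to a candidate DR node is enough (the IP below is a placeholder):

# Round-trip time from the primary site to the candidate DR node (placeholder IP)
ping -c 20 203.0.113.10 | tail -1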

Phase 1: The Secure Tunnel (WireGuard)

Do not expose your database replication ports to the public internet. IP whitelisting is fragile; VPNs are robust. In 2024, WireGuard is the undisputed king of kernel-space VPNs. It is faster than OpenVPN and far easier to audit than IPsec.

First, install WireGuard on both the Primary and DR nodes (assuming Ubuntu 22.04 LTS):

apt update && apt install wireguard -y

Generate keys on both servers:

wg genkey | tee privatekey | wg pubkey > publickey
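Since tee writes the key files with your default umask, restrict access to the private key right away:

chmod 600 privatekey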

Here is the configuration for the DR Node (CoolVDS) acting as the listener. We configure it to listen on port 51820.

# /etc/wireguard/wg0.conf on DR Node
[Interface]
Address = 10.10.0.2/24
SaveConfig = true
PostUp = ufw route allow in on wg0 out on eth0
PostDown = ufw route delete allow in on wg0 out on eth0
ListenPort = 51820
PrivateKey = <contents of the DR node's privatekey file>

[Peer]
PublicKey = <contents of the Primary node's publickey file>
AllowedIPs = 10.10.0.1/32
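
The Primary side is the mirror image. A sketch follows; the Endpoint must be the DR node's public IP (shown as a placeholder), and PersistentKeepalive keeps the tunnel alive through NAT or stateful firewalls:

# /etc/wireguard/wg0.conf on Primary Node
[Interface]
Address = 10.10.0.1/24
PrivateKey = <contents of the Primary node's privatekey file>

[Peer]
PublicKey = <contents of the DR node's publickey file>
Endpoint = <DR node public IP>:51820
AllowedIPs = 10.10.0.2/32
PersistentKeepalive = 25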

Bring up the interface on both nodes:

wg-quick up wg0

You now have a private, encrypted lane between your production environment and your CoolVDS failover instance. Latency overhead is negligible due to WireGuard's crypto efficiency.
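
A quick sanity check from the DR node confirms the handshake and measures latency over the tunnel:

# Latest handshake and transfer counters
wg show wg0

# Round-trip time to the Primary across the tunnel
ping -c 3 10.10.0.1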

Phase 2: Database Replication (PostgreSQL 16)

RPO (Recovery Point Objective) is the amount of data you are willing to lose. With a nightly backup, your RPO is up to 24 hours. With Streaming Replication, it is near-zero.

On the Primary Node, configure postgresql.conf for replication. We need to set the WAL (Write-Ahead Log) level to replica; logical also works, but is only required for logical replication.

# postgresql.conf snippet
listen_addresses = 'localhost,10.10.0.1' 
wal_level = replica
max_wal_senders = 10
wal_keep_size = 512MB
hot_standby = on
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/16/main/archive/%f && cp %p /var/lib/postgresql/16/main/archive/%f'

You must allow the DR IP in pg_hba.conf:

host replication rep_user 10.10.0.2/32 scram-sha-256
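
Two prerequisites are easy to forget: the replication role referenced above and the archive directory used by archive_command. A minimal sketch on the Primary (choose your own password):

# Archive directory referenced by archive_command, owned by postgres
install -d -o postgres -g postgres /var/lib/postgresql/16/main/archive

# Replication role used in pg_hba.conf and by pg_basebackup
sudo -u postgres psql -c "CREATE ROLE rep_user WITH REPLICATION LOGIN PASSWORD 'change-me';"

# wal_level and archive_mode changes only take effect after a restart
systemctl restart postgresql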

On the DR Node (CoolVDS), stop the service and clear the data directory. Then, pull the base backup:

systemctl stop postgresql
rm -rf /var/lib/postgresql/16/main/*

Now, execute the base backup command utilizing the WireGuard tunnel:

pg_basebackup -h 10.10.0.1 -D /var/lib/postgresql/16/main -U rep_user -P -v -R -X stream -C -S dr_slot_1

The -R flag automatically generates the standby.signal file and connection settings. Start the service:

systemctl start postgresql

Verify status with:

sudo -u postgres psql -x -c "select * from pg_stat_wal_receiver;"
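
On the Primary, the mirror view is pg_stat_replication, which also lets you quantify lag in bytes (a sketch):

sudo -u postgres psql -x -c "SELECT client_addr, state, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes FROM pg_stat_replication;"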

Your database is now replicating in real-time. If the primary site in Oslo goes dark due to a power failure, you can promote the CoolVDS instance to primary with a single command: pg_ctl promote.

Phase 3: Filesystem Replication with ZFS

Databases are vital, but what about user uploads, config files, and static assets? rsync is slow for millions of small files because it has to walk the filesystem tree. ZFS does not.

CoolVDS supports custom ISOs, allowing you to run ZFS-enabled distributions easily. ZFS Send/Receive works at the block level. It doesn't care about file names; it cares about changed bits.

First, snapshot the dataset on the Primary:

zfs snapshot zroot/data@backup_$(date +%F_%H-%M)
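
The very first transfer has to be a full send to seed the dataset on the DR side. A sketch, assuming a zroot pool already exists on the DR node and SSH as root over the tunnel is permitted:

# One-time full send of the newest snapshot over the WireGuard tunnel
SNAP=$(zfs list -t snapshot -o name -s creation | grep '^zroot/data@' | tail -1)
zfs send "${SNAP}" | ssh root@10.10.0.2 "zfs recv -F zroot/backup_data"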

After that initial seed, incremental sends keep the DR copy current. This script logic is robust enough for cron jobs:

#!/bin/bash
# ZFS incremental replication over the WireGuard tunnel
# Assumes at least one snapshot already exists (see the initial seed above)
set -euo pipefail

SOURCE_POOL="zroot/data"
DEST_USER="root"
DEST_HOST="10.10.0.2" # WireGuard IP
DEST_POOL="zroot/backup_data"

# Most recent existing snapshot of the source dataset (incremental base)
LAST_SNAP=$(zfs list -t snapshot -o name -S creation | grep "^${SOURCE_POOL}@" | head -1 | cut -d'@' -f2)
NEW_SNAP="backup_$(date +%F_%H-%M)"

# Create new snapshot
zfs snapshot "${SOURCE_POOL}@${NEW_SNAP}"

# Send incremental stream
zfs send -i "@${LAST_SNAP}" "${SOURCE_POOL}@${NEW_SNAP}" | \
  ssh "${DEST_USER}@${DEST_HOST}" "zfs recv -F ${DEST_POOL}"

# Cleanup old snapshots (optional retention policy logic here)

This method is incredibly efficient. Even with massive changes, ZFS only sends the changed blocks. On a CoolVDS NVMe instance, the zfs recv operation is usually bottlenecked by the network, not by disk I/O.
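
The retention step left open in the script can stay simple. A sketch that keeps only the 14 newest backup_ snapshots on the Primary (the window is an arbitrary example):

# Destroy all but the 14 newest backup_ snapshots of zroot/data
zfs list -t snapshot -o name -s creation | grep '^zroot/data@backup_' | head -n -14 | \
  xargs -r -n1 zfs destroy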

Why Infrastructure Choice Matters

You can script the perfect DR plan, but if the underlying hardware at the DR site is oversubscribed, your RTO (Recovery Time Objective) blows up. When a disaster strikes, you aren't the only one scrambling.

On cheap VPS plans, noisy neighbors steal CPU cycles and providers throttle disk IOPS during sustained writes. When you are replaying 50GB of WAL or receiving a massive ZFS stream, you need sustained IOPS.
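
Before you rely on a DR node, measure it. A sustained write test with fio (assuming fio is installed) approximates what a large ZFS receive will demand; switch --rw=randwrite to stress WAL-replay-style patterns:

# 4GB sequential write with direct I/O, reported as a single aggregate
fio --name=dr-catchup --rw=write --bs=1M --size=4G --direct=1 --numjobs=1 --group_reporting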

We built CoolVDS on KVM (Kernel-based Virtual Machine) specifically to avoid the resource contention inherent in container-based hosting (like OpenVZ or LXC). When you provision NVMe storage here, you get the throughput required to catch up a replication lag in seconds, not hours.

Testing the Failover

A DR plan that hasn't been tested is a hallucination. Schedule a "Game Day." Block port 5432 on your primary firewall to simulate an outage.
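
One low-risk way to stage the outage is to drop replication traffic at the Primary's firewall instead of powering anything off (a sketch using ufw, matching the firewall used earlier; remove the rule when the drill is over):

# Block inbound PostgreSQL connections at the top of the ruleset for the test window
ufw insert 1 deny 5432/tcp

# Roll back after the game day
ufw delete deny 5432/tcp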

1. Check the logs on the CoolVDS instance.

2. Promote the database: /usr/lib/postgresql/16/bin/pg_ctl promote -D /var/lib/postgresql/16/main

3. Switch your DNS A-record (low TTL recommended) to the CoolVDS IP.

If your application comes up within 5 minutes, you have passed. If not, analyze the logs, tune the wal_receiver_timeout, and try again.

Don't wait for the next storm or fiber cut. Secure your data sovereignty and uptime today.

Ready to build a resilient infrastructure? Deploy a high-performance NVMe instance on CoolVDS and start syncing.