Disaster Recovery in the Schrems II Era: Architecting Resilience for Norwegian Data
There is a fundamental misunderstanding in the European hosting market today. Most CTOs believe that having a nightly tarball stored in an AWS S3 bucket constitutes a Disaster Recovery (DR) plan. It does not. That is merely an archive. A true DR strategy is about continuity and compliance, two factors that have become exponentially more difficult since the CJEU's Schrems II ruling.
If you are operating in Norway or handling EU citizen data, relying on US-owned cloud providers for your failover infrastructure introduces legal latency that is just as damaging as network latency. When the Datatilsynet (Norwegian Data Protection Authority) knocks on your door, "we store backups in Frankfurt" is no longer a valid defense against the US CLOUD Act. You need data sovereignty.
This guide is not a high-level overview. We are going to architect a disaster recovery solution that keeps your RPO (Recovery Point Objective) near zero and your data strictly within Norwegian borders, utilizing standard Linux tooling available in 2023.
The RPO/RTO Trade-off: Defining the Cost of Downtime
Before writing a single line of config, you must define two variables:
- RPO (Recovery Point Objective): How much data can you afford to lose? (e.g., 5 minutes).
- RTO (Recovery Time Objective): How long can you be offline? (e.g., 1 hour).
Achieving RPO=0 requires synchronous replication, which introduces network latency to your write operations. If your primary server is in Oslo and your DR site is in Bergen, the speed of light is your constraint. On CoolVDS NVMe instances, we minimize the I/O bottleneck, but the network topology matters.
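To make that concrete with the PostgreSQL setup from Phase 1 below: synchronous commits are enabled by naming the standby that must acknowledge every write before the Primary reports success. This is a minimal sketch; the standby name dr_bergen is illustrative and has to match the application_name in the standby's primary_conninfo.
# On the Primary: refuse to acknowledge a commit until the named standby has replayed it
sudo -u postgres psql -c "ALTER SYSTEM SET synchronous_standby_names = 'dr_bergen';"
sudo -u postgres psql -c "SELECT pg_reload_conf();"
The trade-off is exactly the latency described above: every COMMIT now waits for a round trip to the DR site.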
Pro Tip: Do not confuse High Availability (HA) with Disaster Recovery. HA protects against hardware failure (e.g., a dead disk in a RAID 10 NVMe array). DR protects against data center annihilation or ransomware. You need both.
Phase 1: Database Replication (PostgreSQL 15)
For a transactional system, a nightly dump is useless if you lose the server at 16:00. You need streaming replication. We will set up a Primary node (Active) and a Standby node (Passive) using PostgreSQL 15, which is the current gold standard for relational data integrity.
First, configure the Primary node. Edit your postgresql.conf to enable the Write Ahead Log (WAL) for replication:
# /etc/postgresql/15/main/postgresql.conf
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
wal_keep_size = 512MB
hot_standby = on
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'
The archive_command is your safety net. If the replica falls too far behind, it can fetch logs from the archive path. Next, allow the replica to connect in pg_hba.conf:
host replication replicator 10.10.0.5/32 scram-sha-256
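Two prerequisites before moving on: the replicator role referenced in that pg_hba.conf rule must exist with the REPLICATION privilege, and the WAL archive path must be writable by postgres. The password below is a placeholder; note that wal_level and archive_mode changes require a full restart, not just a reload.
# On the Primary
sudo -u postgres psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'change-me';"
# The archive path referenced in archive_command must exist and belong to postgres
sudo install -d -o postgres -g postgres /mnt/wal_archive
sudo systemctl restart postgresql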
Now, on the Secondary (DR) node, we do not start from an empty database; we pull a base backup from the Primary. This is where CoolVDS's unmetered internal bandwidth becomes critical, as transferring 500GB over the public internet is both a security risk and a performance bottleneck.
# Run on the Standby Server (DR Node)
# Stop the service first
sudo systemctl stop postgresql
# Clear the existing data directory (run the glob as postgres so it expands with the right permissions)
sudo -u postgres bash -c 'rm -rf /var/lib/postgresql/15/main/*'
# Pull base backup from Primary (10.10.0.4)
sudo -u postgres pg_basebackup -h 10.10.0.4 -D /var/lib/postgresql/15/main -U replicator -v -P -X stream -R
# Fix permissions
sudo chown -R postgres:postgres /var/lib/postgresql/15/main
sudo systemctl start postgresql
The -R flag automatically generates the standby.signal file and writes the connection settings to postgresql.auto.conf. You now have a real-time replica. If the Primary fails, you promote the standby with pg_ctl promote (or SELECT pg_promote();) rather than editing files by hand.
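Before you ever need to fail over, confirm from the Primary that the standby is actually streaming; when the day comes, promotion is a single command on the standby (paths follow the Debian/Ubuntu layout used above):
# On the Primary: is the standby connected, and how far behind is it?
sudo -u postgres psql -c "SELECT client_addr, state, replay_lag FROM pg_stat_replication;"
# On the Standby, during a real failover: promote it to a read-write Primary
sudo -u postgres /usr/lib/postgresql/15/bin/pg_ctl promote -D /var/lib/postgresql/15/main
# Or, equivalently, via SQL on the standby itself:
sudo -u postgres psql -c "SELECT pg_promote();"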
Phase 2: Immutable File System Snapshots
Databases are only half the battle. You have configuration files, uploads, and application code. While rsync is great, it is not ransomware-proof. If your primary server gets encrypted, rsync will dutifully replicate the encrypted garbage to your backup server.
We prefer ZFS snapshots for this. ZFS allows you to take atomic, point-in-time snapshots of your entire filesystem. Even if the live data is corrupted, the snapshot remains read-only.
Here is a robust Bash script used to automate ZFS snapshots and send them to a CoolVDS storage instance via SSH. This script implements a rotation policy (keeping the last 7 daily snapshots).
#!/bin/bash
# /usr/local/bin/zfs-offsite-backup.sh
set -euo pipefail

POOL="rpool/data"
REMOTE_HOST="backup-user@10.20.0.5"
REMOTE_POOL="backup_pool/node-01"
DATE=$(date +%Y-%m-%d)
SNAP_NAME="@backup_$DATE"

# 1. Find the most recent existing backup snapshot BEFORE creating today's,
#    so the very first run is detected correctly (empty string = full send)
PREV_SNAP=$(zfs list -H -t snapshot -o name -S creation -d 1 "$POOL" | grep '@backup_' | head -n 1 | cut -d'@' -f2 || true)

# 2. Create a local snapshot (atomic, read-only)
zfs snapshot "$POOL$SNAP_NAME"

# 3. Send the snapshot to the remote DR server
if [ -z "$PREV_SNAP" ]; then
    # Initial full send
    zfs send "$POOL$SNAP_NAME" | ssh "$REMOTE_HOST" zfs recv -F "$REMOTE_POOL"
else
    # Incremental send: only the delta since the previous snapshot
    zfs send -i "@$PREV_SNAP" "$POOL$SNAP_NAME" | ssh "$REMOTE_HOST" zfs recv -F "$REMOTE_POOL"
fi

# 4. Prune old snapshots (local and remote), keeping the newest 7
zfs list -H -t snapshot -o name -S creation -d 1 "$POOL" | grep '@backup_' | tail -n +8 | xargs -r -n 1 zfs destroy -r
ssh "$REMOTE_HOST" "zfs list -H -t snapshot -o name -S creation -d 1 $REMOTE_POOL | grep '@backup_' | tail -n +8 | xargs -r -n 1 zfs destroy -r"

echo "Backup $SNAP_NAME completed successfully."
This method is vastly superior to file-level backups because it operates at the block level: it is faster, preserves permissions and ownership perfectly, and avoids the CPU and I/O overhead of walking millions of small files the way rsync must.
Phase 3: Infrastructure as Code (IaC) for Rapid Recovery
If your server is destroyed, how fast can you provision a replacement? If you are manually installing Nginx and PHP via the terminal, your RTO will blow out to days.
By 2023 standards, you should be defining your infrastructure with Terraform. This ensures that your DR environment is configuration-identical to production. Here is a snippet defining a CoolVDS-compatible KVM instance structure:
# main.tf
terraform {
  required_providers {
    libvirt = {
      source  = "dmacvicar/libvirt"
      version = "0.7.1"
    }
  }
}

provider "libvirt" {
  uri = "qemu+ssh://root@10.10.0.1/system"
}

resource "libvirt_volume" "os_image" {
  name   = "debian-11-generic.qcow2"
  pool   = "default"
  source = "https://cloud.debian.org/images/cloud/bullseye/latest/debian-11-generic-amd64.qcow2"
  format = "qcow2"
}

resource "libvirt_domain" "dr_node" {
  name   = "dr-web-01"
  memory = 4096
  vcpu   = 2

  network_interface {
    network_name = "default"
  }

  disk {
    volume_id = libvirt_volume.os_image.id
  }

  console {
    type        = "pty"
    target_port = "0"
    target_type = "serial"
  }
}
This allows you to spin up a replacement node in under 60 seconds. Combined with Ansible for configuration management, you remove human error from the panic of a disaster scenario.
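Under those assumptions (a main.tf like the one above plus an Ansible playbook of your own; the inventory and playbook names here are placeholders), the recovery runbook collapses to two commands:
# Recreate the DR VM, then push configuration onto it
terraform init && terraform apply -auto-approve
ansible-playbook -i inventory/dr.ini site.yml --limit dr-web-01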
The Network Layer: IP Failover and DNS
Having the data ready is useless if users cannot reach it. In a disaster, DNS caching is the enemy: resolvers hold on to your old record for its full TTL, so it can take hours before every ISP starts returning the new IP address for www.yoursite.com.
The solution is a Floating IP or a Reserved IP that can be remapped via API. Alternatively, set a low TTL (Time To Live) on your DNS records; in a BIND-style zone file, a 60-second TTL on the web record looks like this:
; db.yoursite.com -- 203.0.113.10 stands in for your current origin IP
www    60    IN    A    203.0.113.10
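You can confirm that the low TTL is actually being served, and watch it count down on a caching resolver, with dig:
dig +noall +answer www.yoursite.com A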
However, setting TTL too low increases load on your nameservers. A better approach is using a load balancer in a different geographic zone (but still within EEA for compliance) that health-checks your origin. If Oslo-Zone-A goes dark, the balancer automatically routes traffic to Oslo-Zone-B.
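If a managed balancer is not available, the same idea can be sketched as a watchdog running in a third location: poll the origin, and re-point the floating IP once it stops answering. This is only a sketch; the health URL and the failover API endpoint below are placeholders, not a real CoolVDS API.
#!/bin/bash
# Minimal failover watchdog -- run it somewhere other than the node it watches.
set -euo pipefail
: "${API_TOKEN:?export API_TOKEN before running}"

HEALTH_URL="https://www.yoursite.com/healthz"                        # hypothetical health endpoint
FAILOVER_API="https://api.example.invalid/v1/floating-ip/reassign"   # placeholder endpoint
FAILURES=0

while true; do
    if curl -fsS --max-time 5 "$HEALTH_URL" > /dev/null; then
        FAILURES=0
    else
        FAILURES=$((FAILURES + 1))
    fi

    # Three consecutive failures (~30 seconds) before pulling the trigger
    if [ "$FAILURES" -ge 3 ]; then
        curl -fsS -X POST -H "Authorization: Bearer ${API_TOKEN}" \
             -d '{"target_host": "dr-web-01"}' "$FAILOVER_API"
        echo "Failover triggered at $(date -Is)"
        break
    fi
    sleep 10
done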
Why CoolVDS for Disaster Recovery?
We built CoolVDS specifically to address the gap between cheap, unreliable VPS providers and expensive, non-compliant US hyperscalers. For a robust DR plan in 2023, you need:
- KVM Virtualization: Containers (OpenVZ/LXC) share the host kernel, so a single kernel panic takes every container on that node down with it. KVM provides the hardware-level isolation necessary for true redundancy.
- NVMe Storage: Restoring a 1TB database from a spinning HDD backup takes 14 hours. On our NVMe arrays, it takes minutes.
- Norwegian Sovereignty: Our data centers are in Norway. Your data never crosses a border unless you explicitly configure it to. This simplifies your GDPR audit trail immensely.
Disaster recovery is not a product you buy; it is a discipline you practice. But practicing it on fragile infrastructure is a waste of time. Ensure your foundation is solid.
Is your infrastructure ready for the worst-case scenario? Audit your current RTO today, and then deploy a geo-redundant test environment on CoolVDS.