Disaster Recovery in a Post-Schrems II World: A Norwegian CTO’s Playbook

Disaster Recovery is Your Only Real Insurance Policy

If you think your data is safe because it's in the cloud, you are operating on hope, not strategy. I've sat in boardrooms in Oslo explaining why a critical service was offline for 12 hours. It is not a conversation you want to have. The reality of 2023 is that hardware fails, fiber gets cut, and yes, datacenters burn down (the 2021 OVHcloud SBG2 fire is still fresh in our collective memory).

For Norwegian businesses, the stakes are compounded by Datatilsynet and the shadow of Schrems II. You cannot simply dump your encrypted snapshots into an AWS bucket in Virginia and call it a day. Data sovereignty is not a buzzword; it is a legal requirement.

This guide abandons the fluff. We are building a Disaster Recovery (DR) plan that prioritizes Recovery Time Objective (RTO) and Data Sovereignty, utilizing CoolVDS infrastructure as the backbone for a compliant, low-latency Norwegian recovery site.

The Architecture of Resilience

A robust DR strategy relies on the 3-2-1 rule: 3 copies of data, 2 different media, 1 offsite. In the context of Virtual Dedicated Servers (VDS), "media" usually translates to different storage backends or providers.

Pro Tip: Never replicate your production environment's mistakes. If someone runs `rm -rf /` on your master node, synchronous replication will happily delete the files on your backup node instantly. You need lagged replication or point-in-time recovery (PITR) as well.
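
On PostgreSQL, that lag can be implemented with a delayed standby; a minimal sketch, assuming you run a third, delayed replica alongside the failover hot standby described below:

# postgresql.conf on a *delayed* standby (separate from the failover hot standby)
# WAL is received immediately but replayed one hour late, leaving a window
# to pause replay before a destructive change is applied.
recovery_min_apply_delay = '1h'

If someone does fat-finger a destructive statement, `SELECT pg_wal_replay_pause();` on that node freezes replay while you recover the data.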

1. The Database Layer: PostgreSQL 15 Streaming Replication

Database consistency is the hardest part of DR. For a typical stack running on Ubuntu 22.04, we use PostgreSQL 15 and set up a hot standby on a secondary CoolVDS instance. That keeps the data-layer RTO down to the time it takes to promote the standby: seconds, not hours.

Primary Node Configuration (`postgresql.conf`):

listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
wal_keep_size = 512MB
# Enforce SSL for data in transit between Norwegian datacenters
ssl = on
# The "snakeoil" files below are Ubuntu's self-signed placeholders; swap in real certificates before production.
ssl_cert_file = '/etc/ssl/certs/ssl-cert-snakeoil.pem'
ssl_key_file = '/etc/ssl/private/ssl-cert-snakeoil.key'

You must configure `pg_hba.conf` to allow the standby connection. Do not use open IP ranges. Restrict it specifically to your CoolVDS DR IP.

# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    replication     replicator      10.10.20.5/32           scram-sha-256
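
The `replicator` role referenced above must exist on the primary before the standby can connect; a quick sketch, with a placeholder password you would obviously replace:

# On the primary: create a dedicated replication role and reload the config
sudo -u postgres psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'change-me';"
sudo -u postgres psql -c "SELECT pg_reload_conf();"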

Standby Node Initiation:

Instead of manually copying data directories, use `pg_basebackup`. It handles the heavy lifting safely.

pg_basebackup -h production.coolvds.net -D /var/lib/postgresql/15/main \
    -U replicator -P -v -R -X stream -C -S dr_slot_1

The `-R` flag automatically generates the `standby.signal` file and connection settings. This is a vast improvement over the old manual configuration methods we dealt with in version 9.6.
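
Once the standby is running, confirm that WAL is actually streaming before you trust it; two quick checks:

# On the primary: confirm the standby is connected and check its replay position
sudo -u postgres psql -x -c "SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;"

# On the standby: should return 't' while it is acting as a replica
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"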

2. The Filesystem Layer: Encrypted Snapshots with Restic

Replication covers the database, but what about `/var/www/html` or your application assets? `rsync` is good, but `restic` is better for DR because it offers encryption by default, deduplication, and content-addressed snapshots.

Why `restic`? Because if your production server is compromised by ransomware, you don't want the malware quietly corrupting your backups. Restic never overwrites existing repository data when it backs up; for protection against an attacker who steals the repository credentials, pair it with an append-only backend (for example, a restricted SFTP account or restic's rest-server in append-only mode).

Automated Backup Script (`/opt/ops/backup.sh`):

#!/bin/bash
set -euo pipefail

export RESTIC_REPOSITORY="sftp:user@backup.coolvds.net:/backups/app01"
export RESTIC_PASSWORD_FILE="/root/.restic_pw"

# Initialize the repository if it does not exist yet
restic snapshots &> /dev/null || restic init

# Back up the web root, tagged so the retention rules below can target it
restic backup /var/www/html --tag production

# Prune old backups to manage storage costs
restic forget --keep-last 10 --keep-daily 7 --keep-weekly 4 --prune
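
Restores are just as scriptable; a sketch of pulling the newest production snapshot onto a recovery node, with the repository variables mirroring the backup script:

# On the DR node: restore the newest snapshot tagged "production"
export RESTIC_REPOSITORY="sftp:user@backup.coolvds.net:/backups/app01"
export RESTIC_PASSWORD_FILE="/root/.restic_pw"
restic restore latest --tag production --target /var/www/html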

Now, automate this with a systemd timer. Cron is fine, but systemd gives you better logging (`journalctl -u restic-backup`) and dependency management: the service unit can be ordered after `network-online.target` so the backup job doesn't fire before the network is actually up.

# /etc/systemd/system/restic-backup.timer
[Unit]
Description=Run Restic Backup Daily

[Timer]
OnCalendar=*-*-* 03:00:00
RandomizedDelaySec=900
Persistent=true

[Install]
WantedBy=timers.target
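
The timer triggers a service unit of the same name, which the listing above assumes; a minimal `restic-backup.service` to pair with it:

# /etc/systemd/system/restic-backup.service
[Unit]
Description=Restic Backup of /var/www/html
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/opt/ops/backup.sh

Enable the pair with `systemctl enable --now restic-backup.timer`, and inspect runs with `journalctl -u restic-backup.service`.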

Infrastructure as Code: The Recovery Switch

Having data is useless if you don't have a server to mount it on. Manual provisioning takes too long during a crisis. We use Terraform to define the CoolVDS resources. In a disaster event, you run `terraform apply`, and your infrastructure exists within minutes.

resource "coolvds_instance" "dr_node" {
  count        = var.disaster_mode ? 1 : 0
  region       = "no-oslo-2"
  plan         = "nvme-16gb"
  image        = "ubuntu-22.04"
  ssh_keys     = [var.admin_ssh_key]
  
  # Cloud-init to install dependencies immediately
  user_data = <
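
Flipping into disaster mode is then a one-variable change; a sketch of the runbook commands, assuming the `disaster_mode` variable used above:

# Preview, then create the DR node
terraform plan -var="disaster_mode=true"
terraform apply -var="disaster_mode=true"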

The "CoolVDS" Factor: Latency and Sovereignty

Why are we specific about the infrastructure provider here? It comes down to physics and law.

  1. Latency: If your primary market is Norway, your DR site should also be in Norway (or very close, like Sweden or Denmark), but on a different power grid and availability zone. CoolVDS offers that isolation within the Nordic region. Replicating from Oslo to Frankfurt adds round-trip latency that hurts database replication; Oslo to Oslo (different DC) keeps the round trip low enough that synchronous replication carries no meaningful performance penalty (see the configuration sketch after this list).
  2. KVM vs Containers: CoolVDS uses KVM (Kernel-based Virtual Machine). For a DR site, you need a full kernel. Container-based VPS solutions often share the kernel with the host. If a kernel panic hits the host node, your "isolated" container dies too. KVM provides the hardware abstraction necessary for true reliability.
  3. Compliance: Hosting on CoolVDS ensures your data remains under Norwegian jurisdiction. This simplifies your GDPR Article 32 documentation immensely.
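
If the measured round-trip time between the two sites is low enough, you can make the replication set up earlier synchronous; a minimal sketch on the primary, assuming the standby's `primary_conninfo` sets `application_name=dr_node` (the name is illustrative):

# postgresql.conf on the primary
# Commits wait until the DR standby confirms the WAL is flushed to its disk
synchronous_standby_names = 'dr_node'
synchronous_commit = on

Measure the RTT first; every commit now pays that round trip, which is exactly why the DR site should stay in the Nordics.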

Testing: Schrödinger's Backup

A backup that hasn't been restored does not exist. It is a quantum probability. You must test your recovery.

Every quarter, I execute a "Fire Drill":

  1. Spin up a fresh CoolVDS instance.
  2. Run the Terraform script.
  3. Restore the PostgreSQL base backup.
  4. Mount the Restic snapshot.
  5. Point a staging DNS record (e.g., `dr-test.yourdomain.no`) to it.
  6. Verify the application loads.

If this process takes more than 4 hours, your plan has failed. Refine the automation.
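
Much of that drill can be wrapped in a script so the four-hour budget is easy to measure; a rough sketch, assuming the Terraform layout above and a hypothetical `orders` table as a data-freshness check:

#!/bin/bash
# Quarterly DR fire drill, run from the ops workstation.
# Hostnames, the database name and the "orders" table are illustrative.
set -euo pipefail

# Steps 1-2: provision the DR node
terraform -chdir=/opt/ops/terraform apply -auto-approve -var="disaster_mode=true"

# Steps 3-4 (base backup restore, Restic snapshot) happen on the node itself;
# here we only verify the result: the database answers and holds recent data.
psql "host=dr-test.yourdomain.no dbname=app user=app sslmode=require" \
    -c "SELECT max(created_at) FROM orders;"

# Steps 5-6: the application responds via the staging DNS record
curl --fail --silent --show-error https://dr-test.yourdomain.no/ > /dev/null && echo "Fire drill: OK"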

Conclusion

Disaster recovery is not about being pessimistic; it is about being professional. The cost of a standby NVMe instance on CoolVDS is negligible compared to the reputational damage of a 24-hour outage. By leveraging standard tools like PostgreSQL streaming replication and Restic, combined with local, compliant infrastructure, you build a safety net that actually holds.

Don't wait for the fire. Deploy your DR test node on CoolVDS today and verify your survival strategy.