When "Backups" Aren't Enough: A Pragmatic CTO's Guide to Norwegian DR
If your Disaster Recovery (DR) plan consists solely of a nightly cron job sending a tarball to an S3 bucket, you don't have a plan. You have a prayer. In the current regulatory climate—specifically looking at the aggressive stance of Datatilsynet following the Schrems II fallout—relying on US-controlled hyperscalers for your failsafe involves legal risk, not just technical risk.
We are not talking about "unleashing potential" here. We are talking about insurance. The difference between a 4-hour outage and a business-ending event usually comes down to two acronyms: RTO (Recovery Time Objective) and RPO (Recovery Point Objective).
This guide breaks down how to architect a sovereign, high-availability DR strategy using tools available right now, in late 2024, focusing on infrastructure within Norway's borders.
The Sovereign Cloud Imperative
Latency matters. Sovereignty matters more. When your primary data center in Oslo goes dark—whether due to a fiber cut or a ransomware encryption event—you cannot afford to wait for a support ticket to bounce between Dublin and Seattle.
Hosting locally on CoolVDS isn't just about hitting that sweet 2ms latency to NIX (Norwegian Internet Exchange). It's about data residency. If your recovery nodes are on a VPS in Frankfurt owned by a US entity, you are technically exporting data during a restore operation. Keep it in Norway. Keep it compliant.
Phase 1: Database Replication (The RPO Killer)
Nightly backups mean you are willing to lose up to 24 hours of data. For a high-transaction e-commerce site or a SaaS platform, that is unacceptable. We need to get RPO down to near-zero.
For this example, we assume you are running PostgreSQL 17 on Ubuntu 24.04 LTS. We will set up streaming replication to a warm standby server hosted on a separate CoolVDS instance.
First, ensure your primary server is configured for WAL (Write-Ahead Logging) archiving. Edit your postgresql.conf:
# /etc/postgresql/17/main/postgresql.conf
wal_level = replica
max_wal_senders = 10
wal_keep_size = 512MB
hot_standby = on
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'
Don't ignore wal_keep_size. If your standby falls too far behind (e.g., network partition), the primary might recycle the WAL segments the standby needs. 512MB is a conservative start; for high-write loads, push this to 2GB+.
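The archive_command above also assumes /mnt/wal_archive exists and is writable by the postgres system user. A minimal sketch of that prep work (the mount point is just whatever storage you dedicate to WAL archives):
# Run on the PRIMARY node
sudo mkdir -p /mnt/wal_archive
sudo chown postgres:postgres /mnt/wal_archive
sudo chmod 700 /mnt/wal_archive
# wal_level and archive_mode changes require a full restart, not just a reload
sudo systemctl restart postgresql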
Next, authorize the replication user in pg_hba.conf. Security tip: restricting this to the specific IP of your DR node is mandatory.
host replication rep_user 10.10.0.5/32 scram-sha-256
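That rule assumes a rep_user role with the REPLICATION attribute already exists on the primary. A hedged sketch of creating it (pick your own password) and applying the access rule:
# Run on the PRIMARY node
sudo -u postgres psql -c "CREATE ROLE rep_user WITH REPLICATION LOGIN PASSWORD 'choose-a-strong-password';"
# pg_hba.conf changes only need a reload
sudo systemctl reload postgresql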
Pro Tip: CoolVDS offers private networking between instances. Use it: it reduces latency and keeps replication traffic off metered public bandwidth. Never expose replication traffic over the public internet without a VPN or strict TLS enforcement.
The Recovery Configuration
On your DR node (the standby), you don't need complex manual setups anymore. Use pg_basebackup to bootstrap:
# Run on the STANDBY CoolVDS node
sudo systemctl stop postgresql
sudo rm -rf /var/lib/postgresql/17/main/*
# Pull base backup from primary
sudo -u postgres pg_basebackup -h 10.10.0.2 -D /var/lib/postgresql/17/main -U rep_user -P -v -R -X stream
The -R flag is critical—it automatically generates the standby.signal file and connection settings. This turns a complex config job into a one-liner.
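Once the base backup finishes, start the standby and confirm the primary actually sees it streaming. A quick sanity check, assuming the private IPs used above:
# On the STANDBY
sudo systemctl start postgresql
# On the PRIMARY: one row per connected standby; state should show 'streaming'
sudo -u postgres psql -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"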
Phase 2: Immutable File Backups (The Ransomware Shield)
Replication copies bad data (like `DROP TABLE users;`) just as fast as good data. You still need snapshots. In 2024, Restic remains the gold standard for secure, efficient, deduplicated backups.
Why Restic? Because it works well with append-only backends. If an attacker gains root access to your server, they might try to wipe your backups next. By enforcing append-only access on your storage bucket, or by running Restic's REST server in append-only mode, existing snapshots become effectively immutable.
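If you run your own restic REST server on a separate CoolVDS instance, append-only mode is a single flag. A sketch (the repo path, listen port, and backup-host IP are placeholders, and you would keep this on the private network or front it with TLS):
# On the backup host: clients can add snapshots but cannot delete or rewrite them
rest-server --path /srv/restic-repo --listen :8000 --append-only
# Clients then point RESTIC_REPOSITORY at the REST backend instead of S3, e.g.:
# export RESTIC_REPOSITORY="rest:http://10.10.0.9:8000/dr-repo"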
Here is a robust backup script that handles encryption and retention policies:
#!/bin/bash
# /usr/local/bin/run_dr_backup.sh
# Fail on the first error so the exit code is meaningful for monitoring
set -euo pipefail
export RESTIC_REPOSITORY="s3:https://s3.coolvds-storage.no/my-bucket"
export AWS_ACCESS_KEY_ID="key_id"
export AWS_SECRET_ACCESS_KEY="secret_key"
export RESTIC_PASSWORD_FILE="/root/.restic_pw"
# Initialize if not exists (one time)
# restic init
echo "Starting backup at $(date)"
# Back up /etc and /var/www, excluding logs to save space
restic backup /etc /var/www \
--exclude='/var/www/*/logs' \
--exclude='/var/www/*/tmp' \
--tag "scheduled_backup"
# Prune old snapshots: Keep last 7 days, 4 weeks, 6 months
restic forget \
--keep-daily 7 \
--keep-weekly 4 \
--keep-monthly 6 \
--prune
# Verify integrity of the repo (run this weekly, not daily due to IO cost)
if [ $(date +%u) -eq 7 ]; then
restic check
fi
Small operational note: monitor the exit code of this script. With set -euo pipefail in place, any failed step propagates, so anything other than 0 should trigger a PagerDuty alert immediately.
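To actually schedule it, a cron entry is enough. A sketch (the log path and the notify_pagerduty.sh hook are placeholders for whatever alerting you use):
# /etc/cron.d/dr_backup -- run nightly at 02:15 as root
15 2 * * * root /usr/local/bin/run_dr_backup.sh >> /var/log/dr_backup.log 2>&1 || /usr/local/bin/notify_pagerduty.sh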
Phase 3: Infrastructure as Code (Reducing RTO)
If your primary server hardware fails catastrophically, how fast can you redeploy? If you are clicking buttons in a UI, you are too slow. Use Terraform or Ansible.
We use CoolVDS's KVM architecture because it gives you full kernel control, which certain Docker optimizations and custom networking stacks require and which container-based virtualization (OpenVZ/LXC-style VPS) often cannot provide. Here is how you define a recovery environment using a generic Terraform structure compatible with standard cloud-init providers:
resource "coolvds_instance" "dr_node" {
hostname = "dr-recovery-01"
plan = "nvme-16gb-4cpu"
location = "oslo-dc2"
image = "ubuntu-24-04-x64"
# Cloud-init to pre-install dependencies
user_data = <<-EOF
#!/bin/bash
apt-get update
apt-get install -y postgresql-17 nginx restic jq
systemctl enable nginx
EOF
tags = ["production", "dr", "standby"]
}
With this, you can spin up a fresh, dependency-ready environment in under 60 seconds. Combined with NVMe storage, the package installation bottleneck is virtually eliminated.
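The workflow itself is deliberately boring. Assuming the provider above is configured, a drill or a real recovery starts with:
# Preview, then create only the DR node
terraform init
terraform plan -target=coolvds_instance.dr_node
terraform apply -target=coolvds_instance.dr_node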
Testing: The "Scream Test"
A DR plan that hasn't been tested is a hypothesis. You need to perform a failover drill. Schedule a maintenance window, disconnect the primary network interface, and time how long it takes to:
- Promote the PostgreSQL standby:
sudo -u postgres /usr/lib/postgresql/17/bin/pg_ctl promote -D /var/lib/postgresql/17/main
- Switch DNS records (TTL should be low, e.g., 300 seconds).
- Verify application connectivity.
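Before flipping DNS, confirm the promoted node actually accepts writes. A minimal smoke test (the throwaway table name is arbitrary):
# Should return 'f' now that the node is no longer a read-only replica
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"
# Prove a write succeeds
sudo -u postgres psql -c "CREATE TABLE IF NOT EXISTS dr_drill_check (ts timestamptz DEFAULT now()); INSERT INTO dr_drill_check DEFAULT VALUES;"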
Check your DNS propagation time:
dig +short myapp.no @8.8.8.8
Why CoolVDS Fits the Norwegian Context
We built CoolVDS to solve a specific problem: high-performance hosting that doesn't compromise on sovereignty. When you are dealing with sensitive Norwegian customer data, you want the physical drive to be located in Oslo or nearby, subject to Norwegian law, not the CLOUD Act.
Furthermore, our infrastructure utilizes high-frequency CPUs and enterprise NVMe drives. In a recovery scenario, disk I/O is usually the bottleneck as logs replay and caches warm up. Slow disks mean slow recovery. We don't do slow.
Final check: Look at your current hosting invoice. If you are paying for "backup" services but don't know where the bits physically reside, you are failing your compliance audit. Fix it today. Spin up a Disaster Recovery node on CoolVDS and sleep better knowing your data isn't crossing the Atlantic.