Disaster Recovery Architectures for Norwegian Enterprises: Beyond Simple Backups
There is a fundamental misunderstanding in the server administration world that equates "backups" with "disaster recovery" (DR). They are not synonymous. A backup is a copy of your data. Disaster recovery is the strategy, infrastructure, and execution plan required to make that data accessible again when your primary data center goes dark. In the context of the Nordic hosting market—where we balance strict GDPR adherence (thanks to Datatilsynet) with the need for millisecond latency to NIX (Norwegian Internet Exchange)—a vague plan is a liability.
I recently audited a setup for a fintech client in Oslo who claimed to have a robust DR strategy. Their "strategy" was a nightly cron job dumping SQL to an AWS S3 bucket in Frankfurt. When we simulated a ransomware attack, their Recovery Time Objective (RTO) ballooned from the promised 4 hours to 36 hours. Why? Because pulling 4TB of data over the public internet and re-importing it into a standard HDD-based VPS takes time. Physics does not negotiate. This article details how to build a DR plan that actually works, utilizing 2024-era tools and high-performance infrastructure.
The Legal & Latency Equation: Why Geography Matters
Before touching the terminal, we must address the infrastructure layer. Post-Schrems II, moving personal data outside the EEA is legally hazardous. For Norwegian businesses, the safest DR site is a secondary location within Norway or a tightly integrated Nordic neighbor. This ensures compliance with the GDPR while keeping latency low for synchronous replication.
Pro Tip: When selecting a VPS for your DR site, ignore the marketing fluff about "cloud scalability" and look at the disk I/O. During a restoration, the bottleneck is almost always disk write speed. This is why we default to CoolVDS NVMe instances for DR targets; the high IOPS capability drastically reduces the time it takes to untar archives or replay database logs.
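If you want to verify I/O claims yourself rather than trust a spec sheet, a quick fio run on a candidate DR node gives you a rough random-write IOPS figure. A minimal sketch, assuming fio is installed and you can spare 1GB of scratch space:
# Rough 4K random-write benchmark on the DR target (adjust size and path to taste)
fio --name=dr-iops-test --filename=/tmp/fio-test --size=1G \
    --rw=randwrite --bs=4k --ioengine=libaio --iodepth=32 \
    --direct=1 --runtime=60 --time_based --group_reporting
rm /tmp/fio-test
Run it once on the primary and once on the DR candidate; if the DR node is an order of magnitude slower, your restore window just grew by the same factor.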
Defining RPO and RTO
You cannot configure a server without defining these metrics:
- Recovery Point Objective (RPO): How much data can you lose? (e.g., 5 minutes).
- Recovery Time Objective (RTO): How long until the service is online? (e.g., 1 hour).
If your CEO demands zero data loss (RPO = 0), you need synchronous replication. If they accept 1 hour of loss, hourly snapshots suffice. The cost difference is not linear: the closer you push RPO and RTO toward zero, the steeper the price climbs, so get these numbers agreed in writing before you architect anything.
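To make the RPO = 0 case concrete: in PostgreSQL (covered in depth below), it means the primary does not acknowledge a commit until at least one standby has flushed it. A minimal sketch of the relevant primary settings; the standby name 'dr_node' is an assumption and must match the application_name the standby connects with:
# postgresql.conf on the primary (standby name is hypothetical)
synchronous_standby_names = 'dr_node'   # wait for this standby before confirming commits
synchronous_commit = on                  # commits block until the standby has flushed the WAL
The trade-off is that every write now pays a round trip to the DR site, which is exactly why the geography discussed above matters.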
Component 1: The Database Layer (PostgreSQL Focus)
For a transactional workload, file-level backups are insufficient. We need Point-in-Time Recovery (PITR). Here is a standard configuration for a primary node in postgresql.conf to enable WAL (Write-Ahead Log) archiving, which allows you to replay transactions right up to the moment of failure.
Configuring the Primary Node
Edit your configuration to enable archiving:
# /etc/postgresql/15/main/postgresql.conf
wal_level = replica              # generate enough WAL detail for replication and PITR
archive_mode = on                # enable the archiver process
archive_command = 'rsync -a %p postgres@dr-node.coolvds.net:/var/lib/postgresql/wal_archive/%f'
max_wal_senders = 10             # also allow streaming replication connections
wal_keep_size = 512MB            # retain recent WAL locally in case the archive lags
This configuration pushes each completed WAL segment to your CoolVDS DR instance as soon as it is filled. Note that archive_command runs as the postgres user, so that user needs key-based, passwordless SSH access to the DR node. However, for the standby node to take over, it needs a specific setup of its own.
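Before trusting it, confirm the archiver is actually succeeding. A quick check on the primary using the standard pg_stat_archiver view, assuming nothing beyond working psql access:
sudo -u postgres psql -c "SELECT archived_count, failed_count, last_archived_wal, last_failed_wal FROM pg_stat_archiver;"
A climbing failed_count almost always points to SSH keys or permissions on the wal_archive directory.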
The Standby Signal
On your DR node (running PostgreSQL 15 to match the primary), you create a `standby.signal` file in the data directory. This tells Postgres to start in recovery mode and read from the archives.
touch /var/lib/postgresql/15/main/standby.signal
Then, configure the `postgresql.conf` on the replica to read the incoming WAL files. This setup creates a "Warm Standby" that can be promoted to Primary in seconds.
# DR Node Configuration (postgresql.conf on the standby)
restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p'   # pull the WAL files pushed by the primary
recovery_target_timeline = 'latest'                            # follow timeline switches after a promotion
hot_standby = on                                               # allow read-only queries while in recovery
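Once the standby is up, verify that it really is in recovery and how far behind it sits. A minimal check using built-in functions, assuming only psql access on the DR node:
sudo -u postgres psql -c "SELECT pg_is_in_recovery(), now() - pg_last_xact_replay_timestamp() AS replay_lag;"
With archive-based shipping the lag will roughly equal one WAL segment's worth of traffic; if you need a tighter RPO, layer streaming replication (primary_conninfo on the standby) on top of the archive.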
Component 2: Filesystem Disaster Recovery with BorgBackup
For application files, configuration data, and static assets, `rsync` is good, but `BorgBackup` is superior due to deduplication, compression, and authenticated encryption. It effectively snapshots your filesystem without wasting space. A daily backup of a 100GB web server might only transfer 500MB of changes.
First, initialize the repository on the remote CoolVDS storage instance:
borg init --encryption=repokey user@dr-node.coolvds.net:/mnt/backups/web-01
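With repokey mode the encryption key lives inside the repository itself, protected by your passphrase. It is worth exporting a copy to secure offline storage so a corrupted repository header cannot lock you out; the destination path below is just an example:
borg key export user@dr-node.coolvds.net:/mnt/backups/web-01 /root/borg-key-web-01.txt
Store that file somewhere that is neither the web server nor the DR node.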
Here is a battle-tested bash script to automate this. Place this in /usr/local/bin/run-backup.sh and chmod it to 700.
#!/bin/bash
# Abort immediately if any command fails
set -e

REPOSITORY="user@dr-node.coolvds.net:/mnt/backups/web-01"

# Borg prompts for the repokey passphrase unless it is supplied via the
# environment; for unattended runs, read it from a root-only file
# (path is an example, keep it chmod 600).
export BORG_PASSPHRASE="$(cat /root/.borg-passphrase)"

# Back up all of /etc and /var/www, excluding logs and temp files
borg create --stats \
    "$REPOSITORY"::'{hostname}-{now:%Y-%m-%d_%H:%M}' \
    /etc \
    /var/www \
    --exclude '/var/www/*/logs' \
    --exclude '/var/www/*/tmp'

# Prune old backups: keep 7 daily, 4 weekly, 6 monthly archives
borg prune -v "$REPOSITORY" \
    --keep-daily=7 \
    --keep-weekly=4 \
    --keep-monthly=6
# borg compact "$REPOSITORY"   # uncomment on Borg >= 1.2 to actually reclaim pruned space

# Verify integrity of the repository and archives
borg check "$REPOSITORY"
This script handles the creation of the archive and the rotation policy (retention). The set -e flag ensures the script stops immediately if a command fails, preventing a cascade of errors.
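To run it nightly, a single entry in root's crontab is enough; the schedule and log path below are examples, not requirements:
# root's crontab (crontab -e): run the backup at 02:15 every night
15 2 * * * /usr/local/bin/run-backup.sh >> /var/log/borg-backup.log 2>&1
Check that log after the first few runs; a backup job that silently stopped months ago is the classic DR post-mortem finding.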
Component 3: Infrastructure as Code (Terraform)
In 2024, manual server provisioning is a relic. If your primary data center in Oslo goes offline, you should be able to spin up a replacement environment via API. While CoolVDS offers a robust dashboard, using a Terraform provider allows you to define your DR infrastructure as code.
Here is a theoretical HCL block for provisioning a high-memory node suitable for rapid recovery. Note the emphasis on specifying the NVMe storage type explicitly.
resource "coolvds_instance" "dr_node" {
name = "dr-oslo-01"
region = "no-oslo-1"
image = "debian-12"
size = "pro-nvme-16gb" # 16GB RAM, 4 vCPU
ssh_keys = [data.coolvds_ssh_key.devops.id]
# Tagging for automated ansible inventory
tags = [
"env:dr",
"role:database",
"compliance:gdpr"
]
# Post-provisioning setup
user_data = file("${path.module}/scripts/init-dr.sh")
}
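Whatever provider you end up using, the workflow around such a definition is standard Terraform. Keep the state file somewhere that survives the loss of your primary site (remote state), otherwise the DR plan depends on someone's laptop. A minimal run looks like this:
terraform init                 # download providers and set up (ideally remote) state
terraform plan -out=dr.tfplan  # review exactly what will be created
terraform apply dr.tfplan      # provision the DR node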
Testing the Failover
A DR plan that hasn't been tested is just a hope. You must schedule "Game Days" where you simulate a failure. The process typically looks like this:
- Sever the connection: Block traffic to the primary IP using iptables (sketched below).
- Promote the DB: On the DR node, run pg_ctl promote.
- Switch DNS: Update the A record to point to the CoolVDS DR IP.
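A rough command-level version of the first two steps, assuming a Debian-packaged PostgreSQL 15 on the DR node and 192.0.2.10 as a stand-in for the primary's IP:
# On the application hosts: drop all traffic to the (hypothetical) primary IP
iptables -A OUTPUT -d 192.0.2.10 -j DROP

# On the DR node: promote the warm standby to a writable primary
sudo -u postgres /usr/lib/postgresql/15/bin/pg_ctl promote -D /var/lib/postgresql/15/main
# (on Debian/Ubuntu, "pg_ctlcluster 15 main promote" does the same via the wrapper)
Afterwards, remember to remove the iptables rule and re-point DNS; keeping a low TTL on that A record is what makes step three fast in a real incident.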
To ensure your latency to the backup node is acceptable for local users, always verify the path. We want to see traffic routing through NIX or direct peering, not bouncing through Amsterdam.
mtr --report --report-cycles=10 dr-node.coolvds.net
If you see latency spikes above 15ms within Norway, check your routing tables or contact support. Low latency is critical when the DR node is live and serving customers.
Security Considerations
Your DR node contains a mirror of your production data. It must be as secure, if not more so, than production. Ensure your SSH keys are Ed25519 (RSA is getting long in the tooth) and restrict access.
ssh-keygen -t ed25519 -C "dr-access-2024"
Furthermore, ensure the authorized_keys file on the DR host has the correct permissions:
chmod 600 ~/.ssh/authorized_keys
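If a given key's only job on the DR host is to receive Borg backups, restrict it so a compromised web server cannot roam the DR node. A sketch of an authorized_keys entry on the DR host; the key material is obviously a placeholder:
command="borg serve --restrict-to-path /mnt/backups/web-01",restrict ssh-ed25519 AAAA...placeholder... backup@web-01
The restrict option disables port, agent, and X11 forwarding, and the forced command means this key can only ever invoke borg serve against that one repository path.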
Why KVM Virtualization Matters for DR
We specifically utilize KVM (Kernel-based Virtual Machine) at CoolVDS rather than container-based virtualization (like LXC/OpenVZ) for a specific reason: Resource Isolation. In a disaster scenario, you are likely restoring huge amounts of data. This is CPU and I/O intensive. On shared container platforms, a "noisy neighbor" can steal your CPU cycles, slowing your RTO. With KVM, the kernel ensures your allocated RAM and CPU resources are strictly yours.
Disaster recovery is not a product you buy; it is a discipline you practice. By combining immutable backups (Borg), streaming replication (Postgres), and high-performance infrastructure (CoolVDS NVMe), you turn a potential company-ending event into a manageable incident.
Don't wait for the inevitable hardware failure or cyberattack. Spin up a secondary NVMe instance on CoolVDS today and verify your recovery scripts before you actually need them.