Disaster Recovery in 2022: Why Your 'Backups' Are Not a Strategy
It is 3:00 AM on a Tuesday. Your monitoring dashboard (Zabbix, Prometheus, or maybe just a frantic Slack message) lights up red. The primary database node in Oslo isn't responding. You try to SSH in. Connection timed out.
This is the moment where careers are either forged or destroyed. In 2022, disaster recovery (DR) isn't just about having a tarball of your /var/www directory. Between the evolving threat landscape of ransomware and the legal minefield of Schrems II, your DR strategy needs to be a precise, executable code path, not a vague policy document stored on the very server that just went down.
I have spent the last decade architecting systems across the Nordics, and I have learned one truth: Hope is not a valid configuration parameter.
The Legal Reality: Schrems II and Data Sovereignty
Before we touch a single line of code, we must address the elephant in the server room. Since the European Court of Justice struck down the Privacy Shield, relying on US-owned cloud giants for disaster recovery is a compliance risk. If you are handling Norwegian user data, mirroring that data to a bucket in us-east-1 is effectively a GDPR violation waiting for Datatilsynet to issue a fine.
Your DR site needs to be legally safe. This is why we see a mass migration back to sovereign providers like CoolVDS. You need the certainty that your failover node sits in a rack in Oslo or a compliant European jurisdiction, protected by Norwegian law, not subject to the US CLOUD Act.
The Technical Architecture: RTO vs. RPO
Two metrics matter:
- RPO (Recovery Point Objective): How much data can you afford to lose? (e.g., "We can lose the last 5 minutes of transactions.")
- RTO (Recovery Time Objective): How long can you be down? (e.g., "We must be back up in 1 hour.")
If you demand an RPO of zero, you need synchronous replication. However, synchronous replication adds latency. Physics is stubborn; the round-trip time (RTT) between data centers impacts your write speeds. This is where high-performance infrastructure becomes non-negotiable. Using CoolVDS NVMe instances reduces the I/O bottleneck on the disk side, allowing you to handle the network overhead of replication without killing application performance.
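If you genuinely need an RPO of zero, PostgreSQL can enforce it with synchronous commits. A minimal sketch of the primary-side settings, assuming a standby that connects with application_name set to replica1 (a placeholder name):
# postgresql.conf on the primary: every commit waits for the named standby to confirm
synchronous_commit = on
synchronous_standby_names = 'replica1'   # must match the standby's application_name
The trade-off is brutal: if the standby or the link between data centers goes down, writes on the primary stall until you clear that setting. That is the latency cost described above, made explicit.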
Strategy 1: The Immutable Backup (Ransomware Proofing)
Standard backups are useless against modern ransomware because the attacker will encrypt your backups along with your production data. You need immutable backups: data that cannot be modified or deleted, even by root, for a set retention period.
We use BorgBackup with append-only mode for this. It offers deduplication, compression, and authenticated encryption.
Implementation: Borg with Append-Only
On your CoolVDS backup server, initialize the repository:
# On the Backup Server (Destination)
borg init --encryption=repokey /mnt/backup-storage/repo
# Restrict SSH keys to only allow 'borg serve --append-only'
# Edit .ssh/authorized_keys
command="borg serve --append-only --restrict-to-path /mnt/backup-storage/repo",restrict ssh-rsa AAAAB3...
Now, run the backup from your production server. Even if the production server is compromised, the attacker cannot delete old archives from the backup server.
# On Production Server
export BORG_PASSPHRASE='StrongPassphrase'
borg create --stats --compression lz4 \
backup-user@backup.coolvds.net:/mnt/backup-storage/repo::db-$(date +%Y-%m-%d) \
/var/lib/mysql /etc/nginx
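Backups only count if they run on a schedule and get verified. A minimal sketch, assuming the command above is wrapped in a script at /usr/local/bin/borg-backup.sh (a hypothetical path):
# /etc/cron.d/borg-backup -- nightly run at 02:30
30 2 * * * root /usr/local/bin/borg-backup.sh >> /var/log/borg-backup.log 2>&1
# On the backup server: periodically verify repository integrity and archive data
borg check --verify-data /mnt/backup-storage/repo
# Confirm new archives are actually landing
borg list /mnt/backup-storage/repo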
Pro Tip: Do not just back up files. Back up the partition table and boot loader configuration as well. I've seen teams restore data only to spend 4 hours figuring out why GRUB won't load.
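A quick sketch of what capturing that extra metadata looks like (device and paths are examples for a Debian-style system; adjust for your layout):
# Dump the partition table so the disk layout can be recreated on a fresh volume
sfdisk -d /dev/vda > /root/partition-table.sfdisk
# Keep the bootloader configuration alongside the data
cp -a /boot/grub /root/grub-backup
# Then add /root to the borg create command above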
Strategy 2: Hot Standby with PostgreSQL 14
For critical applications, restoring from backup takes too long. You need a Hot Standby: a secondary server constantly replaying the WAL (Write-Ahead Log) from the primary.
Historically, configuring PostgreSQL replication was a nightmare of recovery.conf files. In PostgreSQL 12 and later (we are assuming PG14 is your standard in 2022), it is streamlined via postgresql.conf and a standby.signal file.
Primary Node Configuration (postgresql.conf)
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
wal_keep_size = 512MB # Critical for network hiccups
hot_standby = on # ignored on a primary; takes effect if this config is reused on the standby
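The primary also needs a replication role and a pg_hba.conf entry before pg_basebackup can connect. A minimal sketch, assuming the replica's private address is 10.0.0.2 and a role named replicator (both placeholders):
-- In psql on the primary, as a superuser
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'ChangeMe';
# pg_hba.conf on the primary: allow streaming replication from the replica only
host    replication    replicator    10.0.0.2/32    scram-sha-256
# Reload so the pg_hba.conf change takes effect
systemctl reload postgresql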
Replica Node Setup
Wipe the data directory on the secondary CoolVDS instance and stream the base backup:
systemctl stop postgresql
rm -rf /var/lib/postgresql/14/main/*
# Run as postgres user
pg_basebackup -h primary_ip -D /var/lib/postgresql/14/main -U replicator -P -X stream -R
The -R flag automatically generates the standby.signal file and appends connection settings to postgresql.auto.conf. When the primary dies, you simply remove standby.signal and restart the replica. It promotes itself to primary instantly.
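If you would rather not restart, pg_ctl promote does the same job in place. A sketch, assuming the Debian/Ubuntu PostgreSQL 14 layout used above:
# On the replica, once you are certain the old primary is really dead
sudo -u postgres pg_ctlcluster 14 main promote
# or directly with pg_ctl:
sudo -u postgres /usr/lib/postgresql/14/bin/pg_ctl promote -D /var/lib/postgresql/14/main
Whichever route you take, make sure the old primary cannot come back up as a writer, or you will spend the next week reconciling a split brain instead of celebrating a clean failover.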
The Network Layer: IP Failover
Having the data ready is half the battle. Pointing users to it is the other half. DNS propagation is too slow (cached records can take hours to expire, depending on TTL). You need a Floating IP or a Load Balancer.
If you are managing your own cluster, Keepalived using VRRP (Virtual Router Redundancy Protocol) is the industry standard. It allows two Linux servers to share a virtual IP address.
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass SecretPassword
    }
    virtual_ipaddress {
        192.168.1.100
    }
}
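The second node runs an almost identical keepalived.conf, just with state BACKUP and a lower priority, so it only claims the virtual IP when the master stops advertising:
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 90              # lower than the master's 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass SecretPassword
    }
    virtual_ipaddress {
        192.168.1.100
    }
}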
However, VRRP requires Layer 2 network access, which many cloud providers block. This is where CoolVDS's network architecture shines: we support routed failover IPs, allowing you to switch traffic via API if the physical host fails.
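In practice that switch is one authenticated API call from your health-check script. The endpoint and payload below are purely illustrative placeholders, not a real provider API:
# Hypothetical failover-IP reassignment -- substitute your provider's real endpoint
curl -X POST "https://api.example-provider.net/v1/failover-ip/203.0.113.10" \
     -H "Authorization: Bearer $API_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"target_server": "replica-oslo-02"}'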
Testing: The "Scream Test"
A DR plan that hasn't been tested is a hallucination. You must simulate failure.
- The Network Cut: Use iptables to drop all packets to your primary DB. Does the app switch? (See the sketch after this list.)
- The Load Spike: Use wrk or siege to hammer the backup server. Does it have the CPU credits to handle production traffic?
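A minimal sketch of both tests (port, duration, and URL are placeholders; run the firewall rule from a console you can still reach if SSH dies):
# The Network Cut: drop all traffic to the primary's PostgreSQL port
iptables -I INPUT -p tcp --dport 5432 -j DROP
# ...observe whether the application fails over, then remove the rule
iptables -D INPUT -p tcp --dport 5432 -j DROP
# The Load Spike: hammer the standby as if it were production
wrk -t4 -c100 -d60s https://standby.example.com/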
Many VPS providers oversell CPU. Your backup node sits idle 99% of the time, so they throttle it. When you finally need it during a disaster, it chokes. We designed CoolVDS differently. Dedicated resources mean your backup node performs exactly like your production node.
Summary
Disaster recovery in 2022 requires navigating a complex matrix of European data laws, ransomware threats, and high-availability expectations. By leveraging tools like Borg for immutable history and PostgreSQL streaming replication for immediate failover, you build a fortress around your data.
But software is only as good as the hardware it runs on. When milliseconds matter, and data sovereignty is a legal requirement, you need a partner that understands the Nordic landscape.
Is your DR plan ready for reality? Spin up a secondary NVMe instance on CoolVDS today and run your first failover test. Sleep better tonight.