Disaster Recovery in 2022: Why Your 'Backups' Are Not a Strategy

It is 3:00 AM on a Tuesday. Your monitoring dashboard—Zabbix, Prometheus, or maybe just a frantic Slack message—lights up red. The primary database node in Oslo isn't responding. You try to SSH in. Connection timed out.

This is the moment where careers are either forged or destroyed. In 2022, disaster recovery (DR) isn't just about having a tarball of your /var/www directory. Between the evolving threat landscape of ransomware and the legal minefield of Schrems II, your DR strategy needs to be a precise, executable code path, not a vague policy document stored on the very server that just went down.

I have spent the last decade architecting systems across the Nordics, and I have learned one truth: Hope is not a valid configuration parameter.

The Legal Reality: Schrems II and Data Sovereignty

Before we touch a single line of code, we must address the elephant in the server room. Since the European Court of Justice struck down the Privacy Shield, relying on US-owned cloud giants for disaster recovery is a compliance risk. If you are handling Norwegian user data, mirroring it to a bucket in us-east-1 is effectively a GDPR violation waiting for Datatilsynet to notice.

Your DR site needs to be legally safe. This is why we see a mass migration back to sovereign providers like CoolVDS. You need the certainty that your failover node sits in a rack in Oslo or a compliant European jurisdiction, protected by Norwegian law, not subject to the US CLOUD Act.

The Technical Architecture: RTO vs. RPO

Two metrics matter:

  • RPO (Recovery Point Objective): How much data can you afford to lose? (e.g., "We can lose the last 5 minutes of transactions.")
  • RTO (Recovery Time Objective): How long can you be down? (e.g., "We must be back up in 1 hour.")

If you demand an RPO of zero, you need synchronous replication. However, synchronous replication adds latency. Physics is stubborn; the round-trip time (RTT) between data centers impacts your write speeds. This is where high-performance infrastructure becomes non-negotiable. Using CoolVDS NVMe instances reduces the I/O bottleneck on the disk side, allowing you to handle the network overhead of replication without killing application performance.
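
For a sense of scale, the PostgreSQL streaming replication configured in Strategy 2 below can be switched to synchronous mode with two lines on the primary; a minimal sketch, assuming the standby connects with application_name set to 'replica1':

# postgresql.conf on the primary: every commit waits for the named standby
synchronous_commit = on                  # 'on' waits for the standby to flush the WAL
synchronous_standby_names = 'replica1'   # matches application_name in the standby's primary_conninfo

Every COMMIT now carries at least one full RTT to the failover site on top of local disk latency, which is exactly why fast storage on both ends matters.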

Strategy 1: The Immutable Backup (Ransomware Proofing)

Standard backups are useless against modern ransomware because the attacker will encrypt your backups along with your production data. You need immutable backups—data that cannot be modified or deleted, even by root, for a set retention period.

We use BorgBackup with append-only mode for this. It offers deduplication, compression, and authenticated encryption.

Implementation: Borg with Append-Only

On your CoolVDS backup server, initialize the repository:

# On the Backup Server (Destination)
borg init --encryption=repokey /mnt/backup-storage/repo

# Restrict SSH keys to only allow 'borg serve --append-only'
# Edit .ssh/authorized_keys
command="borg serve --append-only --restrict-to-path /mnt/backup-storage/repo",restrict ssh-rsa AAAAB3...
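
One caveat with repokey mode: the (passphrase-protected) key lives inside the repository itself. Export a copy to separate storage so a damaged repo header cannot lock you out of your own backups; a minimal sketch:

# Export the repo key and store the file somewhere other than the backup server
borg key export /mnt/backup-storage/repo /root/borg-repo-key.txt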

Now, run the backup from your production server. Even if the production server is compromised, the attacker cannot delete old archives from the backup server.

# On Production Server
export BORG_PASSPHRASE='StrongPassphrase'
borg create --stats --compression lz4 \
    backup-user@backup.coolvds.net:repo::db-$(date +%Y-%m-%d) \
    /var/lib/mysql /etc/nginx
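
Because the client connection is append-only, a prune run from production will not actually reclaim space; retention has to be enforced on the backup server itself over a normal local session. A minimal sketch, with retention values purely as an example:

# On the backup server (not through the restricted SSH key)
borg prune --keep-daily=7 --keep-weekly=4 --keep-monthly=6 /mnt/backup-storage/repo
borg compact /mnt/backup-storage/repo   # actually reclaims the space (Borg 1.2+)
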
Pro Tip: Do not just back up files. Back up the partition table and boot loader configuration as well. I’ve seen teams restore data only to spend four hours figuring out why GRUB won't load.
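
A minimal sketch of capturing that context next to the file data, assuming a Debian-family system and a single /dev/vda disk (adjust for your layout):

mkdir -p /root/dr-meta
sfdisk --dump /dev/vda > /root/dr-meta/partition-table-vda.dump   # restorable with 'sfdisk /dev/vda < file'
cp -a /etc/default/grub /etc/fstab /root/dr-meta/
dpkg --get-selections > /root/dr-meta/package-list.txt            # 'rpm -qa' on RHEL-family systems

# Then add /root/dr-meta and /boot to the paths in the borg create command above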

Strategy 2: Hot Standby with PostgreSQL 14

For critical applications, restoring from backup takes too long. You need a Hot Standby: a secondary server constantly replaying the WAL (Write-Ahead Log) from the primary.

Historically, configuring PostgreSQL replication was a nightmare of recovery.conf files. In PostgreSQL 12 and later (we are assuming PG14 is your standard in 2022), it is streamlined via postgresql.conf and a standby.signal file.

Primary Node Configuration (postgresql.conf)

listen_addresses = '*'     # or restrict to the replica's IP
wal_level = replica
max_wal_senders = 10
wal_keep_size = 512MB      # critical buffer for network hiccups
hot_standby = on           # ignored on a primary; lets the same file work when copied to the standby
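
The primary also needs the replication role that pg_basebackup will log in with, plus a pg_hba.conf entry; a minimal sketch, assuming the Debian/Ubuntu package layout used elsewhere in this walkthrough and 10.0.0.2 as a stand-in for the replica's address:

# On the primary: create the role used by pg_basebackup below (prompts for a password)
sudo -u postgres createuser --replication -P replicator

# Allow it to connect for replication, then reload
echo "host replication replicator 10.0.0.2/32 scram-sha-256" >> /etc/postgresql/14/main/pg_hba.conf
systemctl reload postgresql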

Replica Node Setup

Wipe the data directory on the secondary CoolVDS instance and stream the base backup:

systemctl stop postgresql
rm -rf /var/lib/postgresql/14/main/*

# Run as postgres user
pg_basebackup -h primary_ip -D /var/lib/postgresql/14/main -U replicator -P -X stream -R

The -R flag automatically generates the standby.signal file and appends connection settings to postgresql.auto.conf. When the primary dies, you promote the replica (removing standby.signal and restarting also works); within seconds it stops replaying WAL and starts accepting writes as the new primary.
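
The cleaner promotion path needs no restart at all: pg_ctl promote or the SQL function pg_promote(). A minimal sketch, again assuming the Debian/Ubuntu cluster layout:

# Sanity check while the primary is still healthy: is the replica streaming?
sudo -u postgres psql -c "SELECT client_addr, state, replay_lag FROM pg_stat_replication;"

# During a disaster, on the replica:
sudo -u postgres pg_ctlcluster 14 main promote
# ...or from psql on the replica:
sudo -u postgres psql -c "SELECT pg_promote();"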

The Network Layer: IP Failover

Having the data ready is half the battle. Pointing users to it is the other half. DNS failover is too slow: even with a low TTL, cached records can linger for hours. You need a Floating IP or a Load Balancer.

If you are managing your own cluster, Keepalived using VRRP (Virtual Router Redundancy Protocol) is the industry standard. It allows two Linux servers to share a virtual IP address.

# /etc/keepalived/keepalived.conf on the MASTER node
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass SecretPassword
    }
    virtual_ipaddress {
        192.168.1.100
    }
}
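
Out of the box this only fails over when the MASTER host itself disappears. To also fail over when the service dies while the host stays up, add a check script; a minimal sketch, assuming the floating IP fronts PostgreSQL on 5432 (the BACKUP node carries the same config with state BACKUP and a lower priority, e.g. 90):

vrrp_script chk_postgres {
    script "/usr/bin/pg_isready -q -h 127.0.0.1 -p 5432"
    interval 2
    fall 3      # 3 failed checks -> release the VIP
    rise 2
}

vrrp_instance VI_1 {
    # ...existing settings from above...
    track_script {
        chk_postgres
    }
}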

However, VRRP requires Layer 2 network access, which many cloud providers block. This is where CoolVDS's network architecture shines—we support routed failover IPs, allowing you to switch traffic via API if the physical host fails.
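
The exact call depends on your provider's API; purely as a hypothetical illustration of what API-driven failover looks like (endpoint, fields, and hostnames are invented for this example):

# Hypothetical example only - consult your provider's actual API documentation
curl -X POST "https://api.example-provider.net/v1/failover-ips/192.0.2.10" \
     -H "Authorization: Bearer $API_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"target_server": "backup-node-oslo"}'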

Testing: The "Scream Test"

A DR plan that hasn't been tested is a hallucination. You must simulate failure.

  1. The Network Cut: Use iptables to drop all packets to your primary DB. Does the app switch over?
  2. The Load Spike: Use wrk or siege to hammer the backup server. Does it have the CPU credits to handle production traffic? (Minimal commands for both tests are sketched below.)
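
Minimal versions of both tests, assuming the primary database is PostgreSQL on its default port 5432 and standby.example.com is a placeholder for your failover endpoint:

# 1. Network cut: silently drop traffic to the primary DB (run on the app server)
iptables -I OUTPUT -p tcp -d <primary_db_ip> --dport 5432 -j DROP
# ...watch whether the application fails over, then remove the rule:
iptables -D OUTPUT -p tcp -d <primary_db_ip> --dport 5432 -j DROP

# 2. Load spike: 8 threads, 200 connections, 2 minutes against the standby
wrk -t8 -c200 -d120s https://standby.example.com/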

Many VPS providers oversell CPU. Your backup node sits idle 99% of the time, so they throttle it. When you finally need it during a disaster, it chokes. We designed CoolVDS differently. Dedicated resources mean your backup node performs exactly like your production node.

Summary

Disaster recovery in 2022 requires navigating a complex matrix of European data laws, ransomware threats, and high-availability expectations. By leveraging tools like Borg for immutable history and PostgreSQL streaming replication for immediate failover, you build a fortress around your data.

But software is only as good as the hardware it runs on. When milliseconds matter, and data sovereignty is a legal requirement, you need a partner that understands the Nordic landscape.

Is your DR plan ready for reality? Spin up a secondary NVMe instance on CoolVDS today and run your first failover test. Sleep better tonight.