Disaster Recovery is a Lie Until You Test It: A DevOps Guide to Survival in Norway
It’s 3:00 AM on a Tuesday. Your monitoring dashboard lights up like a Christmas tree. The primary database node in Oslo just went dark. Is it a kernel panic? A fiber cut? Or did ransomware just encrypt your entire /var/lib/mysql directory?
If your heart rate just spiked, your Disaster Recovery (DR) plan is theoretical. If you sipped your coffee and opened a terminal, you have a functional DR strategy.
In the post-Schrems II era, simply dumping tarballs to an AWS S3 bucket in Virginia isn't just bad latency—it's potentially illegal for Norwegian businesses handling sensitive user data. We need to talk about sovereignty, RTO (Recovery Time Objective), and why the speed of your storage dictates whether you survive a catastrophe or close up shop.
The "3-2-1" Rule is Dead (Long Live 3-2-1-1-0)
You know the classic rule: 3 copies of data, 2 different media, 1 offsite. In 2022, with the sophistication of ransomware targeting backups specifically, that's insufficient. The modern standard we enforce at the infrastructure level is 3-2-1-1-0:
- 3 copies of data.
- 2 different media types (e.g., NVMe block storage and Object Storage).
- 1 offsite (Geographically separated, e.g., CoolVDS Oslo vs. a secondary location).
- 1 offline or immutable copy (Air-gapped or Object Lock).
- 0 errors after verification (Automated restoration tests).
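That last item is the one most teams skip. At a minimum, schedule an integrity check of your backup repository from cron; for example, with BorgBackup (introduced in the next section), a command along these lines, assuming the repository path used later in this article:
# Re-read and verify every data chunk in the repository (slow, but catches silent corruption)
borg check --verify-data user@backup-server:/var/backups/coolvds-repo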
Immutable Backups with BorgBackup
For Linux systems, rsync is not a backup solution; it's a file transfer tool. If a file is corrupted at the source, rsync faithfully replicates the corruption. We prefer BorgBackup for its deduplication and encryption capabilities: each run only transfers and stores changed chunks, which is critical when you are pushing gigabytes over the wire.
Here is a standard setup for creating an encrypted, compressed backup repository. This works flawlessly on Ubuntu 20.04/22.04 LTS environments:
# 1. Initialize the repo with encryption (Keyfile or Repokey)
borg init --encryption=repokey user@backup-server:/var/backups/coolvds-repo
# 2. Create a compressed backup archive
# We use lz4 for speed, but zstd is better for ratio if CPU permits
borg create --stats --progress --compression lz4 \
    --exclude '*.log' \
    --exclude '/var/www/html/cache' \
    user@backup-server:/var/backups/coolvds-repo::{hostname}-{now:%Y-%m-%d_%H%M} \
    /etc \
    /var/www/html \
    /home
Pro Tip: Never keep the only copy of your encryption key or passphrase on the server being backed up. If that server is wiped or hit by ransomware, your backups become unreadable. Export the key (borg key export) and keep copies in a password manager and a physical safe.
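The "immutable" part of the heading deserves its own step. One common approach with Borg is to lock the client's SSH key on the backup server to an append-only borg serve command, so a compromised web server can add new archives but cannot delete or overwrite existing ones. A sketch of the authorized_keys entry on the backup server (key material abbreviated):
# ~/.ssh/authorized_keys on backup-server
command="borg serve --append-only --restrict-to-path /var/backups/coolvds-repo",restrict ssh-ed25519 AAAA... backup@web01
Remember that pruning old archives then has to happen from a trusted machine, never from the client itself.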
Database Consistency: The Silent Killer
File-level backups of a running database are useless; you will end up with corrupted tablespaces. For MySQL or MariaDB (common on our LEMP stacks), you need a consistent snapshot. mysqldump is fine for small databases, but it locks tables (or, with --single-transaction on InnoDB, still produces a logical dump that is painfully slow to restore at scale). For serious production workloads, use Percona XtraBackup (for MySQL) or Mariabackup.
This allows for hot backups without locking your tables—crucial for high-availability setups.
# Create a hot backup stream and compress it on the fly to a secure location
# (run as a DB user with sufficient privileges, or pass --user/--password)
mariabackup --backup --stream=xbstream \
    | gzip > /mnt/backup/full_backup_$(date +%F).xb.gz
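Keep in mind that this stream is not directly usable; it has to be unpacked and prepared before MariaDB can start from it. The restore path looks roughly like this (a sketch; the filename is illustrative, and the copy-back step requires MariaDB to be stopped with an empty datadir):
# 1. Unpack the compressed stream into a working directory
mkdir -p /var/backup/restore
gunzip -c /mnt/backup/full_backup_2022-01-10.xb.gz | mbstream -x -C /var/backup/restore
# 2. Apply the redo log so the data files are consistent
mariabackup --prepare --target-dir=/var/backup/restore
# 3. Stop MariaDB, empty the datadir, copy the files back, fix ownership
systemctl stop mariadb
rm -rf /var/lib/mysql/*
mariabackup --copy-back --target-dir=/var/backup/restore
chown -R mysql:mysql /var/lib/mysql
systemctl start mariadb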
Replication vs. Backup
Do not confuse replication with backup. If you run DROP TABLE users; on the master, that command replicates instantly to the slave. Replication provides High Availability (HA), not Disaster Recovery. However, a delayed slave node can be a lifesaver.
Configure a delayed replica to give yourself a one-hour window to stop a bad query from propagating. The base replica settings live in my.cnf:
[mysqld]
server-id = 2
relay-log = /var/log/mysql/mysql-relay-bin.log
log_bin = /var/log/mysql/mysql-bin.log
read_only = 1
The delay itself is not a my.cnf variable; set it on the replica with a replication command (MySQL 5.6+ or MariaDB 10.2.3+):
-- The magic setting: a 1-hour delay
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 3600;
START SLAVE;
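To confirm the delay is actually in effect, check the replica's status: SQL_Delay should read 3600, and SQL_Remaining_Delay counts down while an event is being held back (field names per MySQL 5.6+ / MariaDB 10.2.3+):
mysql -e 'SHOW SLAVE STATUS\G' | grep -E 'SQL_Delay|SQL_Remaining_Delay|Seconds_Behind_Master'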
The Physics of RTO: Why Storage Speed Matters
Here is the brutal truth about recovery: Restoring data is IOPS-intensive.
Let’s say you need to restore 500GB of data.
On a standard SATA HDD (approx 100 MB/s sequential read), just reading that data takes ~1.5 hours. That doesn't account for random write penalties during database reconstruction, which can slow speeds to a crawl (5-10 MB/s).
On CoolVDS NVMe instances, we consistently see read/write speeds exceeding 2000 MB/s. That same 500GB restore could theoretically complete in under 5 minutes. When your e-commerce site is down, the difference between 5 minutes and 2 hours is the difference between a minor hiccup and a reputation-destroying event.
Comparison: Restore Time for 1TB Database
| Storage Type | Avg Speed | Est. Restore Time | Impact |
|---|---|---|---|
| Standard HDD VPS | 80-120 MB/s | ~3-4 Hours | High Churn Risk |
| SATA SSD | 400-500 MB/s | ~40 Minutes | Acceptable |
| CoolVDS NVMe | 2000+ MB/s | ~8-10 Minutes | Seamless |
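Don't take these numbers (ours included) on faith; measure your own instance. A quick sequential-read test with fio, assuming fio is installed and you have a few gigabytes free in /tmp, gives you the figure that matters most for bulk restores:
# Sequential 1 MiB reads with direct I/O - a rough proxy for streaming a large restore
fio --name=restore-sim --rw=read --bs=1M --size=4G --numjobs=1 \
    --direct=1 --ioengine=libaio --filename=/tmp/fio-testfile --group_reporting
rm /tmp/fio-testfile   # clean up the 4 GB test file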
Network Automation: The Failover Switch
When the primary site in Oslo goes down, you need to switch traffic to your warm standby. DNS failover alone is too slow; many resolvers and ISPs ignore low TTLs, so a record change can take hours to reach every client. The professional approach involves a Floating IP (Reserved IP) or a load balancer health check.
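If both nodes sit on the same layer-2 network and your provider allows VRRP (worth confirming before you rely on it), keepalived is the classic way to float an IP between them. A minimal sketch; the interface name, router ID, and addresses here are assumptions:
# /etc/keepalived/keepalived.conf on the primary
# (the standby uses state BACKUP and a lower priority)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass changeme
    }
    virtual_ipaddress {
        10.10.1.100/24
    }
}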
If you are managing this manually with Nginx as a reverse proxy/load balancer, your upstream configuration should look like this to handle timeouts gracefully:
upstream backend_cluster {
    # Primary Node (CoolVDS Oslo)
    server 10.10.1.5:80 weight=10 max_fails=3 fail_timeout=30s;
    # Standby Node (CoolVDS Secondary)
    server 10.10.1.6:80 weight=1 backup;
}

server {
    listen 80;
    server_name example.no;

    location / {
        proxy_pass http://backend_cluster;
        proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
        proxy_connect_timeout 2s;
    }
}
With this configuration, a request that the primary node cannot accept within 2 seconds is retried against the backup server, and after three failures inside the 30-second window the primary is marked as down. Your users might notice a slight lag, but they won't see a 502 Bad Gateway.
The "Schrödinger's Backup" Paradox
A backup that hasn't been restored is neither success nor failure—it exists in a quantum state of uncertainty. In my experience, 20% of untested backups fail during restoration due to corruption, missing keys, or dependency changes.
Automate the Drill:
Once a month, spin up a fresh CoolVDS instance. We have an API that lets you do this programmatically. Run a script that:
- Provisions a fresh VPS.
- Pulls the latest backup from your offsite repo.
- Restores the database and application code.
- Runs a curl check against the local IP to verify an HTTP 200 OK response.
- Destroys the instance.
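A minimal sketch of that drill script, meant to run on the freshly provisioned test instance; the repository path matches the earlier Borg examples, while the database step and the final teardown are placeholders you will need to adapt:
#!/usr/bin/env bash
# Monthly restore drill - run on the fresh test instance
set -euo pipefail

export BORG_REPO='user@backup-server:/var/backups/coolvds-repo'
# BORG_PASSPHRASE is expected in the environment (injected by cron/CI, never hard-coded)

# 1. Find the newest archive in the offsite repository
LATEST=$(borg list --short --last 1 "$BORG_REPO")

# 2. Restore config and application code (borg stores paths without the leading /)
cd / && borg extract "${BORG_REPO}::${LATEST}" etc var/www/html

# 3. Restore the database here, e.g. the mariabackup unpack/prepare/copy-back
#    steps shown earlier in this article

# 4. Verify the stack actually answers before declaring success
STATUS=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1/)
if [ "$STATUS" != "200" ]; then
    echo "Restore drill FAILED: got HTTP $STATUS" >&2
    exit 1
fi
echo "Restore drill passed: HTTP 200"

# 5. Destroy the instance afterwards via your provider's API or control panel
Wire the exit code into your alerting and the drill becomes a monitoring check rather than a chore.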
This costs pennies in compute time but buys you absolute peace of mind.
Data Sovereignty and Datatilsynet
Operating in Norway means navigating strict privacy laws. Using US-based cloud giants for your primary disaster recovery site can complicate GDPR compliance, specifically regarding transfer mechanisms. By keeping your primary and DR sites within the EEA (or specifically within Norway using local providers), you simplify the legal landscape significantly.
CoolVDS infrastructure is built to respect these boundaries. We provide the raw compute and storage power you need to build compliant, resilient systems without the "noisy neighbor" issues common in oversold shared hosting environments.
Final Thoughts
Hope is not a strategy. Rsync is not a backup plan. And cheap storage is expensive when you are waiting for it to read data during an outage.
If you care about uptime, you need to care about the underlying metal. Don't let slow I/O kill your recovery time objectives. Deploy a high-performance NVMe test instance on CoolVDS today and see how fast a real server should be.