Disaster Recovery in 2021: Why Your "Backups" Won't Save You When the Data Center Burns
March 2021 changed everything for European systems administrators. When OVHcloud's SBG2 data center in Strasbourg caught fire, the industry learned a brutal lesson: Availability Zones are not disaster recovery plans. If your "off-site" backup is just a snapshot in the same facility (or even on the same provider's network), you don't have a backup. You have a coping mechanism.
I’ve spent the last decade recovering broken RAID arrays and mitigating DDoS attacks targeting Norwegian infrastructure. The reality is simple: Hardware fails, ransomware evolves, and sometimes, buildings physically burn down. The only metric that matters is RTO (Recovery Time Objective). If it takes you 48 hours to restore a 500GB database because you're pulling from cold storage on a spinning HDD, you've already failed.
This guide ignores the fluff. We are building a DR strategy that survives ransomware and physical destruction, compliant with Norwegian Datatilsynet requirements, using tools available right now in 2021.
The New Standard: 3-2-1-1-0
The old 3-2-1 rule (3 copies, 2 media types, 1 offsite) is insufficient against modern ransomware like DarkSide (the group behind the Colonial Pipeline hack just months ago). We need 3-2-1-1-0:
- 3 Copies of data
- 2 Different media types
- 1 Offsite (Geographically separated, e.g., if primary is Oslo, DR is Bergen or a distinct provider)
- 1 Offline/Immutable copy (Air-gapped or WORM storage)
- 0 Errors after verification
Pro Tip: Never use the same virtualization platform for your Primary and DR sites. If a hypervisor 0-day hits your primary VMware cluster, you want your DR running on KVM. This is why we advocate for CoolVDS as a DR target: our KVM implementation gives every guest full kernel isolation on a hypervisor stack different from your primary, so a single exploit cannot take out both sites.
The Transport Layer: WireGuard is Mandatory
Stop using OpenVPN for site-to-site replication. It runs in userspace, and the constant context switching between kernel and user space kills throughput. In 2021, with Linux kernel 5.6+ shipping WireGuard in the mainline tree, there is no excuse. You need low-latency, high-throughput transport, especially if you are replicating across the NIX (Norwegian Internet Exchange).
Here is a standard, high-throughput WireGuard configuration for a DR receiver node. Note the MTU adjustments; if you are tunneling over PPPoE or quirky ISP routes, fragmentation will kill your replication speed.
DR Receiver Config (/etc/wireguard/wg0.conf)
[Interface]
Address = 10.200.0.1/24
SaveConfig = true
ListenPort = 51820
PrivateKey = <SERVER_PRIVATE_KEY>
# Conservative MTU: leave headroom for WireGuard overhead over PPPoE and other encapsulated paths
MTU = 1380
# Forward tunnel traffic and NAT it out through eth0
PostUp = iptables -A FORWARD -i wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
PostDown = iptables -D FORWARD -i wg0 -j ACCEPT; iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE
[Peer]
PublicKey = <CLIENT_PUBLIC_KEY>
AllowedIPs = 10.200.0.2/32
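For completeness, here is a minimal sketch of the matching sender-side (primary site) config; the endpoint, keys, and addresses are placeholders to adapt to your own network:
[Interface]
Address = 10.200.0.2/24
PrivateKey = <CLIENT_PRIVATE_KEY>
MTU = 1380
[Peer]
PublicKey = <SERVER_PUBLIC_KEY>
# Public IP of the DR receiver (placeholder)
Endpoint = <DR_PUBLIC_IP>:51820
AllowedIPs = 10.200.0.0/24
# Keep the tunnel alive through stateful firewalls and NAT
PersistentKeepalive = 25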
Latency is the enemy of replication. Testing between a standard cloud provider and a CoolVDS instance in Oslo often shows a 5-10ms difference in round-trip time. In synchronous replication scenarios, that is the difference between a responsive app and a timed-out query.
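Measure it yourself before you trust it. A quick sketch, assuming iperf3 is installed on both ends and the tunnel addresses above:
# On the DR receiver: bring the tunnel up and start a listener
wg-quick up wg0
wg show                       # confirm the latest handshake is recent
iperf3 -s
# On the primary site, through the tunnel:
ping -c 20 10.200.0.1         # round-trip time should be stable and low
iperf3 -c 10.200.0.1 -t 30    # sustained throughput over 30 seconds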
The Storage Engine: BorgBackup
Rsync is fine for files, but for efficient, encrypted, deduplicated backups, BorgBackup is the industry standard. It handles "forever forward incremental" backups effortlessly.
Why Borg? Because it mounts backups as FUSE filesystems. When your CEO asks, "Can we get that one Excel file from last Tuesday?", you don't need to unzip a 50GB tarball. You just mount and copy.
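A rough sketch of that workflow, using the repository path from the script below; the archive name and file path are hypothetical:
# Needs BORG_PASSPHRASE in the environment, same as the backup script
mkdir -p /mnt/restore
borg mount borg@10.200.0.1:/mnt/dr-storage/repo::web01-2021-06-01-0200 /mnt/restore
cp "/mnt/restore/var/www/html/reports/q2.xlsx" /tmp/
borg umount /mnt/restore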
Automated Backup Script
Here is a battle-tested wrapper script. It refuses to start while another Borg process is running, logs to a file, and prunes old archives automatically.
#!/bin/bash
# Configuration
REPOSITORY="borg@10.200.0.1:/mnt/dr-storage/repo"
LOG="/var/log/borg/backup.log"
# Prevent concurrent runs
if pidof -x borg > /dev/null; then
    echo "Backup already running, skipping this run" >> "$LOG"
    exit 1
fi
# Set encryption passphrase (in production, source this from a root-only file)
export BORG_PASSPHRASE='Correct-Horse-Battery-Staple-2021'
mkdir -p "$(dirname "$LOG")"
echo "Starting backup at $(date)" >> "$LOG"
# Create backup (--progress is pointless under cron, so log --stats instead)
borg create --stats --compression lz4 \
    --exclude '*.tmp' \
    "$REPOSITORY"::'{hostname}-{now:%Y-%m-%d-%H%M}' \
    /var/www/html \
    /etc/nginx \
    /var/lib/mysql_dumps >> "$LOG" 2>&1
# Prune old backups for this host only (keep 7 daily, 4 weekly, 6 monthly)
borg prune -v --prefix '{hostname}-' "$REPOSITORY" \
    --keep-daily=7 \
    --keep-weekly=4 \
    --keep-monthly=6 >> "$LOG" 2>&1
echo "Backup finished at $(date)" >> "$LOG"
Database Replication: The Real Challenge
Files are easy. Databases are hard. If you are running MySQL 8.0 (which you should be), use GTID-based replication. It makes failover and failback significantly less painful than the old binary log position method.
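As a rough sketch, the GTID-related my.cnf settings look like this on both the source and the DR replica (server_id must be unique per server; the values here are illustrative):
[mysqld]
# Must be unique per server; give the DR replica a different value
server_id                  = 1
gtid_mode                  = ON
enforce_gtid_consistency   = ON
# Binary logging is on by default in 8.0, but be explicit
log_bin                    = binlog
# Keep a week of binlogs so the replica can catch up after an outage
binlog_expire_logs_seconds = 604800
On the replica, CHANGE MASTER TO ... MASTER_AUTO_POSITION = 1 then replaces hunting for binary log coordinates by hand.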
However, for a Disaster Recovery scenario where you might not need hot-standby (saving costs), a consistent dump streamed directly to the DR server is often safer against corruption. If you replicate a `DROP TABLE` command, your slave database drops the table instantly. A delayed dump saves you.
We use mydumper (multi-threaded) for this, shipping the dump over SSH through the WireGuard tunnel to the CoolVDS NVMe storage. NVMe is critical here. In 2021, restoring a 100GB dump on a standard SSD takes ~45 minutes. On NVMe, we clock it under 12 minutes.
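A hedged sketch of that dump-and-ship step; the hostnames, credentials, database name, and paths are placeholders:
# Multi-threaded, compressed dump of the application schema
mydumper --host=127.0.0.1 --user=backup --password='<PASSWORD>' \
         --database=app_db --threads=8 --compress \
         --outputdir=/var/lib/mysql_dumps/$(date +%F)
# Ship it through the WireGuard tunnel to the DR node
rsync -a --delete /var/lib/mysql_dumps/ borg@10.200.0.1:/mnt/dr-storage/mysql_dumps/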
Performance Tuning my.cnf for Recovery
When restoring, you can disable durability safeguards to maximize write speed. Here is the config snippet we inject during a restore procedure; revert it as soon as the restore completes, since these settings trade crash safety for throughput:
[mysqld]
# CRITICAL for restore speed: disable disk sync on every commit
innodb_flush_log_at_trx_commit = 0
innodb_doublewrite = 0
# Size the buffer pool to roughly 70% of RAM (8G assumes a ~12 GB server; adjust to yours)
innodb_buffer_pool_size = 8G
# Allow larger packet imports
max_allowed_packet = 1G
# Speed up index creation
innodb_write_io_threads = 16
innodb_log_buffer_size = 256M
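A rough sketch of how we apply it during a drill, assuming a Debian/Ubuntu-style config layout and a mydumper directory shipped as above; the file names and dates are placeholders:
# Drop the tuning into its own include file and restart MySQL
cp restore-tuning.cnf /etc/mysql/conf.d/99-restore-tuning.cnf
systemctl restart mysql
# Load the dump with myloader, mydumper's multi-threaded counterpart
myloader --directory=/mnt/dr-storage/mysql_dumps/<DUMP_DATE> \
         --threads=8 --overwrite-tables \
         --user=root --password='<PASSWORD>'
# Put the safety settings back before going live
rm /etc/mysql/conf.d/99-restore-tuning.cnf
systemctl restart mysql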
The "Schrems II" Factor
We cannot talk about hosting in 2021 without addressing the elephant in the room: Schrems II. The CJEU's July 2020 ruling invalidated the EU-US Privacy Shield. If you are dumping backups of Norwegian user data to an AWS S3 bucket in us-east-1, you are likely non-compliant with GDPR.
This is not a technical problem; it's a legal one with technical solutions. Data residency is paramount. By utilizing a Norwegian provider like CoolVDS, you ensure data stays within the EEA, protected by Norwegian privacy laws, not subject to the US CLOUD Act.
The Infrastructure Reality Check
Why do backups fail? Usually, it's I/O wait. I've seen DR drills fail because the backup server was a cheap, oversold VPS with noisy neighbors: the moment the restore data started writing, disk latency spiked to 500ms.
| Metric | Standard HDD VPS | SATA SSD VPS | CoolVDS NVMe |
|---|---|---|---|
| Random Read IOPS | ~150 | ~5,000 | ~50,000+ |
| Latency | 10-20ms | 1-2ms | <0.1ms |
| Restore Time (100GB) | 3+ Hours | 45 Mins | 12 Mins |
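Don't take any vendor's word for those numbers, including ours. A quick fio sketch to measure random-read IOPS and latency on a candidate DR node (writes a temporary 2G test file):
# 4K random reads, direct I/O, 60-second run; compare IOPS and completion latency
fio --name=randread --filename=/mnt/dr-storage/fio.test --size=2G \
    --rw=randread --bs=4k --ioengine=libaio --iodepth=32 --numjobs=4 \
    --direct=1 --runtime=60 --time_based --group_reporting
# Remove the test file afterwards
rm /mnt/dr-storage/fio.test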
At CoolVDS, we don't oversell resources. When you start a restore, you get the full I/O throughput of the NVMe array. We use KVM to ensure your memory is yours, not shared in a container namespace.
Final Thoughts: Test or Fail
A backup is Schrödinger's file: it exists and doesn't exist until you attempt to restore it. Schedule a "Game Day." Shut down your production web interface. Point your DNS to the CoolVDS IP. Run the restore scripts.
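Time the drill while you are at it; your RTO is only as good as your stopwatch. A trivial sketch, where restore_all.sh is a hypothetical wrapper around the Borg restore and myloader steps above:
# Record a real RTO number instead of guessing
START=$(date +%s)
./restore_all.sh    # hypothetical wrapper around the restore steps above
END=$(date +%s)
echo "Restore completed in $(( (END - START) / 60 )) minutes" | tee -a /var/log/dr-drill.log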
If it works, you have peace of mind. If it fails, you fix it now, not when the CEO is breathing down your neck at 3 AM on a Sunday.
Don't let slow I/O be the reason your company fails. Deploy a dedicated DR instance on CoolVDS today, leverage our 10Gbps uplink to NIX, and ensure your data stays safe on Norwegian soil.