Disaster Recovery in 2016: Why Your "Backups" Won't Save You When the Server Melts

Let’s be honest. If your disaster recovery plan is a cron job running tar -czf sent to an FTP server in the basement, you don't have a plan. You have a prayer.

I recently audited a setup for a mid-sized e-commerce shop in Oslo. They claimed to have "full redundancy." Then I pulled the network cable on their primary database node. The result? Four hours of downtime. The "slave" wasn't consistent, the DNS TTL was set to 24 hours, and the sysadmin had to manually edit config files while the CEO breathed down his neck. Total chaos.

We are in late 2016. Ransomware like Locky is encrypting hard drives left and right. The Safe Harbor agreement is dead, replaced by the shaky "Privacy Shield." If you are hosting critical data for Norwegian customers, relying on US-based cloud giants is becoming a legal minefield. You need a strategy that is technically robust and legally compliant with the looming GDPR regulations everyone is whispering about.

The Cold Hard Truth: RAID Is Not Backup

Repeat after me: RAID protects against disk failure. It does not protect against data corruption, human stupidity, or `rm -rf /`.

To survive a real disaster, whether it's a data center power cut in Nydalen, a SQL injection, or a fat-fingered DELETE without a WHERE clause, you need three things: off-site replication, point-in-time recovery, and a low Recovery Time Objective (RTO). Here is the architecture I deploy on high-performance infrastructure like CoolVDS.

1. Database Replication with GTID (MySQL 5.7)

Forget the old binary log position method. It’s 2016. We use Global Transaction Identifiers (GTID) in MySQL 5.7. It makes failover significantly less painful because the slave knows exactly which transactions it has executed, regardless of binlog files.

Here is a snippet from a production my.cnf optimized for a master node running on an NVMe-backed instance. Note the innodb_flush_log_at_trx_commit setting; we set it to 1 for ACID compliance, but if you are on slow spinning rust, this will kill you. On CoolVDS NVMe storage, you won't feel the hit.

[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW

# GTID Configuration for MySQL 5.7
gtid_mode = ON
enforce_gtid_consistency = ON

# Safety & Performance
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
innodb_buffer_pool_size = 4G # Adjust based on RAM
max_connections = 500

To set up the replication user securely:

CREATE USER 'repl_user'@'10.0.0.%' IDENTIFIED BY 'SuperSecur3Passw0rd!';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'10.0.0.%';
FLUSH PRIVILEGES;
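
With GTID enabled on both nodes, pointing the slave at the master no longer involves hunting for binlog coordinates. A minimal sketch of the slave side, assuming the master sits at 10.0.0.1 on the private network and the slave's my.cnf uses a different server-id (e.g. 2):

-- Run on the slave; MASTER_AUTO_POSITION = 1 tells MySQL to replicate by GTID
CHANGE MASTER TO
    MASTER_HOST = '10.0.0.1',
    MASTER_USER = 'repl_user',
    MASTER_PASSWORD = 'SuperSecur3Passw0rd!',
    MASTER_AUTO_POSITION = 1;
START SLAVE;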

If your replication lag (Seconds_Behind_Master in SHOW SLAVE STATUS) spikes, it's usually I/O starvation. This is where the underlying hardware matters. I've seen "cloud" providers throttle IOPS so hard that replication lags by hours. We choose CoolVDS because the KVM virtio drivers give us near-metal disk access speeds.
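
A quick way to keep an eye on it from the slave, assuming client credentials in a root-only ~/.my.cnf:

# Check replication threads and lag on the slave
mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Last_Error'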

2. Efficient Filesystem Snapshots with BorgBackup

Rsync is fine for mirroring, but it doesn't give you history. If a hacker encrypts your files and you rsync them to your backup server, congratulations: you now have two encrypted servers.

Enter BorgBackup. A fork of the older Attic project, it hit version 1.0 earlier this year. It offers deduplication (saving massive space), compression, and authenticated encryption. It is the gold standard for Linux backups in 2016.
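
Before the first backup you need an encrypted repository on the storage box. A one-time setup sketch, using the same repository path as the script below:

# Initialise the repo; repokey stores the key inside the repository, protected by the passphrase
borg init --encryption=repokey ssh://backup@storage.coolvds.com/./backups/web01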

Here is a script I use to push backups from a CoolVDS web node to a remote storage box:

#!/bin/bash

# Environment variables for Borg
export BORG_PASSPHRASE='YourSecretPassphrase'
REPOSITORY='ssh://backup@storage.coolvds.com/./backups/web01'

# Create the backup
borg create -v --stats --compression lz4 \
    --exclude '*.log' \
    "$REPOSITORY"::{hostname}-{now:%Y-%m-%d_%H:%M} \
    /var/www/html \
    /etc/nginx

# Prune old backups (keep 7 dailies, 4 weeklies)
borg prune -v "$REPOSITORY" \
    --prefix '{hostname}-' \
    --keep-daily=7 \
    --keep-weekly=4

Pro Tip: Always mount your backup storage with `noexec` and strict permissions. Better yet, use a "pull" mechanism where the backup server connects to the client, so if the web server is compromised, the attacker cannot wipe the backups.
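
If you do push from the client, at least pin the client's SSH key on the storage box to borg serve, so a compromised web node cannot run arbitrary commands or touch other repositories. A sketch of the authorized_keys entry on the backup server (key material shortened):

# ~backup/.ssh/authorized_keys on the storage box
command="borg serve --restrict-to-path /backups/web01",no-port-forwarding,no-X11-forwarding,no-pty ssh-ed25519 AAAA... root@web01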

3. The Network Switch: DNS Failover

In 2016, automated IP failover (VRRP/Keepalived) works great if your servers are in the same Layer 2 segment. But for true Disaster Recovery, your secondary site should be physically separated—perhaps one in Oslo and one in a secondary zone like Stockholm or Amsterdam (though for compliance, stay in Norway if possible).

Since BGP Anycast is too expensive for most setups, use DNS with a short TTL (Time To Live). Set your A records to 60 seconds.

$ dig +noall +answer coolvds.com
coolvds.com.        60  IN  A   185.x.x.x

When the primary node dies, a monitoring script (Nagios or Zabbix) detects the failure and uses the DNS provider's API to update the IP. It’s not instant—you have propagation delay—but it’s simple and robust.
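
What that glue script looks like depends entirely on your DNS provider. In the sketch below the API endpoint, record ID, and token are placeholders, and the health check is a bare HTTP probe:

#!/bin/bash
# Failover sketch: repoint the A record at the standby when the primary stops answering
PRIMARY="203.0.113.10"     # illustrative addresses, not real
STANDBY="203.0.113.20"
RECORD_ID="12345"          # hypothetical record ID at your DNS provider

if ! curl -sf --max-time 5 "http://${PRIMARY}/" > /dev/null; then
    curl -s -X PUT "https://dns.example.com/api/records/${RECORD_ID}" \
        -H "Authorization: Bearer ${DNS_API_TOKEN}" \
        -d "{\"type\":\"A\",\"content\":\"${STANDBY}\",\"ttl\":60}"
fi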

The Norwegian Context: Why Location Matters

We need to talk about compliance. The Norwegian Data Protection Authority (Datatilsynet) is clear: you are responsible for your users' data. With the invalidation of Safe Harbor, moving personal data to US servers is risky business.

Hosting within Norway isn't just about latency (though 2ms pings from Oslo to a CoolVDS instance are nice). It's about legal sovereignty. If your servers are physically in Oslo, they are protected by Norwegian law, not subject to a subpoena from a foreign entity that claims jurisdiction over "the cloud."

Validating the Recovery

A backup is Schrödinger's cat: it exists and doesn't exist until you observe it. You must run restoration drills. Spin up a fresh Ubuntu 16.04 instance on CoolVDS, pull your Borg repo, and try to boot the app.
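
A rough outline of the Borg side of that drill, assuming the repository and archive naming from the script above (pick the real archive name from borg list):

# On the fresh instance
export BORG_PASSPHRASE='YourSecretPassphrase'
REPO='ssh://backup@storage.coolvds.com/./backups/web01'

borg list "$REPO"                                   # see available archives
borg check "$REPO"                                  # verify repository consistency
borg extract "$REPO::web01-2016-11-24_03:00" var/www/html   # restore into the current directory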

Here is a quick sanity check command to verify your MySQL dump integrity before you even try to restore it:

zcat dump.sql.gz | tail -n 1
# Output should be: -- Dump completed on 2016-11-24...

If the file is truncated, that last line will be missing. Simple checks save careers.

Conclusion

Hardware fails. Fiber gets cut. Sysadmins make mistakes. The difference between a minor hiccup and a business-ending event is the quality of your DR plan.

You need fast I/O for rapid database recovery. You need local presence for compliance. And you need a virtualization platform that doesn't oversubscribe your CPU when you are trying to decompress 50GB of backups.

Don't wait for the fire. Deploy a staging instance today, break it, and learn how to fix it. If you need a sandbox with genuine NVMe performance and Norwegian jurisdiction, spin up a CoolVDS instance and see the difference raw power makes.