Disaster Recovery in 2017: Why Your "Backups" Are Not Enough

The 3:00 AM Reality Check

It is not a matter of if, but when. I have stood in server rooms where the silence was deafening because a primary RAID controller decided to commit suicide. I have watched CTOs weep over corruption that propagated to their backups because they never tested the restore process.

As we approach 2017, the landscape has shifted. The invalidation of the Safe Harbor agreement last year and the adoption of the final GDPR text this April have made data location as critical as data integrity. If your DR (Disaster Recovery) plan relies on a bucket in Virginia, you are legally exposed. If your recovery time objective (RTO) is "whenever we download the tarball," you are commercially dead.

Real disaster recovery requires a Warm Standby. Here is how we build it using tools available today, focusing on data sovereignty within Norway.

The Architecture of Resilience

We are not talking about simple backups. We are talking about Business Continuity. The goal is to switch traffic from your primary site to a secondary node within Norway (or the EEA) in under 15 minutes.

Pro Tip: Network latency between Oslo and continental Europe is usually under 20ms via NIX (Norwegian Internet Exchange). This allows for near-synchronous replication without killing application performance. Do not try this with a server in Las Vegas.
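
Trust, but verify: before committing to near-synchronous replication, measure the round trip yourself. A quick sketch (the DR hostname is illustrative):

# Round-trip latency to the candidate DR node
ping -c 20 dr-node.coolvds.net | tail -1

# Per-hop view of the WAN path, useful for spotting detours outside the EEA
mtr --report --report-cycles 20 dr-node.coolvds.net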

1. The Foundation: KVM over Containers

While Docker is the current darling of the development world (version 1.12 brought Swarm mode, which is interesting), for a DR node, I insist on full isolation. Containers share a kernel. If a kernel panic hits your host, your recovery node dies too.

We use KVM (Kernel-based Virtual Machine). It provides hardware-level virtualization. If your neighbor on the physical host decides to mine bitcoins or fork-bomb the OS, your KVM instance remains stable. At CoolVDS, we enforce this isolation strictly; no OpenVZ overselling here.
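
If you are unsure what your current provider actually sold you, check from inside the guest. A minimal sketch, assuming a systemd-based distro:

# A genuine KVM guest reports "kvm"; a container betrays itself as "openvz" or "lxc"
systemd-detect-virt

# OpenVZ containers expose the host's bean counters; a KVM guest has no such file
ls /proc/user_beancounters 2>/dev/null && echo "WARNING: shared-kernel OpenVZ"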

2. Database Replication: GTIDs are Mandatory

Forget the old SHOW MASTER STATUS dance with binary log files and positions; it is fragile. In MySQL 5.7 (the current stable release), you must use GTIDs (Global Transaction Identifiers). They make failover and failback sane because the slave knows exactly which transactions it has applied, regardless of log file names.

Here is the critical configuration for your my.cnf on both the primary and the CoolVDS DR node:

[mysqld]
server_id = 2 # Set to 1 on Master, 2 on Slave
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
# The critical 5.7 settings
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON
master_info_repository = TABLE
relay_log_info_repository = TABLE
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1

The sync_binlog = 1 and innodb_flush_log_at_trx_commit = 1 settings are non-negotiable for full durability (the D in ACID). Yes, they cost I/O performance. This is why running on NVMe storage is essential: spinning rust cannot deliver the IOPS that strict durability demands under load.
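
Once both nodes run with this configuration, attaching the DR slave takes one statement and zero log coordinates. A minimal sketch; the hostname and replication credentials are placeholders:

# On the CoolVDS DR node: attach to the primary with GTID auto-positioning
mysql -u root -p -e "
  CHANGE MASTER TO
    MASTER_HOST='primary.example.com',
    MASTER_USER='repl',
    MASTER_PASSWORD='CHANGE_ME',
    MASTER_AUTO_POSITION=1;
  START SLAVE;"

# Both threads should say Yes, and the executed GTID sets should converge
mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_.*_Running|Executed_Gtid_Set'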

3. Efficient File Synchronization: Enter Borg

Rsync is great, but for DR, we need versioned snapshots that don't consume infinite space. BorgBackup has emerged this year as the superior alternative to Duplicity. It offers deduplication, compression, and authenticated encryption.

Instead of transferring the whole 50GB image gallery every night, Borg detects the changed chunks. This is crucial when replicating data across the WAN to your CoolVDS instance.

# Initialize the repository on the remote CoolVDS instance
borg init --encryption=repokey user@dr-node.coolvds.net:/var/backup/repo

# The daily backup script
borg create --stats --compression lz4 \
    user@dr-node.coolvds.net:/var/backup/repo::mn-{now:%Y-%m-%d} \
    /var/www/html \
    /etc/nginx

# Prune old backups automatically
borg prune -v --list --keep-daily=7 --keep-weekly=4 \
    user@dr-node.coolvds.net:/var/backup/repo

This script ensures you can roll back to yesterday's state if the disaster is "ransomware encrypted my files" rather than "the datacenter burned down."
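
Deduplicated archives are only useful if you can read them back. Verify regularly; a short sketch (substitute a real archive name from borg list):

# List the archives Borg knows about
borg list user@dr-node.coolvds.net:/var/backup/repo

# Restore into a scratch directory and spot-check the contents
mkdir -p /tmp/restore-test && cd /tmp/restore-test
borg extract user@dr-node.coolvds.net:/var/backup/repo::mn-2016-11-14

# Check repository consistency
borg check user@dr-node.coolvds.net:/var/backup/repo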

Automating the Failover with Ansible

A manual recovery plan is a failed recovery plan. When adrenaline is high, you will make mistakes. We use Ansible (currently v2.1) to provision the DR node so it is identical to production.

Do not maintain the DR server manually. Define it as code. Here is a snippet to ensure your Nginx configuration is synchronized:

---
- hosts: dr_servers
  remote_user: root
  tasks:
    - name: Install Nginx
      apt: name=nginx state=present update_cache=yes

    - name: Sync Nginx Configurations
      synchronize:
        src: /etc/nginx/sites-available/
        dest: /etc/nginx/sites-available/
        delete: yes
      notify:
        - reload nginx

  handlers:
    - name: reload nginx
      service: name=nginx state=reloaded
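
Assuming the playbook is saved as dr-sync.yml next to an inventory file defining the dr_servers group (both filenames are placeholders), running it is one command:

# Dry run: show what would change on the DR node without touching it
ansible-playbook -i hosts dr-sync.yml --check --diff

# Apply for real
ansible-playbook -i hosts dr-sync.yml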

The Legal Angle: Norway vs. The World

Data sovereignty is the headache of 2016. With the Privacy Shield agreement looking shaky to many legal scholars, hosting data outside the EEA is risky. The Norwegian Data Protection Authority (Datatilsynet) is notoriously strict.

By keeping your DR site in Norway, you bypass the US jurisdiction issues entirely. You also benefit from Norway's power grid—98% hydroelectric, stable, and cheap. It is the sensible place for cold or warm storage.

Testing: The Drill

A backup is Schrödinger's file: it exists and does not exist simultaneously until you verify it. Once a quarter, you must:

  1. Stop the replication slave on the CoolVDS node.
  2. Promote the slave to master (the exact statements are sketched below).
  3. Point your hosts file (or a test DNS record) to the DR IP.
  4. Run a transaction.

If this takes more than 30 minutes, your scripts are too complex.
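
For reference, steps 1 and 2 reduce to a handful of statements on the DR node. A sketch; the dr_test.heartbeat table is a hypothetical canary you would create in advance:

# Step 1: stop replication on the CoolVDS node
mysql -u root -p -e "STOP SLAVE;"

# Step 2: promote -- forget the old master and accept writes
# (read_only is a no-op if you never set it, but cheap insurance)
mysql -u root -p -e "RESET SLAVE ALL; SET GLOBAL read_only = OFF;"

# Step 4: prove the node takes writes (hypothetical canary table)
mysql -u root -p -e "INSERT INTO dr_test.heartbeat VALUES (NOW());"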

Conclusion

Hardware fails. Software has bugs. Humans delete databases. In 2016, relying on luck is negligence. You need a dedicated, isolated environment that respects data privacy laws and performs fast enough to take over production loads.

CoolVDS offers the NVMe I/O performance required for sync_binlog=1 and the KVM isolation needed for true stability. Don't wait for the crash.

Secure your business continuity. Deploy a warm standby node in Oslo today.