Disaster Recovery in 2019: Beyond Backups for Norwegian Enterprises

Date: April 1, 2019

Most System Administrators lie to themselves. They look at a tar.gz file sitting in a bucket somewhere and tell their CTO, "Yes, we have a Disaster Recovery plan." No, you don't. You have a file. A file is not a recovery. Recovery is a process, and usually, it is a painful, adrenaline-fueled nightmare where you discover that your cheap VPS provider throttles disk I/O so hard that restoring your 500GB database will take three days.

I have stood in that server room (metaphorically, as we mostly manage KVM instances via SSH these days). I have watched a CEO scream about downtime while a progress bar crawled at 5MB/s. In 2019, with the GDPR fully enforceable and the Norwegian Datatilsynet watching closely, downtime isn't just an operational failure; it's a legal liability.

This guide is for the pragmatic architect. We will move beyond basic backups to true Disaster Recovery (DR) planning, focusing on automation, database replication, and why hardware choice—specifically NVMe—is the critical factor in Mean Time To Recovery (MTTR).

The "War Story": The Bottleneck of Cheap IOPS

Last year, I audited a Magento deployment for a client in Oslo. They suffered a catastrophic corruption of their ibdata1 file due to an unclean shutdown during a power failure at their previous budget host. They had backups. Good ones. But when we initiated the restore, the drive latency spiked to 400ms.

Their provider was overselling the spindle-based SAN. The restore process, which should have taken 2 hours, was projected to take 48. We migrated them to a CoolVDS instance with local NVMe storage mid-crisis. The restore finished in 55 minutes. Hardware matters.

Phase 1: The "3-2-1" Rule is Insufficient Without Encryption

You know the drill: 3 copies of data, 2 different media, 1 offsite. But in the post-GDPR era, if that offsite backup is unencrypted and lands on a server outside the EEA (or even on a US-owned cloud subject to the CLOUD Act of 2018), you are exposed.

Pro Tip: Never rely on provider-level snapshots alone. They are great for quick rollbacks, but if the provider's control plane goes down, your data is hostage. Always maintain an independent rescue hatch.

Automating the Lifeboat

We don't manually run backups. We script them. Here is a battle-tested Bash script that dumps a MySQL database, encrypts it with GPG, and ships it to a remote storage server (could be a secondary CoolVDS instance in a different geolocation).

#!/bin/bash
# /usr/local/bin/dr_backup.sh
# Battle-hardened backup script for 2019 deployments

TIMESTAMP=$(date +"%F")
BACKUP_DIR="/var/backups/sql"
MYSQL_USER="root"
# Ideally, keep credentials in /root/.my.cnf (chmod 600) rather than in the script,
# so the password never shows up in the process list or in version control.
MYSQL_PASS="sTr0ngP4ssw0rd!"
DB_NAME="production_db"
GPG_RECIPIENT="admin@example.no"
REMOTE_HOST="backup-user@192.0.2.10"
REMOTE_DIR="/home/backup-user/storage"

# Ensure directory exists
mkdir -p $BACKUP_DIR

echo "[+] Starting Dump for $DB_NAME..."

# 1. Dump with single-transaction to avoid locking tables (InnoDB)
mysqldump -u$MYSQL_USER -p$MYSQL_PASS --single-transaction --quick --routines --triggers $DB_NAME > $BACKUP_DIR/$DB_NAME-$TIMESTAMP.sql

if [ $? -eq 0 ]; then
    echo "[+] Dump Successful. Encrypting..."
    
    # 2. Encrypt using GPG for GDPR compliance (the recipient's public key must already be imported)
    gpg --yes --batch --quiet --trust-model always --recipient "$GPG_RECIPIENT" --encrypt "$BACKUP_DIR/$DB_NAME-$TIMESTAMP.sql"
    
    # 3. Remove the unencrypted text file immediately
    rm $BACKUP_DIR/$DB_NAME-$TIMESTAMP.sql
    
    echo "[+] Transferring to Remote Site..."
    
    # 4. Rsync to the remote disaster recovery site; only delete the local copy
    #    once the transfer has actually succeeded
    if rsync -avz -e "ssh -p 22" "$BACKUP_DIR/$DB_NAME-$TIMESTAMP.sql.gpg" "$REMOTE_HOST:$REMOTE_DIR"; then
        rm "$BACKUP_DIR/$DB_NAME-$TIMESTAMP.sql.gpg"
    else
        echo "[!] Transfer failed! Keeping local encrypted copy."
        exit 1
    fi
    
    echo "[+] Backup Complete."
else
    echo "[!] Dump Failed! Check logs."
    exit 1
fi
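
One assumption the script makes: the public key for admin@example.no must already be imported into the keyring of the user running the job (root, if you use the cron entry below). A one-time setup sketch, with an illustrative key file path:

# On the production server, import and verify the backup recipient's public key
gpg --import /root/keys/backup-recipient.pub.asc
gpg --list-keys admin@example.no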

To automate this, add it to your crontab. Do not run it during peak Norwegian traffic hours (roughly 08:00-16:00 local time).

# crontab -e
0 3 * * * /usr/local/bin/dr_backup.sh >> /var/log/dr_backup.log 2>&1

Phase 2: Minimizing RPO with Database Replication

Recovery Point Objective (RPO) defines how much data you can afford to lose. A nightly backup means you can lose up to 24 hours of data. For an e-commerce store, that is unacceptable. The solution is Master-Slave replication.

In 2019, MySQL 5.7 and 8.0 offer robust GTID-based replication. This allows you to have a "Hot Standby" on a secondary server. If the primary melts, you switch the application config to the standby IP.

Configuring the Master (Primary)

Edit your /etc/mysql/my.cnf (or /etc/mysql/mysql.conf.d/mysqld.cnf on Ubuntu 18.04 LTS):

[mysqld]
# Binds to private network IP for security
bind-address = 10.0.0.5
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log

# GTID replication is far more robust than legacy log pos
gtid_mode = ON
enforce_gtid_consistency = ON

# Safety for durability
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1

Configuring the Slave (Hot Standby)

[mysqld]
bind-address = 10.0.0.6
server-id = 2
log_bin = /var/log/mysql/mysql-bin.log
gtid_mode = ON
enforce_gtid_consistency = ON
read_only = 1  # Crucial: Prevents accidental writes to the replica
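
Restart MySQL on both nodes, then attach the replica. The statements below are a sketch assuming MySQL 5.7 (on 8.0 the default authentication plugin needs extra care); the account name and password are placeholders, while the private IPs come from the configs above:

-- On the master (10.0.0.5): create a dedicated replication account
CREATE USER 'repl'@'10.0.0.6' IDENTIFIED BY 'ChangeMe-Repl-Pass';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.0.6';

-- On the slave (10.0.0.6), after seeding it with a consistent dump of the master:
CHANGE MASTER TO
  MASTER_HOST = '10.0.0.5',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'ChangeMe-Repl-Pass',
  MASTER_AUTO_POSITION = 1;
START SLAVE;
SHOW SLAVE STATUS\G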

With this setup, your RPO drops from 24 hours to milliseconds. However, replication is not a backup. If you run DROP TABLE on the master, it replicates to the slave instantly. You need both offsite cold backups (for corruption/deletion) and replication (for hardware failure).
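
When you do fail over, remember that the standby was deliberately configured read-only. A minimal promotion sketch, run only once you are certain the old primary is truly dead:

-- On the standby (10.0.0.6)
STOP SLAVE;
SET GLOBAL read_only = OFF;
-- Then repoint the application's database host at 10.0.0.6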

Phase 3: Infrastructure as Code (IaC)

If your server is compromised, don't fix it. Kill it. Redeploy it. This is the immutable infrastructure paradigm. In 2019, Ansible is the tool of choice for this in many shops that find Puppet too heavy and Kubernetes too complex for simple setups.

Here is an Ansible playbook snippet that restores a web server state from scratch. This ensures that if you need to migrate to a new CoolVDS instance due to a region outage, you can do it in minutes, not hours.

---
- name: Disaster Recovery Provisioning
  hosts: recovery_web
  become: yes
  vars:
    nginx_port: 80
    doc_root: /var/www/html

  tasks:
    - name: Install Nginx and PHP-FPM
      apt:
        name: ['nginx', 'php7.2-fpm', 'php7.2-mysql']
        state: present
        update_cache: yes

    - name: Push Nginx Configuration
      template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/sites-available/default
      notify: Restart Nginx

    - name: Allow required ports through the firewall
      ufw:
        rule: allow
        port: '{{ item }}'
        proto: tcp
      loop:
        - '22'
        - '80'
        - '443'

    - name: Enable UFW with a default deny policy
      ufw:
        state: enabled
        default: deny

  handlers:
    - name: Restart Nginx
      service:
        name: nginx
        state: restarted
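
The playbook pushes templates/nginx.conf.j2, which is not reproduced in this article. A minimal sketch of what it might contain, assuming the stock Ubuntu 18.04 PHP-FPM socket path and the nginx_port / doc_root variables defined above:

# templates/nginx.conf.j2 -- illustrative only; adapt to your application
server {
    listen {{ nginx_port }} default_server;
    root {{ doc_root }};
    index index.php index.html;

    location / {
        try_files $uri $uri/ =404;
    }

    location ~ \.php$ {
        include snippets/fastcgi-php.conf;
        fastcgi_pass unix:/run/php/php7.2-fpm.sock;
    }
}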

To run this against a freshly provisioned CoolVDS host:

ansible-playbook -i inventory/dr_hosts site.yml
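
The inventory file referenced in that command is not shown either; a minimal sketch, with a placeholder IP for the freshly provisioned instance:

# inventory/dr_hosts
[recovery_web]
203.0.113.50 ansible_user=root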

The Norwegian Context: Latency and Legality

Why does geography matter in Disaster Recovery? Two reasons: Latency and Law.

1. Latency: When you are syncing terabytes of data back to a production server, the difference between a server in Oslo (CoolVDS) and a server in Frankfurt or Virginia is massive. Round-trip time (RTT) affects throughput via the TCP window size.
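
A back-of-the-envelope illustration, assuming a single TCP stream stuck at a 64 KB window (modern Linux kernels auto-tune well beyond this, but the principle holds):

# Max single-stream throughput ≈ TCP window / RTT
# 64 KB window, 30 ms RTT (e.g. Frankfurt): 65536 B / 0.030 s ≈ 2.2 MB/s
# 64 KB window,  2 ms RTT (e.g. Oslo/NIX):  65536 B / 0.002 s ≈ 33 MB/s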

You can check your latency to the Norwegian Internet Exchange (NIX) to verify your host's connectivity:

mtr -rwc 10 nix.no

2. Law: Under GDPR, you are the Data Controller. If you use a US-based cloud provider for your DR, you must navigate the complexities of data transfer. Since the passing of the US CLOUD Act in 2018, US authorities can subpoena data held by US companies even if that data is physically located in Europe. Hosting with a strictly Norwegian provider like CoolVDS mitigates this specific jurisdictional risk, keeping your data strictly under Norwegian and EEA law.

Testing the Plan

A DR plan that hasn't been tested is a hallucination. You need to simulate a failure.

  1. Spin up a new CoolVDS instance (takes ~55 seconds).
  2. Run your Ansible playbook to configure the environment.
  3. Decrypt and restore your database backup (a sketch follows this list).
  4. Point your hosts file to the new IP and verify the application loads.
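
Step 3 in practice, as a minimal sketch. It assumes the file naming from the backup script above (the date is an example) and that the private key matching admin@example.no is available on the recovery host:

# Pull the latest encrypted dump from the remote storage host
scp backup-user@192.0.2.10:/home/backup-user/storage/production_db-2019-04-01.sql.gpg .

# Decrypt it (prompts for the key passphrase unless an agent is running)
gpg --output production_db-2019-04-01.sql --decrypt production_db-2019-04-01.sql.gpg

# Recreate the schema and load the data
mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS production_db"
mysql -u root -p production_db < production_db-2019-04-01.sql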

While you are testing, benchmark the disk speed. If your provider can't handle the restore load, you need to switch.

# Test random write performance (simulating database restore)
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite

On a CoolVDS NVMe instance, you should see IOPS in the tens of thousands. On a standard VPS with shared spinning disks, you might see 300. That is the difference between being back online in 1 hour or 10 hours.

Conclusion

Disaster Recovery is expensive, unglamorous work—until the moment it saves your company. By combining automated encrypted offsite backups, Master-Slave replication, and infrastructure code, you build a fortress around your data.

But software is only half the equation. You need infrastructure that respects your need for speed and sovereignty. Don't wait for a catastrophic failure to find out your current host has slow pipes and even slower disks.

Take action today: Audit your current backup restoration speed. If it's too slow, deploy a test instance on CoolVDS and see what local NVMe storage does for your MTTR.