Disaster Recovery in 2019: Beyond Backups for Norwegian Enterprises
Date: April 1, 2019
Most System Administrators lie to themselves. They look at a tar.gz file sitting in a bucket somewhere and tell their CTO, "Yes, we have a Disaster Recovery plan." No, you don't. You have a file. A file is not a recovery. Recovery is a process, and usually, it is a painful, adrenaline-fueled nightmare where you discover that your cheap VPS provider throttles disk I/O so hard that restoring your 500GB database will take three days.
I have stood in that server room (metaphorically, as we mostly manage KVM instances via SSH these days). I have watched a CEO scream about downtime while a progress bar crawled at 5MB/s. In 2019, with the GDPR fully enforceable and the Norwegian Datatilsynet watching closely, downtime isn't just an operational failure; it's a legal liability.
This guide is for the pragmatic architect. We will move beyond basic backups to true Disaster Recovery (DR) planning, focusing on automation, database replication, and why hardware choice—specifically NVMe—is the critical factor in Mean Time To Recovery (MTTR).
The "War Story": The Bottleneck of Cheap IOPS
Last year, I audited a Magento deployment for a client in Oslo. They suffered a catastrophic corruption of their ibdata1 file due to an unclean shutdown during a power failure at their previous budget host. They had backups. Good ones. But when we initiated the restore, the drive latency spiked to 400ms.
Their provider was overselling the spindle-based SAN. The restore process, which should have taken 2 hours, was projected to take 48. We migrated them to a CoolVDS instance with local NVMe storage mid-crisis. The restore finished in 55 minutes. Hardware matters.
Phase 1: The "3-2-1" Rule is Insufficient Without Encryption
You know the drill: 3 copies of data, 2 different media, 1 offsite. But in the post-GDPR era, if that offsite backup is unencrypted and lands on a server outside the EEA (or even on a US-owned cloud subject to the CLOUD Act of 2018), you are exposed.
Pro Tip: Never rely on provider-level snapshots alone. They are great for quick rollbacks, but if the provider's control plane goes down, your data is hostage. Always maintain an independent rescue hatch.
Automating the Lifeboat
We don't manually run backups. We script them. Here is a battle-tested Bash script that dumps a MySQL database, encrypts it with GPG, and ships it to a remote storage server (could be a secondary CoolVDS instance in a different geolocation).
#!/bin/bash
# /usr/local/bin/dr_backup.sh
# Battle-hardened backup script for 2019 deployments
TIMESTAMP=$(date +"%F")
BACKUP_DIR="/var/backups/sql"
MYSQL_USER="root"
MYSQL_PASS="sTr0ngP4ssw0rd!"
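# NOTE: A plaintext password here is convenient but risky. If it fits your setup,
# a root-only ~/.my.cnf or mysqldump --defaults-extra-file is the safer pattern.
# At minimum, chmod 700 this script so the credentials are not world-readable.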
DB_NAME="production_db"
GPG_RECIPIENT="admin@example.no"
REMOTE_HOST="backup-user@192.0.2.10"
REMOTE_DIR="/home/backup-user/storage"
# Ensure directory exists
mkdir -p $BACKUP_DIR
echo "[+] Starting Dump for $DB_NAME..."
# 1. Dump with single-transaction to avoid locking tables (InnoDB)
mysqldump -u$MYSQL_USER -p$MYSQL_PASS --single-transaction --quick --routines --triggers $DB_NAME > $BACKUP_DIR/$DB_NAME-$TIMESTAMP.sql
if [ $? -eq 0 ]; then
echo "[+] Dump Successful. Encrypting..."
# 2. Encrypt using GPG for GDPR compliance
gpg --yes --batch --quiet --recipient $GPG_RECIPIENT --encrypt $BACKUP_DIR/$DB_NAME-$TIMESTAMP.sql
# 3. Remove the unencrypted dump immediately, but only if the encrypted copy actually exists
[ -f $BACKUP_DIR/$DB_NAME-$TIMESTAMP.sql.gpg ] && rm $BACKUP_DIR/$DB_NAME-$TIMESTAMP.sql
echo "[+] Transferring to Remote Site..."
# 4. Rsync to remote disaster recovery site
rsync -avz -e "ssh -p 22" $BACKUP_DIR/$DB_NAME-$TIMESTAMP.sql.gpg $REMOTE_HOST:$REMOTE_DIR
# Cleanup local encrypted file to save space
rm $BACKUP_DIR/$DB_NAME-$TIMESTAMP.sql.gpg
echo "[+] Backup Complete."
else
echo "[!] Dump Failed! Check logs."
exit 1
fi
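The script assumes a GPG public key for admin@example.no already sits in the root keyring of the database host. If that is not the case yet, a minimal sketch looks like this: generate the keypair on a secure workstation and import only the public half on the server, so that an attacker who compromises the box cannot read historical backups.
# On a secure workstation (keep the private key OFF the server):
gpg --full-generate-key
gpg --armor --export admin@example.no > backup-pub.asc

# On the database host, as root: import only the public key used for encryption
gpg --import backup-pub.asc
# Mark the imported key as trusted (gpg --edit-key admin@example.no, then "trust"),
# otherwise the --batch encryption in the script above may refuse to use it.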
To automate the backup script, add it to your crontab. Do not run it during peak Norwegian business hours (08:00 - 16:00 local time, CET/CEST); the entry below fires at 03:00 server time.
# crontab -e
0 3 * * * /usr/local/bin/dr_backup.sh >> /var/log/dr_backup.log 2>&1
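The remote side will accumulate dumps forever unless you prune it. A simple retention policy on the storage host (assuming 30 days satisfies your compliance requirements; adjust to taste) can run from its own crontab:
# On the remote storage host: delete encrypted dumps older than 30 days, every night at 04:30
30 4 * * * find /home/backup-user/storage -name '*.sql.gpg' -mtime +30 -delete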
Phase 2: Minimizing RPO with Database Replication
Recovery Point Objective (RPO) is the maximum amount of data, measured in time, that you can afford to lose. A nightly backup means an RPO of up to 24 hours. For an e-commerce store, that is unacceptable. The solution is Master-Slave replication.
In 2019, MySQL 5.7 and 8.0 offer robust GTID-based replication. This allows you to have a "Hot Standby" on a secondary server. If the primary melts, you switch the application config to the standby IP.
Configuring the Master (Primary)
Edit your /etc/mysql/my.cnf (or /etc/mysql/mysql.conf.d/mysqld.cnf on Ubuntu 18.04 LTS):
[mysqld]
# Binds to private network IP for security
bind-address = 10.0.0.5
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
# GTID replication is far more robust than legacy log pos
gtid_mode = ON
enforce_gtid_consistency = ON
# Safety for durability
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
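After editing, restart MySQL and confirm that GTID mode actually took effect before you touch the standby. Roughly:
systemctl restart mysql
mysql -u root -p -e "SHOW VARIABLES LIKE 'gtid_mode';"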
Configuring the Slave (Hot Standby)
[mysqld]
bind-address = 10.0.0.6
server-id = 2
log_bin = /var/log/mysql/mysql-bin.log
gtid_mode = ON
enforce_gtid_consistency = ON
read_only = 1 # Crucial: Prevents accidental writes to the replica
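The config files alone do not start replication. A rough sketch of the remaining steps, assuming a private network between the two instances and a standby already seeded with a consistent copy of the data (restoring the encrypted dump from Phase 1 works), with placeholder user name and password:
# On the master (10.0.0.5): create a replication account for the standby.
# mysql_native_password sidesteps the TLS requirement of MySQL 8.0's default auth plugin.
mysql -u root -p -e "CREATE USER 'repl'@'10.0.0.6' IDENTIFIED WITH mysql_native_password BY 'Repl1caP4ss!';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.0.6';"

# On the standby (10.0.0.6): attach to the master with GTID auto-positioning and start replicating
mysql -u root -p -e "CHANGE MASTER TO MASTER_HOST='10.0.0.5', MASTER_USER='repl', MASTER_PASSWORD='Repl1caP4ss!', MASTER_AUTO_POSITION=1; START SLAVE;"

# Verify: Slave_IO_Running and Slave_SQL_Running should both report Yes
mysql -u root -p -e "SHOW SLAVE STATUS\G"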
With this setup, your RPO drops from 24 hours to milliseconds. However, replication is not a backup. If you run DROP TABLE on the master, it replicates to the slave instantly. You need both offsite cold backups (for corruption/deletion) and replication (for hardware failure).
Phase 3: Infrastructure as Code (IaC)
If your server is compromised, don't fix it. Kill it. Redeploy it. This is the immutable infrastructure paradigm. In 2019, Ansible is the tool of choice for this in many shops that find Puppet too heavy and Kubernetes too complex for simple setups.
Here is an Ansible playbook snippet that restores a web server state from scratch. This ensures that if you need to migrate to a new CoolVDS instance due to a region outage, you can do it in minutes, not hours.
---
- name: Disaster Recovery Provisioning
  hosts: recovery_web
  become: yes
  vars:
    nginx_port: 80
    doc_root: /var/www/html
  tasks:
    - name: Install Nginx and PHP-FPM
      apt:
        name: ['nginx', 'php7.2-fpm', 'php7.2-mysql']
        state: present
        update_cache: yes
    - name: Push Nginx Configuration
      template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/sites-available/default
      notify: Restart Nginx
    - name: Allow SSH, HTTP and HTTPS
      ufw:
        rule: allow
        port: '{{ item }}'
        proto: tcp
      loop:
        - '22'
        - '80'
        - '443'
    - name: Ensure Firewall is Locked Down
      ufw:
        state: enabled
        policy: deny
  handlers:
    - name: Restart Nginx
      service:
        name: nginx
        state: restarted
To run this against a freshly provisioned CoolVDS host:
ansible-playbook -i inventory/dr_hosts site.yml
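The inventory/dr_hosts file itself can be a single line. A minimal sketch (the IP is a placeholder for whatever address your new instance gets):
# inventory/dr_hosts
[recovery_web]
198.51.100.20 ansible_user=root ansible_python_interpreter=/usr/bin/python3
Pointing ansible_python_interpreter at python3 saves you the classic "/usr/bin/python not found" failure on a minimal Ubuntu 18.04 image.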
The Norwegian Context: Latency and Legality
Why does geography matter in Disaster Recovery? Two reasons: Latency and Law.
1. Latency: When you are syncing terabytes of data back to a production server, the difference between a server in Oslo (CoolVDS) and a server in Frankfurt or Virginia is massive. Round-trip time (RTT) affects throughput via the TCP window size.
You can check your latency to the Norwegian Internet Exchange (NIX) to verify your host's connectivity:
mtr -rwc 10 nix.no
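To see why RTT matters, run the back-of-the-envelope numbers for a single TCP stream with a 64 KiB window. The RTT figures are illustrative and the formula ignores window scaling and packet loss, so treat it as an illustration rather than a prediction:
# Rough ceiling for one TCP stream: window size / RTT
awk 'BEGIN { win = 65536 * 8; printf "2 ms RTT (local): ~%.0f Mbit/s   90 ms RTT (US east coast): ~%.1f Mbit/s\n", win/0.002/1e6, win/0.090/1e6 }'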
2. Law: Under GDPR, you are the Data Controller. If you use a US-based cloud provider for your DR, you must navigate the complexities of data transfer. Since the passing of the US CLOUD Act in 2018, US authorities can subpoena data held by US companies even if that data is physically located in Europe. Hosting with a strictly Norwegian provider like CoolVDS mitigates this specific jurisdictional risk, keeping your data strictly under Norwegian and EEA law.
Testing the Plan
A DR plan that hasn't been tested is a hallucination. You need to simulate a failure.
- Spin up a new CoolVDS instance (takes ~55 seconds).
- Run your Ansible playbook to configure the environment.
- Decrypt and restore your database backup (see the sketch after this list).
- Point your hosts file to the new IP and verify the application loads.
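The decrypt-and-restore step mirrors the backup script from Phase 1. The host, path, and database below are the same placeholders used there, and the date in the filename is simply whatever your latest dump is; remember that the GPG private key lives on your workstation, not on the dead server.
# Pull the most recent encrypted dump from the DR storage host
scp backup-user@192.0.2.10:/home/backup-user/storage/production_db-2019-03-31.sql.gpg .

# Recreate the schema container first (a single-database dump does not include CREATE DATABASE)
mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS production_db"

# Decrypt and stream straight into MySQL on the new instance
gpg --decrypt production_db-2019-03-31.sql.gpg | mysql -u root -p production_db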
While you are testing, benchmark the disk speed. If your provider can't handle the restore load, you need to switch.
# Test random write performance (simulating database restore)
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite
On a CoolVDS NVMe instance, you should see IOPS in the tens of thousands. On a standard VPS with shared spinning disks, you might see 300. That is the difference between being back online in 1 hour or 10 hours.
Conclusion
Disaster Recovery is expensive, unglamorous work—until the moment it saves your company. By combining automated encrypted offsite backups, Master-Slave replication, and infrastructure code, you build a fortress around your data.
But software is only half the equation. You need infrastructure that respects your need for speed and sovereignty. Don't wait for a catastrophic failure to find out your current host has slow pipes and even slower disks.
Take action today: Audit your current backup restoration speed. If it's too slow, deploy a test instance on CoolVDS and see what local NVMe storage does for your MTTR.