Disaster Recovery in 2017: Surviving the `rm -rf` Nightmare on Norwegian Soil

January 31st, 2017. A date burned into the memory of every sysadmin who watched the GitLab live stream. A tired engineer accidentally ran a directory deletion command on the production database server instead of the secondary. Around 300GB of production data, gone. In seconds. If a tech giant can stumble, your setup can too. I have seen servers overheat to the point of catching fire, and I have seen file systems corrupted by a single rogue bit flip.

If you are hosting mission-critical applications in Norway, relying on "hope" is professional negligence. You need a plan that accounts for human error, hardware failure, and the specific legal landscape we operate in here, especially with the EU's GDPR enforcement looming next year. This is not about buying expensive appliances; it is about using robust, battle-tested Linux tools to ensure your Recovery Time Objective (RTO) isn't "next week."

The RPO/RTO Reality Check

Before we touch a single config file, define your metrics. RPO (Recovery Point Objective) is how much data you can afford to lose (minutes? hours?). RTO (Recovery Time Objective) is how long you can be offline.
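An RPO only means something if you measure against it. Here is a minimal sketch of such a check, assuming GNU find (for -printf) and backups written as regular files under one directory. The function names newest_backup_age and rpo_met are hypothetical, not part of any tool.

```shell
#!/bin/bash
# Sketch: flag a backup directory whose newest file is older than the RPO.
# Assumes GNU find (the -printf option), as shipped on Ubuntu and CentOS.

# Print the age in seconds of the newest regular file under $1.
# Returns 1 if the directory contains no files at all.
newest_backup_age() {
  local dir="$1" newest now
  # %T@ prints the modification time as seconds since the epoch
  newest=$(find "$dir" -type f -printf '%T@\n' 2>/dev/null | sort -n | tail -n 1)
  [ -n "$newest" ] || return 1
  now=$(date +%s)
  # Strip the fractional part before doing integer arithmetic
  echo $(( now - ${newest%.*} ))
}

# Succeed only if the newest backup in $1 is at most $2 seconds old.
rpo_met() {
  local age
  age=$(newest_backup_age "$1") || return 1
  [ "$age" -le "$2" ]
}
```

With a one-hour RPO, something like rpo_met /backup 3600 || echo "RPO violated" could run from cron and feed your alerting. The /backup path and the 3600-second threshold are placeholders.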

Most cheap VPS providers oversell their storage infrastructure. When you start a heavy backup job, I/O latency spikes, your web app crawls, and stalled requests pile up until the kernel's OOM killer starts terminating processes. This is why we insist on NVMe storage at CoolVDS. When you need to dump a 50GB database, you cannot afford to wait for spinning rust. High IOPS (Input/Output Operations Per Second) are not a luxury; during a disaster recovery scenario, they are the difference between a 10-minute outage and a 4-hour ordeal.

The "3-2-1" Rule: Linux Edition

The golden rule remains: 3 copies of data, 2 different media, 1 off-site. In the context of a VPS Norway setup, "off-site" means a different physical datacenter, or at least a different host node.

1. The Database Dump (Hot vs. Cold)

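For MySQL/MariaDB, mysqldump is the standard tool, but by default it locks tables while it reads them. The --single-transaction flag avoids the lock by dumping from a consistent snapshot, but that guarantee only holds for InnoDB tables. For large production loads, use Percona XtraBackup. However, if you are running a standard LAMP stack on Ubuntu 16.04, a scripted dump is your first line of defense.

Here is a one-liner that handles this and also captures routines and triggers: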

mysqldump -u root -p --single-transaction --routines --triggers --all-databases | gzip > /backup/db_$(date +%F).sql.gz

Pro Tip: Never store backups on the same partition as your OS. Mount a separate volume. If your root filesystem gets corrupted during an update, your backups survive.
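One weakness of piping a dump straight into gzip is that the pipeline reports gzip's exit status, so a failed dump can still leave behind a file that looks like a backup. The sketch below is one way to guard against that. It writes to a temporary name and renames only after a clean run. The helper name dump_to_gz and the .partial suffix are my own assumptions.

```shell
#!/bin/bash
# Sketch: write a compressed dump so that a failed dump never leaves a file
# that looks like a good backup. dump_to_gz is a hypothetical helper name.

# Usage: dump_to_gz OUTFILE COMMAND [ARGS...]
dump_to_gz() {
  local out="$1"; shift
  local tmp="${out}.partial"
  # The subshell keeps pipefail local: with it, the pipeline fails when the
  # dump command fails, not only when gzip fails.
  if ( set -o pipefail; "$@" | gzip > "$tmp" ); then
    mv "$tmp" "$out"      # rename into place only after a clean run
  else
    rm -f "$tmp"          # discard the partial file
    return 1
  fi
}
```

A call might look like dump_to_gz /backup/db_$(date +%F).sql.gz mysqldump --single-transaction --routines --triggers --all-databases. For unattended runs, keep the credentials in ~/.my.cnf rather than using -p, which prompts interactively.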

2. Efficient File Synchronization

Do not use FTP. It is 2017. Use rsync over SSH. It is incremental, meaning it only sends changes. This saves bandwidth and reduces the window of vulnerability.

A standard sync command might look like this:

rsync -avz -e ssh /var/www/html/ remote_user@backup_server:/var/backups/web/

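But let's get serious. We want rotation. Below is the core of the shell script structure I use for clients who need daily, weekly, and monthly retention without filling up the disk. It takes the daily snapshot and uses hard links so unchanged files cost no extra space.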

#!/bin/bash
# Simple Rotating Backup Script
# Date: 2017-03-29

# Abort on the first failed command so a broken sync is never logged as success
set -e

SOURCE_DIR="/var/www/html"
BACKUP_ROOT="/mnt/backups"
DATE=$(date +%F)

# Create daily directory
mkdir -p "$BACKUP_ROOT/daily/$DATE"

# Sync using hard links to save space (deduplication style)
# If a previous backup exists, link unchanged files against it
LINK_DEST=()
if [ -d "$BACKUP_ROOT/current" ]; then
  LINK_DEST=(--link-dest="$BACKUP_ROOT/current")
fi

# -z is dropped: compression only helps when the copy crosses a network link
rsync -av --delete "${LINK_DEST[@]}" "$SOURCE_DIR/" "$BACKUP_ROOT/daily/$DATE/"

# Update current pointer: replace the symlink in place, no rm -rf required
ln -sfn "$BACKUP_ROOT/daily/$DATE" "$BACKUP_ROOT/current"

# Log result (only reached when every step above succeeded)
echo "Backup complete for $DATE" >> /var/log/backup_ops.log
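The script above creates snapshots but never removes them, so the disk fills up eventually. A pruning step could look like the sketch below. The helper name prune_daily is hypothetical, and it assumes snapshot directories are named by date +%F so that sorting by name is sorting by date.

```shell
#!/bin/bash
# Sketch: keep only the newest N daily snapshots.
# prune_daily is a hypothetical helper; it assumes snapshot directories are
# named YYYY-MM-DD (as produced by date +%F), so name order is date order.

# Usage: prune_daily DAILY_DIR KEEP
prune_daily() {
  local daily_dir="$1" keep="$2" snapshot
  [ -d "$daily_dir" ] || return 1
  # Only match date-shaped names so a stray directory is never deleted.
  find "$daily_dir" -mindepth 1 -maxdepth 1 -type d \
       -name '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]' \
    | sort -r \
    | tail -n +"$(( keep + 1 ))" \
    | while IFS= read -r snapshot; do
        # Everything past the newest KEEP entries is removed
        rm -rf -- "$snapshot"
      done
}
```

For example, prune_daily /mnt/backups/daily 7 would keep one week of snapshots. Because the snapshots share data through hard links, deleting an old one does not damage the newer ones; the file contents stay on disk until the last link is gone. Weekly and monthly tiers would need their own directories and retention counts.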

Block-Level Replication with DRBD

File backups are great for RPO of 24 hours. But what if you need near-zero data loss? You need block-level replication. DRBD (Distributed Replicated Block Device) is essentially RAID-1 over the network. It mirrors data in real-time from your primary VPS to a secondary one.

Installing it on CentOS 7 (these packages come from the ELRepo repository, which must be enabled first):

yum install -y drbd84-utils kmod-drbd84

This is where network latency matters. If you are replicating synchronously (Protocol C), write operations on the primary server are not considered "done" until the secondary confirms receipt. If your secondary server is in the US and your primary is in Oslo, your site will be sluggish. You need low latency connectivity. CoolVDS infrastructure is optimized for internal routing within the Nordics to make this viable.

Here is a configuration snippet for /etc/drbd.d/r0.res:

resource r0 {
  protocol C;
  startup {
    wfc-timeout  15;
    degr-wfc-timeout 60;
  }
  net {
    cram-hmac-alg sha1;
    shared-secret "SuperSecret2017";
  }
  on node1 {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 10.0.0.1:7789;
    meta-disk internal;
  }
  on node2 {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 10.0.0.2:7789;
    meta-disk internal;
  }
}
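Replication you never monitor is just another form of hope. DRBD 8.4 exposes its state as text in /proc/drbd, so a health check can be a few lines of shell. This is a rough sketch and the helper name drbd_healthy is hypothetical. It treats anything other than a connected resource with both disks up to date as a failure.

```shell
#!/bin/bash
# Sketch: decide whether DRBD 8.4 resources are fully replicated by reading
# the status text that the kernel module exposes in /proc/drbd.
# drbd_healthy is a hypothetical helper name.

# Usage: drbd_healthy [STATUS_FILE]   (defaults to /proc/drbd)
drbd_healthy() {
  local status_file="${1:-/proc/drbd}" lines
  # Device lines look like:
  #  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
  lines=$(grep -E '^ *[0-9]+: cs:' "$status_file" 2>/dev/null) || return 1
  # Any device line that is not Connected with both disks UpToDate fails.
  if echo "$lines" | grep -vq 'cs:Connected .*ds:UpToDate/UpToDate'; then
    return 1
  fi
  return 0
}
```

Run on either node, drbd_healthy || echo "DRBD degraded" gives you something to hook into monitoring. Note that it also fails during an initial sync, when the secondary disk is still marked Inconsistent.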

The Norwegian Context: Datatilsynet & Privacy

We are currently operating under the Personal Data Act, but the EU's General Data Protection Regulation (GDPR) is coming in May 2018. The clock is ticking. One key aspect of disaster recovery often ignored is Data Sovereignty.

If your DR plan involves backing up customer data to a cheap S3 bucket in a US-East region, you might be stepping into a legal minefield regarding the Privacy Shield framework. Keeping your primary and backup data within Norwegian borders—or at least the EEA—is the safest bet for compliance. CoolVDS ensures your data stays on local hardware, satisfying the requirements of even the strictest managed hosting contracts.

Testing: The "Schrödinger's Backup"

A backup does not exist until you have successfully restored from it. I have seen cron jobs that ran successfully for years, only to find out they were backing up an empty folder because of a permission error.

Automate your verification.

tar -tzf /backup/archive.tar.gz > /dev/null && echo "Archive valid" || echo "Corrupt"
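The one-liner above proves the archive is readable, but a perfectly valid archive of an empty folder passes it too. A slightly stricter sketch also checks that the archive contains a minimum number of entries. The helper name archive_has_entries and the threshold are assumptions you would tune per site.

```shell
#!/bin/bash
# Sketch: treat an archive as valid only if it is readable AND contains at
# least a minimum number of entries. archive_has_entries is a hypothetical name.

# Usage: archive_has_entries ARCHIVE MIN_ENTRIES
archive_has_entries() {
  local archive="$1" min="$2" listing count
  # tar -tzf fails on a truncated or corrupt gzip stream
  listing=$(tar -tzf "$archive" 2>/dev/null) || return 1
  # Count non-empty lines; grep -c exits 1 when the count is 0, hence || true
  count=$(printf '%s\n' "$listing" | grep -c . || true)
  [ "$count" -ge "$min" ]
}
```

An archive of an empty directory lists only the directory itself, so archive_has_entries /backup/archive.tar.gz 2 would already catch the permission-error scenario described above. A full restore into a scratch directory remains the only real proof.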

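For a more advanced check, use checksums to verify integrity across a transfer. Record the hash on the source before sending, then recompute it on the destination and compare the two:

sha256sum /var/www/html/config.php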
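Hashing one file by hand does not scale. A manifest covering the whole tree does. The sketch below assumes GNU coreutils and findutils. The function names and the SHA256SUMS file name are my own choices, not a convention of any tool.

```shell
#!/bin/bash
# Sketch: record checksums before transfer, verify them after.
# write_manifest / check_manifest and the file name SHA256SUMS are
# hypothetical choices, not part of any tool.

# Write DIR/SHA256SUMS covering every regular file under DIR.
write_manifest() {
  local dir="$1"
  ( cd "$dir" &&
    find . -type f ! -name SHA256SUMS -print0 \
      | sort -z \
      | xargs -0 -r sha256sum > SHA256SUMS )
}

# Re-hash the files under DIR and compare against DIR/SHA256SUMS.
# Fails on any changed or missing file, or when the manifest is absent.
check_manifest() {
  local dir="$1"
  ( cd "$dir" && sha256sum --quiet -c SHA256SUMS >/dev/null 2>&1 )
}
```

Run write_manifest /var/www/html on the source before the sync, let rsync carry the manifest along with the files, then run check_manifest on the backup side. The paths in the manifest are relative, so the check works wherever the tree lands.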

If you are running a high-traffic site, you also need to consider DDoS protection as part of your availability plan. A volumetric attack can take you offline just as effectively as a disk failure. Ensure your provider filters traffic upstream before it hits your eth0 interface.

The Verdict

Disaster recovery isn't about buying a magic box. It is about scripts, discipline, and underlying hardware performance. You can write the best rsync scripts in the world, but if the host node is overloaded and disk I/O is choking, your recovery will fail when you need it most.

We built CoolVDS on KVM and NVMe specifically to handle these high-stress scenarios. We provide the raw horsepower and the stability; you provide the logic. Don't wait for the inevitable kernel panic to test your mettle.

Is your disaster recovery plan ready for 2018? Spin up a secondary NVMe instance on CoolVDS today and test your replication lag.