When RAID Fails: A Paranoiac’s Guide to Disaster Recovery in 2014

Let’s be honest: if you are relying on RAID 10 to save your data, you are already dead. You just don't know it yet. I spent last Tuesday night rebuilding a corrupted partition table on a legacy server that supposedly had "redundancy." The hardware controller lied. The filesystem panic was real. In the world of systems administration, paranoia isn't a pathology; it's a job requirement.

Disaster Recovery (DR) isn't about buying more hard drives. It is about Mean Time To Recovery (MTTR). When your e-commerce site hosting Norwegian customers goes dark, every second of downtime bleeds kroner. With the recent focus on data sovereignty here in Norway (thanks to the Personopplysningsloven), you can't just dump encrypted tarballs into an Amazon S3 bucket in Virginia and hope for the best. You need a local, compliant, and technically rigorous plan.

The "3-2-1" Rule is Not Optional

The theory is simple: 3 copies of data, on 2 different media, with 1 off-site. But implementing this on a high-traffic LEMP stack (Linux, Nginx, MySQL, PHP) running on CentOS 6.5 requires precise tooling. We aren't dragging and dropping folders here; we are piping streams.
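The three copies only count if they are actually the same archive. Below is a minimal sketch of that check, assuming sha256sum from GNU coreutils is available. The function name same_checksum and the example paths are hypothetical, not part of any existing tool.

```shell
#!/bin/bash
# same_checksum FILE...
# Succeeds (exit 0) only if every listed file is readable and all of them
# share the same SHA-256 digest. Name and usage are illustrative.
same_checksum() {
    local reference="" digest file
    [ "$#" -ge 2 ] || return 2          # need at least two copies to compare
    for file in "$@"; do
        [ -r "$file" ] || return 2      # a missing copy is a failure, not a match
        digest=$(sha256sum "$file" | awk '{print $1}')
        if [ -z "$reference" ]; then
            reference="$digest"         # first file sets the reference digest
        elif [ "$digest" != "$reference" ]; then
            return 1                    # at least one copy differs
        fi
    done
    return 0
}

# Example (hypothetical paths): local copy, second disk, off-site mount
# same_checksum /backup/site.tar.gz /mnt/disk2/site.tar.gz /mnt/offsite/site.tar.gz \
#     || echo "Copies differ" >&2
```

Exit code 1 means the copies differ; exit code 2 means a copy is missing or unreadable, which is arguably the worse of the two.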

1. Database Consistency is King

Most junior admins run a cron job with mysqldump and call it a day. That works for your blog, but on a transactional database a default mysqldump holds table locks for the duration of the dump, which is unacceptable. It also kills your I/O performance. For MySQL 5.5 or 5.6, we use Percona XtraBackup. It performs hot (non-blocking) backups of InnoDB tables.

Here is how we script a hot backup without stopping the service:

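#!/bin/bash
# /opt/scripts/backup_mysql.sh

BACKUP_ROOT="/backup/mysql"
BACKUP_DIR="$BACKUP_ROOT/$(date +%F_%H-%M)"
LOG_FILE="/var/log/xtrabackup.log"

# Create only the parent directory; innobackupex creates $BACKUP_DIR itself
mkdir -p "$BACKUP_ROOT"

echo "Starting backup at $(date)" >> "$LOG_FILE"

# --no-timestamp writes straight into $BACKUP_DIR instead of a timestamped subdirectory.
# --no-lock skips FLUSH TABLES WITH READ LOCK. Use it only if every table is InnoDB,
# no DDL runs during the backup, and you don't need the binary log position.
innobackupex --user=backup_user --password=COMPLEX_PASSWORD --no-lock --no-timestamp --parallel=4 "$BACKUP_DIR" 2>> "$LOG_FILE"

if [ $? -eq 0 ]; then
    echo "Backup successful: $BACKUP_DIR" >> "$LOG_FILE"
    # Prepare the backup immediately so it's ready for restore
    innobackupex --apply-log "$BACKUP_DIR" 2>> "$LOG_FILE" \
        || echo "CRITICAL: Prepare failed for $BACKUP_DIR" | mail -s "Backup Alert" admin@coolvds.com
else
    echo "CRITICAL: Backup failed!" | mail -s "Backup Alert" admin@coolvds.com
fi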

Notice the --apply-log step? Do that now, not when the server is burning down. It applies the transaction logs to the backup files, making them ready to drop in immediately.

Pro Tip: Always verify that your MySQL configuration handles durability correctly. If you care about your data, check your my.cnf. We enforce innodb_flush_log_at_trx_commit = 1 on all CoolVDS managed database instances. It costs a tiny bit of write latency, but it flushes the InnoDB log to disk on every commit, which is what makes committed transactions durable (the D in ACID).
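A quick way to check the setting in a config file is sketched below. The function name flush_setting is hypothetical. It only reads the one file you pass it (it does not follow !include directives), so treat the running server's SHOW VARIABLES output as the authoritative answer.

```shell
#!/bin/bash
# flush_setting CONFIG_FILE
# Prints the last value assigned to innodb_flush_log_at_trx_commit in the
# given file, or "unset" if the option does not appear there.
# Accepts both the underscore and the dash spelling of the option name.
flush_setting() {
    local value
    value=$(sed -n 's/^[[:space:]]*innodb[-_]flush[-_]log[-_]at[-_]trx[-_]commit[[:space:]]*=[[:space:]]*\([0-9]\).*/\1/p' "$1" \
        | tail -n 1)
    echo "${value:-unset}"
}

# Example:
# [ "$(flush_setting /etc/my.cnf)" = "1" ] || echo "Check durability settings" >&2
```

On MySQL 5.5 and 5.6 the server default is 1, so "unset" is not a problem in itself; a value of 0 or 2 is the case worth flagging.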

2. The Filesystem: Rsync is Still Unbeatable

Forget complex proprietary backup agents. rsync has shipped with basically every Linux distribution since the 90s. It’s efficient, it handles sparse files, and it works over SSH. The goal is to mirror your configuration and web root to a secondary location, preferably a physically separate one such as a secondary CoolVDS datacenter in Oslo, which keeps latency low on the private network.

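Use this flag set to preserve permissions, ownership, and symlinks (-a) plus hard links (-H), with compression in transit (-z). The trailing slash on /var/www/ matters: it copies the directory's contents rather than the directory itself.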

rsync -avzH --delete --exclude-from='/etc/rsync_exclude.txt' -e "ssh -p 2200" /var/www/ remote_user@10.0.0.5:/var/backups/webroot/

The --delete flag is dangerous but necessary to prevent your backup drive from filling up with files you deleted three years ago. Test your excludes file first.
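One way to test the excludes file is a dry run, which reports what would be transferred and deleted without touching the destination. This is a minimal sketch: the wrapper name preview_sync is hypothetical, while --dry-run and --itemize-changes are standard rsync options.

```shell
#!/bin/bash
# preview_sync SRC_DIR DEST EXCLUDES_FILE [extra rsync options...]
# Lists what the mirror would copy and delete, without changing DEST.
preview_sync() {
    local src="${1%/}/" dest="$2" excludes="$3"
    shift 3
    rsync -aH --delete --dry-run --itemize-changes \
        --exclude-from="$excludes" "$@" "$src" "$dest"
}

# Example, reusing the host and port from the command above:
# preview_sync /var/www remote_user@10.0.0.5:/var/backups/webroot/ \
#     /etc/rsync_exclude.txt -e "ssh -p 2200"
```

In the itemized output, lines starting with *deleting are the files --delete would remove. If anything you care about shows up there, fix the excludes file before running the real command.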

Network Latency & Geography

Why does geography matter for DR? Bandwidth. If you need to restore 500GB of data, the pipe size determines your downtime. Pulling data from a US server to Norway over the public internet is going to be slow due to hops and peering congestion.
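To put numbers on that, here is a back-of-the-envelope calculation for the 500GB restore. It is a sketch under ideal assumptions: decimal gigabytes, a fully saturated link, and no protocol or disk overhead. The function name restore_seconds and the 100 Mbps and 1 Gbps link rates are illustrative choices, so real restores will be slower.

```shell
#!/bin/bash
# restore_seconds SIZE_GB LINK_MBPS
# Ideal transfer time in whole seconds:
# gigabytes -> gigabits (x8) -> megabits (x1000), divided by the link rate.
restore_seconds() {
    awk -v gb="$1" -v mbps="$2" 'BEGIN { printf "%d\n", (gb * 8 * 1000) / mbps }'
}

for rate in 100 1000 10000; do
    echo "500 GB at ${rate} Mbps: $(restore_seconds 500 "$rate") s"
done
# 500 GB at 100 Mbps: 40000 s    (about 11 hours)
# 500 GB at 1000 Mbps: 4000 s    (about 67 minutes)
# 500 GB at 10000 Mbps: 400 s    (under 7 minutes)
```

The gap between 11 hours and 7 minutes is the whole argument for keeping the backup on a fast, nearby link.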

If your primary VPS is in Oslo, your backup should be in a separate facility in the Nordic region, connected via high-speed peering (NIX). CoolVDS infrastructure utilizes 10Gbps uplinks, meaning internal transfers between our recovery nodes happen at disk speed, not internet speed.

The "Warm Spare" Configuration

In 2014, we are seeing a shift from "Cold Spares" (backup tapes) to "Warm Spares" (a running, smaller VPS ready to scale up). Here is a simplified Nginx failover configuration. It runs on an Nginx load balancer in front of your stack (HAProxy can fill the same role, with its own syntax):

upstream backend_hosts {
    server 192.168.1.10:80 weight=5 max_fails=3 fail_timeout=30s;
    # This server is marked 'backup', so it only takes traffic if the primary is dead
    server 192.168.1.20:80 backup;
}

server {
    listen 80;
    server_name example.no;

    location / {
        proxy_pass http://backend_hosts;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

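This config ensures that if your primary node hits a kernel panic, traffic flows to the backup node automatically. After 3 failed attempts within 30 seconds (max_fails and fail_timeout), Nginx marks the primary as unavailable for the next 30 seconds and sends requests to the server flagged as backup. You might run the backup node on lower resources to save costs, then vertically scale it up via the CoolVDS panel if disaster strikes.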

The KVM Advantage in Recovery

We see a lot of budget hosts pushing OpenVZ. For Disaster Recovery, OpenVZ is a nightmare. It shares a kernel with the host. If the host kernel updates or has a module conflict, your container breaks.

At CoolVDS, we strictly use KVM (Kernel-based Virtual Machine). This treats your VPS like a dedicated server. You can install your own kernel, load your own modules, and most importantly, mount disk images directly. If your OS gets corrupted, we can mount your disk image to a rescue VM, allowing you to chroot in and fix the mess. You cannot do that easily with shared-kernel virtualization.

Verifying the Archive

A backup you haven't restored is just a rumor. Once a month, you must run a drill. Spin up a fresh CentOS 6 minimal instance on CoolVDS (takes about 55 seconds), and run a restore script.

#!/bin/bash
# /opt/scripts/restore_test.sh
# Run this on the disposable drill instance only: step 2 wipes the MySQL datadir.

# 1. Stop services
service mysqld stop
service nginx stop

# 2. Restore DB (--copy-back needs an empty datadir)
rm -rf /var/lib/mysql/*
innobackupex --copy-back /backup/mysql/latest/
chown -R mysql:mysql /var/lib/mysql
service mysqld start

# 3. Verify Integrity
mysqlcheck -u root --all-databases --check

if [ $? -eq 0 ]; then
  echo "Drill Passed."
else
  echo "Drill Failed. Fix immediately."
  exit 1
fi
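The restore script reads from /backup/mysql/latest/, which something has to keep pointed at the newest backup. One possible way to maintain that path is a symlink, sketched here. The function name link_latest is hypothetical, and it assumes backup directories are named with the date +%F_%H-%M pattern from the backup script, so that sorting by name is the same as sorting by time.

```shell
#!/bin/bash
# link_latest BACKUP_ROOT
# Points BACKUP_ROOT/latest at the newest directory named YYYY-MM-DD_*.
# Returns 1 if no matching backup directory exists.
link_latest() {
    local root="${1%/}" newest
    newest=$(find "$root" -mindepth 1 -maxdepth 1 -type d \
                 -name '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_*' \
             | sort | tail -n 1)
    [ -n "$newest" ] || return 1
    # -n replaces an existing 'latest' symlink instead of descending into it
    ln -sfn "$(basename "$newest")" "$root/latest"
}

# Example: call this after a backup has been prepared successfully
# link_latest /backup/mysql
```

The function cannot tell whether a directory has been through --apply-log, so call it only after the prepare step succeeds. Otherwise the drill may restore an unprepared backup.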

Compliance and the "Datatilsynet" Factor

Operating in Norway means adhering to the Personal Data Act. If you store customer data (names, emails, payment history), you are a data controller. Having your DR site in a non-EU/EEA jurisdiction creates a legal headache regarding Safe Harbor frameworks. Keeping your primary and backup data within Norwegian borders simplifies compliance significantly.

Don't let a hardware failure become a business failure. Hard drives die. Power supplies pop. Humans type rm -rf /. The only defense is a recovery plan that is tested, local, and fast.

If you need a staging environment to test your DR scripts without impacting production, spin up a KVM instance on CoolVDS today. Our PCIe SSD storage ensures that even your heaviest backups complete before the cron job times out.