Surviving the Meltdown: A Battle-Tested Disaster Recovery Strategy for Norwegian Infrastructure

I once watched a grown man—a CTO of a mid-sized logistics firm in Drammen—weep into his coffee because a single disk controller failure on a Friday afternoon wiped out three years of transactional data. It wasn't the hardware failure that broke him; hardware fails, that is physics. It was the realization that his "backup strategy" was a cron job that hadn't run successfully since 2012 and a RAID 5 array that took two days to rebuild, only to fail again partway through the rebuild. If you are running mission-critical workloads in 2014 without a hot-standby Disaster Recovery (DR) plan, you are not an engineer; you are a gambler. With the entire industry still reeling from the OpenSSL Heartbleed vulnerability last month, the fragility of our systems has never been more exposed. Security patches are one thing; structural integrity is another beast entirely. We need to talk about latency, replication lag, and the specific legal requirements here in Norway that make hosting data outside our borders a legal minefield under the scrutiny of Datatilsynet.

The Anatomy of a Failover: Beyond Simple Backups

A backup is a snapshot of the past; disaster recovery is the assurance of a future. In my years managing high-traffic Linux clusters, I have learned that Mean Time To Recovery (MTTR) is the only metric that matters when your primary data center goes dark. For a robust setup, specifically for clients targeting the Norwegian market, we leverage the low latency of the Norwegian Internet Exchange (NIX) to keep replication lag practically non-existent. The architecture I advocate involves a primary KVM-based node acting as the Master, with a geographically separated Slave node ready to take over the Virtual IP (VIP) at a moment's notice. We choose KVM (Kernel-based Virtual Machine) over OpenVZ because, in a disaster scenario, you need your own kernel and guaranteed resources, not burstable allocations that vanish when the host node gets busy.
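
Before reaching for Heartbeat or keepalived, it helps to understand what a VIP takeover actually does on the wire. The sketch below shows the two manual steps those tools automate; the VIP (192.0.2.10/24) and the interface name (eth0) are placeholders for your own addressing.

#!/bin/bash
# Manual VIP takeover on the standby node (the steps Heartbeat/keepalived automate).
# VIP and interface are assumed values -- adjust to your own network.

VIP="192.0.2.10/24"
IFACE="eth0"

# 1. Attach the virtual IP to the standby node's interface
ip addr add "$VIP" dev "$IFACE"

# 2. Send gratuitous ARP so switches and the upstream router learn the new location
arping -c 3 -U -I "$IFACE" "${VIP%/*}"

In production you let Heartbeat or keepalived run these steps for you, with health checks deciding when.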

Pro Tip: Never rely on shared storage (SAN/NAS) for your primary database disaster recovery. If the file system corrupts, it replicates that corruption instantly. Use application-level replication for databases and block-level or file-level syncing for assets.

Real-Time Asset Synchronization with Lsyncd

For static assets—images, user uploads, configuration files—`rsync` on a cron job is insufficient. You need real-time mirroring. In 2014, the most robust tool for this is lsyncd (Live Syncing Daemon), which watches the local directory trees for changes via inotify and spawns rsync to synchronize them to the remote disaster recovery site. It is lightweight, efficient, and doesn't eat your CPU like some Java-based enterprise bloatware. Here is the production configuration I deployed last week for a media client running on Ubuntu 14.04 LTS.

-- Daemon-wide settings: log location and how often (seconds) the status file is refreshed
settings {
    logfile = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status",
    statusInterval = 10
}

-- Mirror the uploads tree to the DR node via rsync over SSH
sync {
    default.rsync,
    source = "/var/www/html/uploads/",
    target = "dr-user@192.168.10.55:/var/www/html/uploads/",
    -- Collect inotify events for at most 1 second before spawning rsync
    delay = 1,
    rsync = {
        compress = true,
        archive = true,
        verbose = true,
        rsh = "/usr/bin/ssh -p 22 -o StrictHostKeyChecking=no"
    }
}
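
For completeness, here is how a deployment of that config typically looks on Ubuntu 14.04. The path /etc/lsyncd/lsyncd.conf.lua is the one the packaged init script expects by default; check /etc/init.d/lsyncd if your package differs.

# Install lsyncd and prepare the directories the config above references
sudo apt-get update && sudo apt-get install -y lsyncd
sudo mkdir -p /etc/lsyncd /var/log/lsyncd

# Drop the configuration above into place and start the daemon
sudo cp lsyncd.conf.lua /etc/lsyncd/lsyncd.conf.lua
sudo service lsyncd restart

# Watch the initial full sync complete before trusting the mirror
tail -f /var/log/lsyncd/lsyncd.log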

Database Replication: The Heart of the System

Your database is the single point of truth. If you lose it, you don't have a business. MySQL 5.6 ships semi-synchronous replication, which closes the window in which a crashed master has committed transactions the slave never received; across a WAN, however, waiting for the slave's acknowledgement on every commit adds latency, so for a Master-Slave setup standard asynchronous replication with careful tuning is often the pragmatic choice. The key is ensuring your binary logs are written to disk immediately. Many VPS providers oversell their I/O, leading to `iowait` spikes that kill replication. This is where the underlying storage matters. CoolVDS uses high-performance SSD RAID arrays (and we are testing early PCIe/NVMe storage hardware), which allows us to set `sync_binlog = 1` without bringing the server to a crawl. On a standard spinning-rust VPS, forcing a disk sync on every transaction would destroy your throughput.
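
A quick sanity check that your storage and replication can actually keep up: watch %iowait on the master and the lag counters on the slave. A minimal sketch, assuming sysstat is installed and MySQL credentials live in ~/.my.cnf (the hostname dr-db is a placeholder):

# On the master: extended I/O stats every 5 seconds -- sustained %iowait
# on a commit-heavy workload is the first sign of oversold storage
iostat -x 5

# On the slave: replication thread health and lag in seconds
mysql -h dr-db -e "SHOW SLAVE STATUS\G" \
    | grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"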

Optimizing my.cnf for Durability

To ensure that your slave server can catch up and that your master doesn't lose transactions if the power gets cut, you must configure InnoDB correctly. This configuration is non-negotiable for financial or personal data under the Personal Data Act (Personopplysningsloven).

[mysqld]
# Unique ID for the replication topology
server-id               = 1
log_bin                 = /var/log/mysql/mysql-bin.log
binlog_format           = ROW
expire_logs_days        = 10
max_binlog_size         = 100M

# Durability Settings
# 1 = flush to disk at every commit. Safest, but needs fast I/O (SSD).
innodb_flush_log_at_trx_commit = 1

# 1 = sync binary log to disk at every commit.
sync_binlog = 1

# Networking
bind-address            = 0.0.0.0
# Skip name resolve to avoid DNS latency issues
skip-name-resolve
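
With the master configured as above, the slave still has to be pointed at it. The sketch below uses a placeholder replication account, master address, and binlog coordinates; take the real MASTER_LOG_FILE and MASTER_LOG_POS from SHOW MASTER STATUS at the moment you took the consistent dump, and keep real credentials out of shell history.

#!/bin/bash
# Wire the slave to the master. Host, account and coordinates are placeholders;
# MySQL client credentials are assumed to be in ~/.my.cnf on both nodes.

MASTER_IP="10.0.0.10"
REPL_USER="repl"
REPL_PASS="ReplPass2014!"

# On the master: a dedicated account with only the replication privilege
mysql -e "GRANT REPLICATION SLAVE ON *.* TO '$REPL_USER'@'10.0.0.%' IDENTIFIED BY '$REPL_PASS';"

# On the slave: point at the master's binlog position and start replicating
mysql -e "CHANGE MASTER TO MASTER_HOST='$MASTER_IP', MASTER_USER='$REPL_USER', MASTER_PASSWORD='$REPL_PASS', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=120; START SLAVE;"

# Verify both replication threads report Yes
mysql -e "SHOW SLAVE STATUS\G" | grep -E "Slave_IO_Running|Slave_SQL_Running"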

The Failover Script

When the alarm goes off at 3 AM, you do not want to be typing commands manually. You need a script that promotes the slave to master, updates your DNS (or switches the Heartbeat IP), and stops the replication process so the new master accepts writes. While tools like `MHA` (Master High Availability) are gaining traction, a battle-hardened sysadmin knows how to do this in Bash for absolute control. Below is the promotion logic we use for emergency manual failovers.

#!/bin/bash
# promote_slave.sh - Emergency promotion of Slave to Master

MYSQL_USER="root"
MYSQL_PASS="ComplexPass2014!"

echo "Stopping Slave..."
mysql -u"$MYSQL_USER" -p"$MYSQL_PASS" -e "STOP SLAVE;"

echo "Clearing replication config so the node forgets its dead master..."
mysql -u"$MYSQL_USER" -p"$MYSQL_PASS" -e "RESET SLAVE ALL;"

echo "Resetting Master status to ensure a clean binlog for future slaves..."
mysql -u"$MYSQL_USER" -p"$MYSQL_PASS" -e "RESET MASTER;"

echo "Configuring as Read-Write..."
# Assuming read_only was set to 1 in my.cnf for the slave
mysql -u"$MYSQL_USER" -p"$MYSQL_PASS" -e "SET GLOBAL read_only = 0;"

echo "Slave promoted. Please update the application's DB_HOST immediately."
logger "FAILOVER: Slave promoted to Master at $(date)"
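
One guardrail worth bolting on in front of that script: refuse to promote a slave whose SQL thread has not yet applied everything the I/O thread fetched before the master died, otherwise you silently discard the tail of your transactions. A minimal pre-flight sketch, using the same placeholder credentials:

#!/bin/bash
# preflight_check.sh - Abort promotion if relay logs are not fully applied

MYSQL_USER="root"
MYSQL_PASS="ComplexPass2014!"

STATUS=$(mysql -u"$MYSQL_USER" -p"$MYSQL_PASS" -e "SHOW SLAVE STATUS\G")
READ_POS=$(echo "$STATUS" | awk '/Read_Master_Log_Pos/ {print $2}')
EXEC_POS=$(echo "$STATUS" | awk '/Exec_Master_Log_Pos/ {print $2}')

if [ -z "$READ_POS" ] || [ "$READ_POS" != "$EXEC_POS" ]; then
    echo "Relay log not fully applied (executed $EXEC_POS of $READ_POS). Do not promote yet."
    exit 1
fi

echo "Relay log fully applied. Safe to run promote_slave.sh."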

Legal Compliance and Infrastructure Choice

We cannot ignore the legal reality of operating in Norway. Datatilsynet is strict about where personal data resides and how it is protected. Hosting your DR site on a budget provider in the US might save you a few kroner, but it exposes you to the Patriot Act and potential violations of Norwegian privacy laws. Keeping your primary and secondary data within Norwegian borders—or at least within the EEA, rather than leaning on a US provider's Safe Harbor self-certification—is critical. This is why CoolVDS invests heavily in local infrastructure. Our servers are located in secure facilities with redundant power and cooling, connected directly to major fiber rings. When we talk about low latency, we mean single-digit milliseconds to Oslo. This physical proximity allows for near-synchronous replication without the performance penalty you'd see replicating to a server in Frankfurt or Amsterdam.

Why