Surviving the Crash: A Norwegian Sysadmin’s Guide to DR
Let’s be honest for a second. If your disaster recovery plan consists solely of a nightly cron job running mysqldump to the same physical disk, you are gambling, not engineering. I have seen production servers in Oslo literally smoke (power supply failures cascading into motherboard shorts), taking the local storage down with them. When the CEO asks why the shop is dark, "the RAID card failed" is not an acceptable answer.
We are writing this in July 2016. The landscape has shifted. The Safe Harbor agreement is dead. The EU-US Privacy Shield was adopted just days ago (July 12), and confusion reigns supreme regarding data sovereignty. For those of us managing infrastructure in Norway, the directive is clear: keep the data local, keep it redundant, and for the love of root, test your backups.
The "R" Words: RTO and RPO
Before we touch a single config file, define your metrics. If you don't know these, you can't architect a solution.
- RPO (Recovery Point Objective): How much data are you willing to lose? One hour? One transaction?
- RTO (Recovery Time Objective): How long can you be offline?
For most of our clients running high-traffic Magento or WordPress setups on CoolVDS, the target is RPO < 5 minutes and RTO < 1 hour. Achieving this requires moving beyond simple backups to active replication.
Database Replication: The First Line of Defense
In 2016, MySQL 5.7 is the gold standard, offering significant improvements over 5.5. We aren't talking about Galera clusters today (too complex for many mid-sized setups); we are talking about robust Master-Slave replication. This allows you to have a warm standby ready to take traffic if your primary node goes dark.
Here is a production-ready snippet for my.cnf (or my.ini) on your Master server. Note the innodb_flush_log_at_trx_commit setting—we set it to 1 for ACID compliance, but if you are desperate for I/O speed on non-SSD drives, you might be tempted to set it to 2. Don't. On CoolVDS NVMe instances, leave it at 1. The hardware can handle it.
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
expire_logs_days = 7
max_binlog_size = 100M
# Reliability settings
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
innodb_buffer_pool_size = 4G  # Adjust to 70% of your RAM

On the Slave (Recovery Node), your config needs to be explicitly read-only to prevent accidental writes during normal operation:
[mysqld]
server-id = 2
relay-log = /var/log/mysql/mysql-relay-bin.log
read_only = 1

Setting up the replication user requires specific grants. Do not use root.
CREATE USER 'repl_user'@'10.8.%' IDENTIFIED BY 'StrongPassword2016!';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'10.8.%';
FLUSH PRIVILEGES;

Pro Tip: Use a private network (VLAN) for replication traffic. If you are hosting on CoolVDS, our internal network handles this without eating into your public bandwidth allocation, and it keeps the replication stream off the public internet.
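With the user in place, you point the Slave at the Master's binary log and start the replication threads. A minimal sketch, assuming you have already seeded the Slave with a consistent dump of the Master; the host address, log file name and position below are placeholders, so read the real coordinates from SHOW MASTER STATUS or the dump header:

-- Run on the Slave after it has been seeded with a consistent dump of the Master.
-- MASTER_HOST, MASTER_LOG_FILE and MASTER_LOG_POS are placeholders.
CHANGE MASTER TO
  MASTER_HOST='10.8.0.2',
  MASTER_USER='repl_user',
  MASTER_PASSWORD='StrongPassword2016!',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=154;
START SLAVE;
SHOW SLAVE STATUS\G

If Slave_IO_Running and Slave_SQL_Running both report Yes and Seconds_Behind_Master hovers near zero, you are comfortably inside the 5-minute RPO.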
File Synchronization: rsync over SSH
Databases are half the battle. What about your user uploads? If you are running a PHP application, you likely have an /uploads or /media directory. You need to sync this to your DR site.
Forget FTP. We use rsync. It’s differential, meaning it only sends changes. Here is the script we use for a 5-minute sync interval via cron. It uses strict SSH key authentication.
#!/bin/bash
# /usr/local/bin/sync_media.sh
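# Scheduled from cron every five minutes, e.g. via crontab -e:
#   */5 * * * * /usr/local/bin/sync_media.sh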
SOURCE_DIR="/var/www/html/media/"
REMOTE_USER="backup_user"
REMOTE_HOST="10.8.0.5"
REMOTE_DIR="/var/www/html/media/"
LOG_FILE="/var/log/media_sync.log"
rsync -avz --delete -e "ssh -i /home/backup_user/.ssh/id_rsa -o StrictHostKeyChecking=no" \
$SOURCE_DIR $REMOTE_USER@$REMOTE_HOST:$REMOTE_DIR >> $LOG_FILE 2>&1
if [ $? -eq 0 ]; then
echo "[$(date)] Sync successful" >> $LOG_FILE
else
echo "[$(date)] Sync FAILED" >> $LOG_FILE
# Send alert (mail or nagios)
echo "Sync Failed" | mail -s "DR Alert" admin@example.no
fi

The Network Failover Switch
So your Master server dies. You have a Slave database and synced files on a secondary CoolVDS instance. How do you route traffic?
In 2016, automated IP failover (VRRP/Keepalived) is great if you are in the same datacenter, but for true Disaster Recovery, your secondary node should ideally be in a different availability zone. The simplest mechanism is DNS Failover with a short TTL (Time To Live).
Set your DNS TTL to 300 seconds (5 minutes). If the primary goes down, you update the A-record to point to the DR IP. It’s not instant (propagation takes time), but it is reliable.
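If you manage your own zone (BIND-style syntax shown here; the hostname and addresses are placeholders), the record you will be flipping looks something like this, with the low TTL spelled out per record:

; zone file excerpt - keep the TTL short on anything you may need to repoint
www     300     IN      A       203.0.113.10    ; primary node
; during failover, change the address to the DR node and bump the SOA serial

Keep in mind that some resolvers ignore low TTLs, so treat five minutes as the best case rather than a guarantee.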
If you are using Nginx as a load balancer in front of your app servers, you can configure a backup directive:
upstream backend_servers {
server 10.8.0.2:80 max_fails=3 fail_timeout=30s;
server 10.8.0.3:80 backup; # The DR Node
}

Why Virtualization Type Matters: KVM vs. OpenVZ
This is where many providers cut corners. OpenVZ (container-based) is cheap and efficient, but it shares the host kernel. If the host kernel panics, everyone goes down. Furthermore, you cannot load your own kernel modules, which rules out the advanced firewalling and VPN configurations (such as IPsec) that secure off-site backups often require.
At CoolVDS, we standardize on KVM (Kernel-based Virtual Machine). KVM provides full hardware virtualization. Your OS is isolated. If a neighbor crashes their OS, yours keeps humming. For DR planning, KVM allows you to take block-level snapshots. This is critical for data integrity.
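The same block-level idea works inside the guest if your data sits on LVM. A rough sketch, assuming an ext4 filesystem on a hypothetical volume group vg0 with a logical volume mysql holding /var/lib/mysql; the snapshot is crash-consistent, so InnoDB simply runs crash recovery when you restore it:

# Take a crash-consistent, block-level snapshot of the volume holding /var/lib/mysql
# (vg0/mysql and the 5G copy-on-write reserve are placeholder names/sizes)
lvcreate --size 5G --snapshot --name mysql_snap /dev/vg0/mysql

# Mount the snapshot read-only, ship it to the DR node, then drop it
mkdir -p /mnt/snap
mount -o ro /dev/vg0/mysql_snap /mnt/snap
rsync -a /mnt/snap/ backup_user@10.8.0.5:/backup/mysql_snapshot/
umount /mnt/snap
lvremove -f /dev/vg0/mysql_snap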
The Legal Aspect: Privacy Shield & Norway
With the Safe Harbor framework invalidated last year, and the new Privacy Shield just adopted, sending data to US-based clouds (AWS, Google) is legally complex for Norwegian entities dealing with sensitive user data. The Datatilsynet is watching closely.
Hosting within Norway isn't just about latency (though 2ms ping to NIX in Oslo is fantastic); it's about compliance. By keeping your primary and DR nodes within Norwegian jurisdiction—or strictly within the EEA—you sidestep the legal grey areas of trans-Atlantic data transfer.
The Final Word
Disaster Recovery is not a product you buy; it is a process you test. If you haven't restored your backups to a clean CoolVDS instance in the last 3 months, you don't have backups; you have encrypted garbage.
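A restore drill does not need to be elaborate. Something along these lines, run quarterly against a throwaway instance, is enough to prove the chain end to end (the scratch host, credentials and dump path are placeholders):

# Quarterly drill: load the latest dump onto a clean scratch instance and sanity-check it
zcat /backup/nightly/shop.sql.gz | mysql -h 10.8.0.9 -u root -p
mysqlcheck --all-databases -h 10.8.0.9 -u root -p
# Finish by spot-checking a handful of business-critical tables by hand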
Don't wait for the hardware failure. Spin up a secondary KVM instance today, configure your replication, and sleep better knowing you can survive the meltdown.