Disaster Recovery in 2019: Why Your "Backups" Will Fail When You Need Them Most
It is 3:00 AM in Oslo. Your phone buzzes. It's not a text from a friend; it's PagerDuty. Your primary database node just vanished. Not a restart, not a hiccup: gone. Can you recover? If your answer relies on a nightly snapshot stored on the same SAN as your compute nodes, you are already dead in the water. I have seen seasoned systems administrators weep over lost data because they confused redundancy with recovery. RAID 10 saves you from a dead disk; it does not save you from a rogue rm -rf /, a corrupted inode table, or a facility-wide power failure. In the unforgiving landscape of systems administration, hope is not a strategy, and "uptime" is a lie we tell managers until we architect true resilience.
The hard truth about hosting in 2019 is that while hardware has become incredibly reliable (especially with the widespread adoption of NVMe storage), software complexity has exploded. We are stacking Docker containers on top of virtualization layers, orchestrating them with tools that are often more fragile than the applications they support. When disaster strikes, you don't want complex abstractions; you want raw, accessible data and a clear path to restoration. This guide assumes you are tired of theoretical whitepapers and want a concrete, battle-hardened plan involving asynchronous replication, off-site encrypted archives, and the legal safety of keeping your data within Norwegian borders under the watchful eye of Datatilsynet.
The 3-2-1 Rule is Non-Negotiable
Before we touch a single configuration file, we must establish the ground rules. The 3-2-1 backup methodology is the industry standard for a reason: keep 3 copies of your data, on 2 different media types, with 1 copy off-site. In the context of a high-performance VPS environment, "off-site" doesn't just mean a different folder; it means a different physical datacenter, preferably separated by significant geographic distance but low network latency. For a server hosted in Oslo, a secondary location in Stockholm or a distinct availability zone within Norway is acceptable, provided the failure domains are isolated. If your primary VPS gets hit by a DDoS attack, a common occurrence in the Nordic region, you need a cold standby that isn't null-routed by the same upstream provider.
Step 1: The Database Layer (MariaDB Replication)
Database restoration is the bottleneck of any recovery operation. Importing a 50GB SQL dump takes hours you don't have. The solution is Master-Slave replication. We aren't looking for synchronous clustering like Galera here; that introduces write latency that kills performance on high-traffic Magento or WordPress sites. We want standard asynchronous replication. If the Master dies, the Slave has the data, ready to be promoted.
On your Master server (e.g., CoolVDS NVMe Instance A), edit your /etc/my.cnf. We need to enable binary logging and set a unique server ID. This configuration is tuned for a balance of safety and speed on SSD/NVMe storage.
[mysqld]
server-id = 1
log_bin = /var/lib/mysql/mysql-bin
binlog_format = ROW
expire_logs_days = 7
max_binlog_size = 100M
# Durability settings for NVMe
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
sync_binlog = 1
After restarting MariaDB, create a dedicated replication user. Do not use root. Security is a layer of disaster recovery; a compromised root account is a disaster in itself. Limit this user strictly to the IP address of your slave server to prevent unauthorized replication attempts.
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'10.8.0.5' IDENTIFIED BY 'StrongPass_2019!'; FLUSH PRIVILEGES;
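Replication only ships changes going forward, so the slave first needs a consistent baseline of the existing data. A common way to take one is mysqldump with --master-data, which records the binlog coordinates inside the dump; the sketch below assumes the slave is reachable at 10.8.0.5 over the VPN, and the paths are illustrative:

```shell
# On the Master: consistent InnoDB snapshot without locking tables.
# --master-data=2 writes the binlog file/position into the dump as a comment.
mysqldump --single-transaction --master-data=2 --all-databases \
  | gzip > /root/seed.sql.gz

# Ship the seed over the VPN link and load it on the Slave.
scp /root/seed.sql.gz root@10.8.0.5:/root/
# Then, on the Slave:
gunzip < /root/seed.sql.gz | mysql
```

Keep a copy of the binlog coordinates from the dump header; you will need them when pointing the slave at the master.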
On the Slave server (CoolVDS Instance B), your configuration needs to be aware it is a subordinate. Note the read_only flag: it prevents accidental writes to your backup, ensuring data integrity remains pristine until you deliberately promote the server.
[mysqld]
server-id = 2
relay_log = /var/lib/mysql/mysql-relay-bin
log_bin = /var/lib/mysql/mysql-bin
read_only = 1
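With both sides configured, the slave still has to be pointed at the master explicitly. The statements below are a sketch: the master host is assumed to be 10.8.0.1 on the VPN, and the log file and position must come from SHOW MASTER STATUS on the master (or the --master-data comment in your seed dump), not the placeholder values shown here:

```sql
CHANGE MASTER TO
  MASTER_HOST='10.8.0.1',              -- master's VPN address (example value)
  MASTER_USER='replicator',
  MASTER_PASSWORD='StrongPass_2019!',
  MASTER_LOG_FILE='mysql-bin.000001',  -- take from SHOW MASTER STATUS
  MASTER_LOG_POS=4;                    -- take from SHOW MASTER STATUS
START SLAVE;

-- Verify: Slave_IO_Running and Slave_SQL_Running should both say Yes,
-- and Seconds_Behind_Master should trend toward 0.
SHOW SLAVE STATUS\G
```

Check Seconds_Behind_Master periodically; sustained lag on an NVMe slave usually points at network trouble, not disk.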
Pro Tip: Use a VPN tunnel (like OpenVPN) between your Master and Slave for replication traffic. Never send raw SQL replication data over the public internet, even if you trust your ACLs.
File Synchronization: rsync is King
For file assets, such as images, configuration files, and codebases, nothing beats rsync. It is efficient, ubiquitous, and reliable. However, a naive rsync cron job can propagate corruption. If a hacker encrypts your files with ransomware on the master, rsync will dutifully copy the encrypted garbage to your backup, destroying both copies. To mitigate this, we use --link-dest for incremental snapshots or tools like BorgBackup. For this tutorial, we will stick to a robust rsync script that uses hard links to create time-point recovery folders without wasting space.
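If the hard-link trick sounds abstract, here is a minimal local demo of the mechanism --link-dest relies on, runnable anywhere with GNU coreutils (no rsync or remote host needed; paths are throwaway temp dirs):

```shell
# Unchanged files in consecutive snapshots share the same inode,
# so each extra snapshot costs almost no additional disk space.
set -e
BASE=$(mktemp -d)
mkdir "$BASE/data"
echo "hello" > "$BASE/data/file.txt"

cp -a  "$BASE/data"  "$BASE/snap1"   # first snapshot: full copy
cp -al "$BASE/snap1" "$BASE/snap2"   # second snapshot: hard links only

# Identical inode numbers prove the file is stored once on disk:
stat -c %i "$BASE/snap1/file.txt"
stat -c %i "$BASE/snap2/file.txt"
```

Deleting snap1 later does not harm snap2; the data blocks survive until the last link is gone, which is exactly why old snapshots can be pruned safely.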
Create a script /root/scripts/dr_sync.sh on your Backup server. This script pulls data from the production server. Pushing backups is risky; if the production server is compromised, the attacker can trash the backups. By pulling, the backup server controls the process.
#!/bin/bash
set -euo pipefail

SOURCE_USER="root"
SOURCE_HOST="192.0.2.10"
SOURCE_DIR="/var/www/html/"
DEST_BASE="/backup/snapshots"
DATE=$(date +%Y-%m-%d_%H-%M-%S)
LATEST_LINK="$DEST_BASE/latest"

# Ensure the snapshot directory exists
mkdir -p "$DEST_BASE"

# Hard-link against the previous snapshot if one exists;
# on the very first run there is nothing to link against.
LINK_OPT=""
if [ -d "$LATEST_LINK" ]; then
    LINK_OPT="--link-dest=$LATEST_LINK"
fi

# Run rsync with hard-linking against the previous backup
rsync -avz --delete $LINK_OPT \
    -e "ssh -i /root/.ssh/id_rsa_backup" \
    "$SOURCE_USER@$SOURCE_HOST:$SOURCE_DIR" \
    "$DEST_BASE/$DATE"

# Atomically update the 'latest' symlink
ln -sfn "$DEST_BASE/$DATE" "$LATEST_LINK"

# Clean up snapshots older than 14 days.
# -mindepth 1 keeps find from ever matching $DEST_BASE itself.
find "$DEST_BASE" -mindepth 1 -maxdepth 1 -type d -mtime +14 -exec rm -rf {} \;
Make it executable:
chmod +x /root/scripts/dr_sync.sh
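Then schedule it from the backup server's crontab. Hourly is a reasonable starting point; pick whatever interval matches your RPO (the entry below is illustrative, including the log path):

```shell
# /etc/crontab on the backup server: pull a snapshot every hour,
# logging output so a silent failure doesn't go unnoticed for weeks.
0 * * * * root /root/scripts/dr_sync.sh >> /var/log/dr_sync.log 2>&1
```

Check that log occasionally, or wire it into your monitoring; a backup job that has been failing quietly for a month is worse than no backup, because you believe you are covered.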
The Infrastructure Factor: Why "Where" Matters
You can have the best scripts in the world, but if your provider's network is saturated or their power redundancy fails, you are offline. In Norway, we benefit from exceptional hydroelectric stability, but network latency is physics. When choosing a VPS provider for your DR site, you need low latency to your primary audience. If your customers are in Oslo, replicating to a server in Frankfurt adds round-trip milliseconds on every transfer, and since asynchronous replication can only be as current as the link allows, that lag directly widens your Recovery Point Objective (RPO).
This is where CoolVDS distinguishes itself from generic cloud giants. We don't oversell our CPU cycles. When you are performing a database recovery, you need sustained high IOPS. Shared hosting environments often throttle you exactly when you need speed the most. CoolVDS guarantees KVM-based isolation with direct NVMe pass-through, meaning your recovery speed is limited only by the bus speed, not by a "noisy neighbor" mining cryptocurrency next door.
Securing the Network
During a disaster, you might be tempted to drop firewalls to "just get it working." Don't. A server under recovery is vulnerable. Use iptables (or firewalld on CentOS 7) to ensure only your office IP and the internal replication IPs can talk to the critical services during the restoration phase.
iptables -A INPUT -p tcp -s 10.8.0.5 --dport 3306 -j ACCEPT
This simple rule ensures that while you are fixing the database, the public cannot connect and write inconsistent data before you are ready.
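A single ACCEPT rule only helps if everything else is rejected. A minimal lockdown for the restoration window might look like the following sketch; the office address 203.0.113.50 is a documentation placeholder, and ports and interfaces must match your own setup:

```shell
# Default-deny inbound, then punch explicit holes.
iptables -P INPUT DROP
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# SSH from the office only (203.0.113.50 is an example address)
iptables -A INPUT -p tcp -s 203.0.113.50 --dport 22 -j ACCEPT
# MariaDB replication traffic from the slave over the VPN
iptables -A INPUT -p tcp -s 10.8.0.5 --dport 3306 -j ACCEPT
```

Apply rules like these from the provider's console, not over SSH: setting the DROP policy before the ESTABLISHED rule is in place can cut off your own session mid-recovery.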
Automating the Failover
Manual failover is stressful. In 2019, tools like Ansible allow us to define our recovery state as code. Instead of remembering 50 commands at 4 AM, you run one playbook. Here is a snippet of an Ansible task that promotes a Slave database to Master, stopping the read-only mode and updating the application config to point to the new localhost.
- name: Stop Slave Replication
  mysql_replication:
    mode: stopslave

- name: Reset Master Status
  command: mysql -e "RESET MASTER;"

- name: Disable Read-Only Mode
  lineinfile:
    path: /etc/my.cnf
    regexp: '^read_only'
    line: 'read_only = 0'
  notify: restart_mariadb

- name: Update Web App Config
  template:
    src: templates/wp-config.php.j2
    dest: /var/www/html/wp-config.php
    owner: nginx
    group: nginx
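With those tasks in a playbook, the failover becomes a single command against the standby host. The playbook filename and inventory layout below are placeholders for your own structure; the --check pass is a dry run that shows what would change before you commit:

```shell
# One command instead of 50 at 4 AM. Dry-run first, then for real.
ansible-playbook -i inventory/production promote_slave.yml --limit backup-node --check
ansible-playbook -i inventory/production promote_slave.yml --limit backup-node
```

Rehearse this on a staging pair quarterly. A playbook you have never run under calm conditions is not a recovery plan; it is a hypothesis.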
Compliance and Sovereignty
Finally, a word on compliance. With GDPR in full effect since last year, where your data lives is a legal matter. Datatilsynet is not lenient with data exported to jurisdictions with weak privacy laws. By keeping your primary and disaster recovery nodes within recognized secure zones (like Norway or the EEA), you mitigate legal risks. CoolVDS operates strictly under Norwegian and European law, ensuring your disaster recovery plan doesn't turn into a legal disaster.
Disaster recovery is not a product you buy; it is a discipline you practice. The scripts above are your foundation. The hardware is your fuel. Don't let slow I/O or unreliable neighbors kill your business when the inevitable happens. Build your fortress now.
Ready to build a failover cluster that actually works? Deploy a high-performance NVMe instance on CoolVDS today and get 1Gbps connectivity to NIX instantly.