
Disaster Recovery in a Post-Schrems II World: A Norwegian CTO’s Playbook

The Day the Datacenter Burned Down

March 10, 2021. A date etched in the memory of every systems architect in Europe. When OVHcloud's SBG2 datacenter in Strasbourg caught fire, it didn't just melt servers; it incinerated the "cloud is magic" delusion. Thousands of companies realized that their "offsite backups" were actually just in the next room, burning alongside their production environment.

If you are hosting in Norway, you have a second headache: Schrems II. The CJEU's July 2020 ruling invalidated the EU-US Privacy Shield. Moving personal data to US-owned clouds for disaster recovery (DR) is now a legal minefield under Datatilsynet's watchful eye. If you are a CTO or Lead DevOps in 2021, your mandate is clear: keep the data intact, keep it online, and keep it sovereign.

This isn't a high-level policy document. This is an implementation guide for building a resilient, compliant DR strategy using tools available right now on standard Linux infrastructure.

The Architecture: Warm Standby via WireGuard

Cold storage (tape, Glacier) is fine for archives, but in 2021, an RTO (Recovery Time Objective) of 48 hours is unacceptable for e-commerce or SaaS. We need a Warm Standby.

We will assume a primary CoolVDS instance in Oslo and a secondary DR node. To ensure security without the overhead of IPsec, we use WireGuard (mainlined in Linux 5.6). It is faster than OpenVPN and perfect for linking NVMe-based instances where I/O throughput matters.

1. Establishing the Secure Tunnel

First, we create a private, encrypted network between your Production server and your DR server. Public internet replication is reckless.

Install WireGuard on both Ubuntu 20.04 instances:

apt install wireguard

Generate keys on both servers:

umask 077   # keep the private key readable only by root
wg genkey | tee privatekey | wg pubkey > publickey

Here is the configuration for the DR Node (receiver). The AllowedIPs setting restricts the tunnel so that only traffic from the Production node's tunnel address is accepted.

# /etc/wireguard/wg0.conf on DR Node
[Interface]
Address = 10.8.0.2/24
SaveConfig = true
ListenPort = 51820
PrivateKey = <DR_NODE_PRIVATE_KEY>

# 1420 is the safe default over a standard 1500-byte underlay; raise it only if your provider supports jumbo frames end to end
MTU = 1420

[Peer]
PublicKey = <PROD_NODE_PUBLIC_KEY>
AllowedIPs = 10.8.0.1/32
Endpoint = prod.coolvds-host.no:51820
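
The Production node gets the mirror-image config. This is a sketch; the key placeholders and the dr.coolvds-host.no endpoint are assumptions you will substitute with your own values:

# /etc/wireguard/wg0.conf on Production Node
[Interface]
Address = 10.8.0.1/24
SaveConfig = true
ListenPort = 51820
PrivateKey = <PROD_NODE_PRIVATE_KEY>
MTU = 1420

[Peer]
PublicKey = <DR_NODE_PUBLIC_KEY>
AllowedIPs = 10.8.0.2/32
Endpoint = dr.coolvds-host.no:51820
# Optional: keeps the tunnel primed through stateful firewalls
PersistentKeepalive = 25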

Start the interface on both nodes:

wg-quick up wg0

Pro Tip: On CoolVDS NVMe instances, the CPU overhead for WireGuard encryption is negligible, but ensure you tune your UDP buffers. Run sysctl -w net.core.rmem_max=4194304 to prevent packet drops during heavy replication bursts.
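
To survive a reboot (the one event you can count on during a disaster), enable the interface at boot and make the buffer tuning persistent. A short sketch; the 99-wireguard.conf filename is an arbitrary choice:

# Bring the tunnel up automatically after a reboot
systemctl enable wg-quick@wg0

# Persist the UDP buffer tuning across reboots
echo "net.core.rmem_max=4194304" > /etc/sysctl.d/99-wireguard.conf
sysctl --system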

2. Database Replication: Async with GTID

For the database, we use MariaDB 10.5. We rely on asynchronous replication with Global Transaction IDs (GTID). Why not synchronous Galera? Because for a DR site that might be geographically separated (to avoid the same power grid failure), latency can kill your production write speeds. We want the DR site to catch up, not hold us back.

On the Production (Master) server, configure my.cnf to bind to the WireGuard interface:

# /etc/mysql/mariadb.conf.d/50-server.cnf
[mysqld]
bind-address            = 10.8.0.1
server-id               = 1
log_bin                 = /var/log/mysql/mysql-bin.log
expire_logs_days        = 7
max_binlog_size         = 100M
binlog_format           = ROW
gtid_strict_mode        = ON
innodb_flush_log_at_trx_commit = 1 
# essential for ACID compliance, don't set to 0 or 2 unless you like data loss
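
Restart MariaDB so the binlog and bind-address changes take effect (this assumes the standard mariadb systemd unit on Ubuntu 20.04):

systemctl restart mariadb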

Create the replication user restricted to the VPN subnet:

CREATE USER 'repl_user'@'10.8.0.%' IDENTIFIED BY 'StrongPass2021!';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'10.8.0.%';
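
Before the DR node can replicate, it needs an initial copy of the data. A minimal sketch using a logical dump; this assumes your dataset is small enough for mysqldump (mariabackup is the physical alternative for larger datasets), and the /tmp/seed.sql path is just a placeholder. The MariaDB-specific --gtid flag records the GTID position in the dump:

# On Production: consistent dump that records the replication position
mysqldump --all-databases --single-transaction --master-data=2 --gtid > /tmp/seed.sql

# Push it over the WireGuard tunnel and load it on the DR node
scp /tmp/seed.sql root@10.8.0.2:/tmp/
ssh root@10.8.0.2 "mysql < /tmp/seed.sql"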

On the DR (Slave), make sure its my.cnf carries a different server-id (e.g. 2) and read_only = ON, then point it at the master:

CHANGE MASTER TO 
MASTER_HOST='10.8.0.1',
MASTER_USER='repl_user', 
MASTER_PASSWORD='StrongPass2021!', 
MASTER_USE_GTID=slave_pos; 
START SLAVE;
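
Verify that replication is actually flowing before you trust it; a quick check on the DR node (the grep filter is just a convenience):

# Both threads should report Yes and the lag should trend towards 0
mysql -e "SHOW SLAVE STATUS\G" | grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Gtid_IO_Pos"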

3. File Asset Synchronization

Databases are half the battle. What about user uploads? If you are running a Magento store or a Django app, you have media files. rsync is the old reliable tool here, and we run it over the WireGuard tunnel so that nothing is exposed to the public internet; SSH inside the tunnel simply handles authentication.

Here is a robust script, saved as /usr/local/bin/dr_sync.sh and triggered from /etc/cron.d/dr-sync (the cron entry is shown after the script). It uses a lockfile to prevent overlapping executions if a sync takes too long.

#!/bin/bash
# /usr/local/bin/dr_sync.sh

LOCKFILE="/var/run/dr_sync.lock"
SRC="/var/www/html/storage/"
DEST="rsync_user@10.8.0.2:/var/www/html/storage/"

if [ -e "${LOCKFILE}" ] && kill -0 "$(cat "${LOCKFILE}")" 2>/dev/null; then
    echo "Sync already running"
    exit 1
fi

trap "rm -f ${LOCKFILE}; exit" INT TERM EXIT
echo $$ > ${LOCKFILE}

# -a: archive mode
# -z: compress (useful if bandwidth is tight, skip if both on high-speed CoolVDS)
# --delete: remove files on DR that were deleted on Prod
rsync -az --delete -e "ssh -i /root/.ssh/dr_key" "$SRC" "$DEST"
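
And the matching cron entry. The 15-minute interval is an assumption; tighten or loosen it based on how much file churn you can afford to lose (and remember to chmod +x the script):

# /etc/cron.d/dr-sync
*/15 * * * * root /usr/local/bin/dr_sync.sh >> /var/log/dr_sync.log 2>&1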

The "CoolVDS" Factor in Disaster Recovery

Infrastructure choice dictates recovery speed. You can have the best scripts in the world, but if your DR provider has "noisy neighbors" stealing I/O operations (IOPS), your recovery time will balloon. During a restore, you are writing massive amounts of data sequentially.

We see this constantly with budget VPS providers in the Nordics. They oversell spinning rust (HDD) as "cloud storage." When you try to replay 50GB of binary logs, the disk queue depth spikes, and your restore stalls.

This is where NVMe becomes a business continuity requirement, not a luxury. CoolVDS instances use pure NVMe storage. When replaying database transactions or syncing millions of small files via rsync, the random I/O performance of NVMe reduces RTO from hours to minutes.
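
Don't take any provider's word for it, ours included: benchmark the DR node before you depend on it. A hedged sketch with fio (apt install fio); the job name, 4 GB test size, and 60-second runtime are arbitrary choices:

# Random 4k writes approximate binlog replay and small-file rsync churn
fio --name=dr-restore-test --rw=randwrite --bs=4k --size=4G \
    --iodepth=32 --ioengine=libaio --direct=1 --runtime=60 --time_based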

Testing: The "Scream Test" Simulation

A disaster recovery plan that hasn't been tested is just a hope and a prayer. You must schedule a drill.

  1. Block traffic to the production node (simulate a network outage).
  2. Promote the DR DB: Stop replication and enable writes on the DR node (see the sketch after this list).
  3. Switch DNS: Update your A records (TTL should be set to 60s beforehand).
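
A minimal promotion sketch for step 2, assuming you set the DR node to read_only as suggested earlier:

-- On the DR node: stop replicating, drop the old master config, accept writes
STOP SLAVE;
RESET SLAVE ALL;
SET GLOBAL read_only = OFF;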

If you are hosting critical services for Norwegian clients, keep your data inside the kingdom. Using a CoolVDS node in Oslo as your primary and a second node (perhaps in a different availability zone or a partner datacenter) keeps you on the right side of GDPR and Schrems II while keeping latency low for local users.

Final check: Check your backups. Check them again. Then, check the checksums.
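
For the file assets, a checksum-based dry run of the same rsync job does exactly that, reporting any file whose content differs between Production and DR (paths and key as in the sync script above):

# -n: dry run, -c: compare full checksums instead of size/mtime
rsync -anc --delete -e "ssh -i /root/.ssh/dr_key" \
    /var/www/html/storage/ rsync_user@10.8.0.2:/var/www/html/storage/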

Next Step: Do not wait for the next datacenter fire or fiber cut. Spin up a secondary NVMe instance on CoolVDS today, configure WireGuard, and sleep better tonight knowing your data is safe, sovereign, and synced.