The Server is Down, and the Clock is Ticking
It is 3:00 AM on a Tuesday. Your monitoring system just screamed at you. Your primary database node isn't responding to ping. The client—a major Norwegian e-commerce retailer—is losing money by the second. To make matters worse, May 25, 2018, is staring us in the face. With the GDPR enforcement date just two months away, losing data isn't just an operational failure anymore; it's a potential legal catastrophe involving Datatilsynet.
Too many systems administrators confuse "backups" with "disaster recovery" (DR). A backup is a copy of your data. Disaster recovery is the strategy, architecture, and workflow required to restore that data into a production-ready state. If you have a backup on a slow SATA drive and it takes 14 hours to restore, you don't have a DR plan. You have a resume-generating event.
In this guide, we are going to look at how to architect a resilient infrastructure using tools available right now in 2018, focusing on low-latency replication, data sovereignty in Norway, and the raw performance of KVM virtualization.
The Holy Trinity: RTO, RPO, and Latency
Before we touch a single config file, define your metrics. If you cannot answer these two questions, stop deploying.
- RPO (Recovery Point Objective): How much data are you willing to lose? (e.g., "The last 5 minutes of transactions").
- RTO (Recovery Time Objective): How long can you be down? (e.g., "We must be live in 15 minutes").
Achieving near-zero RTO/RPO requires redundant infrastructure. This is where the underlying hardware matters. We build CoolVDS on NVMe storage because restoring a 50GB MySQL dump on rotating rust (HDD) takes hours. On NVMe, it takes minutes. IOPS are the bottleneck of recovery.
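To make the RTO arithmetic concrete, here is a rough sketch in Python. The throughput figures in ASSUMED_RATES are illustrative assumptions, not measurements of any particular disk or of CoolVDS hardware. A real MySQL import is usually limited by random I/O and index rebuilds rather than raw sequential bandwidth, so measure your own effective rate and plug it in.

```python
def restore_minutes(dump_size_gb: float, effective_mb_per_s: float) -> float:
    """Estimate restore time in minutes from dump size and effective throughput."""
    if effective_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    size_mb = dump_size_gb * 1024
    return size_mb / effective_mb_per_s / 60


def meets_rto(dump_size_gb: float, effective_mb_per_s: float, rto_minutes: float) -> bool:
    """True if the estimated restore alone fits inside the RTO budget."""
    return restore_minutes(dump_size_gb, effective_mb_per_s) <= rto_minutes


# Hypothetical effective import rates in MB/s (assumptions for illustration only).
ASSUMED_RATES = {"hdd": 5.0, "nvme": 100.0}

if __name__ == "__main__":
    rto = 15  # minutes, matching the RTO example above
    for name, rate in ASSUMED_RATES.items():
        minutes = restore_minutes(50, rate)
        verdict = "OK" if meets_rto(50, rate, rto) else "misses RTO"
        print(f"{name}: {minutes:.1f} min ({verdict})")
```

Note that this covers only the restore step. Detection, decision time, and DNS changes all eat into the same RTO budget.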
Phase 1: Database Replication (MySQL 5.7)
For a standard LEMP stack (Linux, Nginx, MySQL, PHP), the database is the hardest part to recover. In 2018, Master-Slave replication is still the battle-tested way to keep a warm standby for most setups. We aren't messing with experimental clusters here; we want stability.
On your Master server (e.g., CentOS 7), edit /etc/my.cnf:
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = row
expire_logs_days = 10
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
Pro Tip: Setting sync_binlog = 1 and innodb_flush_log_at_trx_commit = 1 gives you the durability part of ACID: both the binary log and the InnoDB redo log are flushed to disk at every commit. You lose a bit of write speed, but you guarantee that committed transactions are actually on the disk when the master dies. On CoolVDS instances, the NVMe backend absorbs most of the performance penalty usually associated with these flags.
On the Slave (Disaster Recovery node):
[mysqld]
server-id = 2
relay_log = /var/log/mysql/mysql-relay-bin.log
read_only = 1
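The read_only = 1 flag is critical. It prevents your application from accidentally writing to the backup node while the master is still alive, which would leave the two nodes with diverging data: a split-brain scenario you do not want to debug. Be aware that accounts with the SUPER privilege bypass read_only; MySQL 5.7 also offers super_read_only to block those as well.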
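Replication only protects your RPO while it is actually running. Below is a minimal sketch of a health check that parses the vertical output of SHOW SLAVE STATUS (for example from mysql -e 'SHOW SLAVE STATUS\G') and compares the reported lag to an RPO budget. The sample text and the function names are hypothetical; the three field names are the ones MySQL 5.7 reports. Seconds_Behind_Master is only an approximation of real data loss, so treat the result as an early warning rather than a guarantee.

```python
# Hypothetical, shortened sample of the vertical SHOW SLAVE STATUS output.
SAMPLE_STATUS = """
*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 4
"""


def parse_slave_status(text: str) -> dict:
    """Turn 'Key: value' lines into a dict, skipping row separators."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip()] = value.strip()
    return fields


def replication_healthy(fields: dict, rpo_seconds: int) -> bool:
    """True if both replication threads run and lag is within the RPO."""
    if fields.get("Slave_IO_Running") != "Yes":
        return False
    if fields.get("Slave_SQL_Running") != "Yes":
        return False
    lag = fields.get("Seconds_Behind_Master", "NULL")
    if not lag.isdigit():  # NULL means the lag is unknown
        return False
    return int(lag) <= rpo_seconds


if __name__ == "__main__":
    status = parse_slave_status(SAMPLE_STATUS)
    print("healthy" if replication_healthy(status, rpo_seconds=300) else "ALERT")
```

Run something like this from cron or your monitoring agent on the DR node, and alert on anything other than a healthy result.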
Phase 2: File Synchronization
Databases are only half the battle. What about user uploads, configuration files, and SSL certificates? While distributed filesystems like GlusterFS exist, they add complexity and overhead. For a robust, battle-hardened solution, lsyncd (Live Syncing Daemon) combined with rsync offers near real-time mirroring without the complexity.
Install lsyncd on your Master node:
yum install epel-release
yum install lsyncd
Configure /etc/lsyncd.conf to watch your web root and sync changes to the DR server immediately:
settings {
    logfile = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status",
    nodaemon = false,
}
sync {
    default.rsync,
    source = "/var/www/html",
    target = "dr-user@10.0.0.5:/var/www/html",
    rsync = {
        compress = true,
        archive = true,
        verbose = true,
        rsh = "/usr/bin/ssh -p 22 -i /root/.ssh/id_rsa"
    }
}
Security Note: Ensure the SSH key used for synchronization is restricted. On the destination server, use the command="rsync..." restriction in the authorized_keys file to prevent full shell access if the master node is compromised.
Phase 3: The Infrastructure Layer
Software configuration means nothing if the virtualization layer betrays you. This is why we argue against OpenVZ for critical production workloads. In an OpenVZ container, a kernel panic on the host node takes down everyone. Furthermore, resource isolation is not guaranteed.
CoolVDS utilizes KVM (Kernel-based Virtual Machine). Each instance runs its own kernel. If a neighbor manages to crash their OS, your instance keeps humming. More importantly, KVM allows us to pass through CPU instructions and reserve RAM rigidly. When we say you have 4GB of RAM, you have 4GB of RAM—not "burst" RAM shared with 50 other clients.
Geographic Redundancy and GDPR
With GDPR arriving in May, data sovereignty is paramount. If you are hosting Norwegian user data, keeping it within the EEA (European Economic Area) simplifies your compliance posture significantly. Transferring it outside the EU/EEA requires an additional legal basis; for US providers that usually means the Privacy Shield framework, which is under heavy scrutiny.
We operate our infrastructure directly in Oslo. This offers two distinct advantages:
- Legal Compliance: Your data stays under Norwegian jurisdiction and GDPR protections.
- Latency: If your customer base is in Scandinavia, the round-trip time (RTT) to a server in Oslo is often under 5ms. Compared with a server in Frankfurt (25-30ms) or US East (100ms+), the difference in application snappiness is palpable.
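Latency figures like these are easy to verify from where your users actually sit. The sketch below times a TCP handshake, which takes roughly one network round trip, and reports the median over a few samples. The function names are hypothetical, and the host in the usage example is a placeholder to replace with your own server. If you pass a hostname rather than an IP address, DNS resolution time is included in the measurement.

```python
import socket
import statistics
import time


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time one TCP connect (about one round trip) in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return (time.perf_counter() - start) * 1000.0


def median_connect_ms(host: str, port: int, samples: int = 5) -> float:
    """Median of several connect timings, to smooth out one-off spikes."""
    return statistics.median(tcp_connect_ms(host, port) for _ in range(samples))


if __name__ == "__main__":
    # Placeholder target: replace with your own server's address and an open port.
    print(f"{median_connect_ms('192.0.2.10', 443):.1f} ms")
```

Run it from a machine in the same region as your customers, not from the server's own network, or the numbers will flatter you.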
The "Fire Drill" Test
A DR plan is theoretical until tested. Schedule a maintenance window. Shut down your Master node violently (echo b > /proc/sysrq-trigger forces an immediate reboot without syncing or unmounting disks, much like a power cut). Measure exactly how long it takes to:
- Detect the failure.
- Promote the MySQL Slave to Master (STOP SLAVE; RESET MASTER; then SET GLOBAL read_only = 0; so the application can write to it).
- Update your DNS A records or switch your Floating IP.
If this process takes longer than your RTO allows, you need faster hardware or better automation.
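One way to keep the drill honest is to record each phase as it completes instead of reconstructing timings afterwards. The following is a minimal sketch of such a stopwatch; the DrillTimer class and the phase names are hypothetical, and the clock parameter exists only so the timer can be tested with a fake clock.

```python
import time


class DrillTimer:
    """Records how long each failover phase took and checks the total against the RTO."""

    def __init__(self, rto_seconds: float, clock=time.monotonic):
        self.rto_seconds = rto_seconds
        self._clock = clock
        self._start = clock()  # the moment the master was killed
        self._last = self._start
        self.phases = []  # list of (phase name, seconds spent in that phase)

    def mark(self, name: str) -> float:
        """Call when a phase finishes; returns the seconds that phase took."""
        now = self._clock()
        elapsed = now - self._last
        self.phases.append((name, elapsed))
        self._last = now
        return elapsed

    def total(self) -> float:
        """Seconds from the start of the drill to the last completed phase."""
        return self._last - self._start

    def within_rto(self) -> bool:
        return self.total() <= self.rto_seconds


if __name__ == "__main__":
    timer = DrillTimer(rto_seconds=15 * 60)
    input("Press Enter when the failure is detected... ")
    timer.mark("detect")
    input("Press Enter when the slave is promoted... ")
    timer.mark("promote")
    input("Press Enter when traffic reaches the DR node... ")
    timer.mark("switch traffic")
    for name, seconds in timer.phases:
        print(f"{name}: {seconds:.0f}s")
    print("within RTO" if timer.within_rto() else "RTO exceeded")
```

The per-phase breakdown shows where to spend effort: slow detection calls for better monitoring, while a slow promotion or traffic switch calls for automation.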
Conclusion
Disaster recovery isn't about buying insurance; it's about engineering survival. The combination of solid Linux primitives like rsync and MySQL replication, running on top of uncompromising KVM infrastructure with NVMe storage, provides the resilience required for 2018's digital demands.
Don't wait for the hardware failure to find out your restore script has a syntax error. Build your redundancy on a platform designed for it.
Ready to harden your infrastructure? Deploy a KVM NVMe instance in our Oslo datacenter today and experience the stability of CoolVDS.