Zero-Downtime Database Migration: A DevOps Guide to Surviving Data Transfers in 2022
The most terrifying progress bar in a sysadmin's life isn't a firmware update; it's a database restore running against production. I once watched a 500GB PostgreSQL restore crawl at 2MB/s because the hosting provider decided to throttle disk I/O on a "standard" VPS. The migration window was four hours. The restore took twelve. The client was furious.
That experience taught me a brutal lesson: infrastructure dictates strategy. In 2022, if you are still doing cold "dump and restore" migrations for anything larger than a WordPress blog, you are doing it wrong. With the current maturity of logical replication in PostgreSQL 14 and GTIDs in MySQL 8.0, there is no excuse for extended downtime.
This guide covers how to execute a zero-downtime migration, specifically tailored for the Nordic market where latency to NIX (Norwegian Internet Exchange) and compliance with Datatilsynet (The Norwegian Data Protection Authority) are as critical as the IOPS count.
The Strategy: Replication, Not Restoration
The "dump, transfer, restore" method is dead for critical systems. The time it takes to restore grows linearly with data size, while your maintenance window stays fixed. The only professional path forward is Master-Slave Replication.
The concept is simple: you create a replica on the new server (CoolVDS), let it catch up to the old server in real-time, and then simply flip the switch. The downtime is reduced to the seconds it takes to update your application's connection string or swap a floating IP.
Phase 1: The Hardware Foundation
Before touching a config file, look at the destination hardware. Database ingestion is I/O heavy. During the initial sync, your disk will be writing as fast as the network allows.
At CoolVDS, we enforce the use of local NVMe storage for this exact reason. Network-attached storage (NAS/SAN), common in budget cloud providers, adds latency that kills replication lag recovery. If your write latency exceeds 1-2ms, your replica might never catch up to a busy master.
Pro Tip: Before starting, benchmark your disk I/O. Use fio to ensure you are getting the throughput you paid for.
# A quick sanity check for random write performance (direct I/O, so the page cache doesn't flatter the numbers)
fio --name=db_test --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --numjobs=4 --size=4G --runtime=60 --time_based --group_reporting
Phase 2: Secure Tunneling (The GDPR Layer)
Transferring unencrypted database traffic across the public internet puts you in breach of GDPR's security-of-processing requirements, and it is a security suicide mission besides. Since Schrems II, moving data across borders requires strict scrutiny, but even moving data from Oslo to another data center in Norway requires encryption in transit.
While you can configure SSL within MySQL/Postgres, I prefer wrapping the entire connection in a WireGuard tunnel or an SSH tunnel. It's cleaner and easier to debug.
Setting up a robust SSH tunnel:
# On the Destination Server (CoolVDS)
# -L 3307:127.0.0.1:3306 forwards local port 3307 to port 3306 on the source
ssh -N -f -L 3307:127.0.0.1:3306 user@source-server-ip -i ~/.ssh/id_rsa_migration
Now, your local application on the destination server talks to 127.0.0.1:3307, and traffic flows encrypted to the source.
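If you'd rather use WireGuard than SSH forwarding, the moving parts are just as small. A minimal sketch of the destination side, assuming the source already runs WireGuard on UDP 51820; the 10.10.0.0/24 addresses, key paths, and interface name are placeholders, not anything CoolVDS provisions for you:
# Generate a key pair for the destination peer
wg genkey | tee /etc/wireguard/dest.key | wg pubkey > /etc/wireguard/dest.pub

# /etc/wireguard/wg0.conf on the destination (keys, addresses and endpoint are illustrative)
cat > /etc/wireguard/wg0.conf <<'EOF'
[Interface]
PrivateKey = <contents of dest.key>
Address = 10.10.0.2/24

[Peer]
PublicKey = <public key of the source server>
Endpoint = source-server-ip:51820
AllowedIPs = 10.10.0.1/32
EOF

wg-quick up wg0
# Database traffic now targets 10.10.0.1:3306 (or :5432) inside the encrypted tunnel
The practical difference from the SSH tunnel is that WireGuard reconnects on its own and survives reboots once enabled as a service, which matters when the initial sync runs for hours.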
Phase 3: Configuring the Engine
MySQL / MariaDB Optimization
For MySQL, we rely on GTID (Global Transaction ID) based replication. It makes failover and tracking much more robust than the old binlog file/position method.
On your Source server, ensure my.cnf has:
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_format = ROW
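The source also needs a dedicated replication account. A minimal sketch, assuming the replica connects through the SSH tunnel from Phase 2 (so the source sees the connection coming from 127.0.0.1); the user name and password are placeholders:
# On the source: create the replication user.
# mysql_native_password keeps things simple inside the already-encrypted tunnel;
# the default caching_sha2_password would otherwise demand TLS or an RSA key exchange.
mysql -u root -p -e "CREATE USER 'repl'@'127.0.0.1' IDENTIFIED WITH mysql_native_password BY 'change-me';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'127.0.0.1';"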
On the Destination (CoolVDS NVMe instance), you need to tune for writes. During the initial dump import, you want to relax durability slightly to speed up the process. Just remember to revert these after the migration!
[mysqld]
server-id = 2
# Speed up import significantly by delaying disk sync
innodb_flush_log_at_trx_commit = 0
sync_binlog = 0
innodb_buffer_pool_size = 6G # Assuming an 8GB VPS instance
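With both sides configured, the replication setup itself is a dump, an import, and one CHANGE statement. A rough sequence run from the destination, assuming the tunnel from Phase 2 (source reachable on 127.0.0.1:3307), the placeholder repl account above, and a database called dbname; the statements use the newer MySQL 8.0 spelling (CHANGE REPLICATION SOURCE TO / START REPLICA), so substitute CHANGE MASTER TO / START SLAVE on older releases:
# 1. Seed the destination with a consistent dump that records the GTID position
#    (put credentials in ~/.my.cnf on the source rather than on the command line)
ssh user@source-server-ip "mysqldump --databases dbname --single-transaction --set-gtid-purged=ON" | mysql -u root -p
# On a brand-new destination this imports cleanly; if the server already has GTID
# history, clear it first with RESET MASTER.

# 2. Point the replica at the source through the tunnel and let GTID
#    auto-positioning work out where to resume
mysql -u root -p -e "CHANGE REPLICATION SOURCE TO
  SOURCE_HOST='127.0.0.1', SOURCE_PORT=3307,
  SOURCE_USER='repl', SOURCE_PASSWORD='change-me',
  SOURCE_AUTO_POSITION=1;
START REPLICA;"
From here the replica works through the backlog on its own; how quickly it catches up is mostly a function of the destination's write latency, which is exactly where the NVMe argument from Phase 1 pays off.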
PostgreSQL Optimization
For PostgreSQL, a plain-format pg_dump piped directly into psql is often faster than writing the dump to disk first, provided your network between the source and CoolVDS is stable (latency within Norway is usually <10ms, which is ideal). If you want pg_restore's parallel mode instead, write a custom-format dump to local storage first, because parallel restore cannot read from a pipe.
# Run from the destination. Option A: stream a plain dump straight into psql
ssh user@source_ip "pg_dump -U postgres dbname" | psql -U postgres -d dbname
# Option B: custom-format dump to local NVMe first, since parallel restore (-j) cannot read a pipe
ssh user@source_ip "pg_dump -U postgres -Fc dbname" > /tmp/dbname.dump && pg_restore -U postgres -d dbname -j 4 /tmp/dbname.dump
The -j 4 flag runs the restore in parallel using 4 cores. This is where CoolVDS's dedicated CPU allocation shines—if you use a "burstable" vCPU from a budget host, you will hit CPU steal limits instantly, and the restore will stall.
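If you want the same catch-up-and-flip behavior on PostgreSQL as in the MySQL recipe, the logical replication mentioned in the intro is the tool: it streams changes continuously instead of relying on a one-shot dump. A rough sketch, assuming wal_level = logical is set on the source (it needs a restart), a second SSH tunnel forwarding local port 5433 to the source's 5432, and placeholder object names (mig_pub, mig_sub):
# Source: publish every table in the database
psql -U postgres -d dbname -c "CREATE PUBLICATION mig_pub FOR ALL TABLES;"

# Destination: logical replication does not copy schema, so load the DDL first
ssh user@source_ip "pg_dump -U postgres --schema-only dbname" | psql -U postgres -d dbname

# Destination: subscribe through the tunnel; the initial table copy starts automatically
psql -U postgres -d dbname -c "CREATE SUBSCRIPTION mig_sub CONNECTION 'host=127.0.0.1 port=5433 dbname=dbname user=postgres password=change-me' PUBLICATION mig_pub;"
Keep in mind that logical replication skips DDL and sequence values, so freeze schema changes during the migration and bump sequences (setval) as part of the switchover.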
Phase 4: The Switchover
Once replication is running and the Seconds_Behind_Master is 0 (or close to it), you are ready.
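One quick way to watch that from the destination (MySQL 8.0.22+ reports it as Seconds_Behind_Source under SHOW REPLICA STATUS; older releases use the SHOW SLAVE STATUS names):
# Re-run every few seconds until the lag reaches zero
mysql -u root -p -e "SHOW REPLICA STATUS\G" | grep -E "Running|Seconds_Behind"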
- Lower TTL: Reduce DNS TTL to 60 seconds a day before migration.
- Maintenance Mode: Put the app in read-only mode or show a maintenance page.
- Lock Source: Run FLUSH TABLES WITH READ LOCK; on the source DB.
- Wait for Sync: Ensure the destination has processed the final transactions.
- Promote Destination: Stop the slave thread and make the new DB writable.
- Switch Traffic: Update DNS or load balancer config.
With this method, the actual "lock" time where data cannot be written is often less than 30 seconds.
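For the MySQL path, that checklist boils down to a handful of statements. A minimal sketch, assuming the GTID setup from Phase 3 and placeholder credentials; keep the source session open for the duration, because FLUSH TABLES WITH READ LOCK is released the moment that connection closes:
# On the SOURCE, in an interactive mysql session that stays open until the switch:
#   mysql> FLUSH TABLES WITH READ LOCK;
#   mysql> SELECT @@GLOBAL.gtid_executed;

# On the DESTINATION: wait until that GTID set is fully applied (60 s timeout here),
# then cut the replication link and make sure the server accepts writes
mysql -u root -p -e "SELECT WAIT_FOR_EXECUTED_GTID_SET('<gtid set copied from the source>', 60);"
mysql -u root -p -e "STOP REPLICA; RESET REPLICA ALL; SET GLOBAL read_only = OFF;"
# read_only only matters if you enabled it on the replica during the sync
Flip the connection string or floating IP, release the lock on the source, and you are done; the old source stays intact as a fallback if anything looks wrong in the first hours.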
Why Infrastructure Matters: The Local Advantage
You can have the best DBA scripts in the world, but they will fail on poor infrastructure. High latency causes replication lag. Noisy neighbors cause CPU stalls during checksum verification. Cheap disks cause I/O bottlenecks.
| Feature | Budget Cloud | CoolVDS (Norway) |
|---|---|---|
| Storage | Shared SATA/SSD (Network) | Local NVMe |
| Virtualization | Container (LXC/OpenVZ) | KVM (Full Kernel Isolation) |
| Data Sovereignty | Often routed via Frankfurt/US | Strictly Norway (Oslo) |
When dealing with Datatilsynet regulations, knowing exactly where your data physically sits is paramount. CoolVDS ensures your data doesn't accidentally route through a non-compliant node.
Final Thoughts
Database migration is surgery. You don't perform surgery in a dirty room with dull tools. You need a sterile environment (KVM isolation) and sharp instruments (NVMe storage). By using replication strategies and secure tunneling, you mitigate 99% of the risk associated with data transfers.
Ready to test your replication throughput? Spin up a CoolVDS instance in Oslo in under 55 seconds and see the difference local NVMe makes for your database queries.