Surviving the Great Migration: Zero-Downtime MySQL Strategies for High-Traffic Nodes

It’s 03:00 CET. You are staring at a terminal window, watching a cursor blink. You decided to migrate a 200GB database using mysqldump across a WAN link because the client wanted to save money on bandwidth. Now the site is locked, the restore is estimating four more hours, and your latency just spiked because the hosting provider's "unmetered" network is choked.

If this sounds familiar, you are doing it wrong. In the world of high-availability systems, downtime is not a maintenance window; it is a failure.

I have spent the last decade debugging my.cnf files and fighting replication lag. Today, we are going to talk about how to move data without waking up your users, specifically focusing on the Norwegian infrastructure landscape where latency to NIX (Norwegian Internet Exchange) matters more than you think.

The "Dump and Pray" Method is Dead

For anything larger than a few gigabytes, logical backups (SQL dumps) are too slow for migration. They require rebuilding every index on import, which devours CPU cycles and I/O. For a live migration you need physical backups, and in 2013 the only tool you should trust for this on the InnoDB engine is Percona XtraBackup.

Unlike mysqldump, XtraBackup copies the actual data files while the server is running, without locking your tables for the duration of the copy. It watches the transaction log (Redo Log) and applies changes that happened during the copy process.

Step 1: The Hot Copy

On your source server (Master), install the percona-xtrabackup package. Do not try to tar your /var/lib/mysql directory while MySQL is running; you will end up with corrupted tablespaces.

# Create a hot backup stream and pipe it directly to the new server
# This avoids writing to local disk if you are I/O bound
# Note the -i flag: XtraBackup's tar stream must be extracted with it,
# or tar stops at the first zero block and you get a truncated backup
innobackupex --stream=tar ./ | ssh user@new-server-ip "tar -ixf - -C /data/mysql_temp"

This command streams the raw data files over SSH. If you have a decent pipe, this maxes out your network throughput, not your disk seek time.
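On a constrained WAN link (the scenario from the intro), it can pay to compress the stream before it leaves the box. A sketch, assuming gzip exists on both ends; the host and target path are placeholders, and the local pipe below only simulates the shape of the pipeline:

```shell
#!/bin/sh
# The real command would look like this (placeholders, not runnable here):
#
#   innobackupex --stream=tar ./ | gzip -1 \
#     | ssh user@new-server-ip "gunzip | tar -ixf - -C /data/mysql_temp"
#
# gzip -1 trades a little CPU for a lot of bandwidth; on a fast local
# link, skip compression entirely -- the CPU becomes the bottleneck.
#
# Local simulation of the same pipe shape (tar -> gzip -> gunzip -> tar -x):
src=$(mktemp -d); dst=$(mktemp -d)
printf 'dummy tablespace' > "$src/ibdata1"
tar -C "$src" -cf - ibdata1 | gzip -1 | gunzip | tar -xf - -C "$dst"
cat "$dst/ibdata1"   # -> dummy tablespace
```

InnoDB data files compress well (lots of zeroed pages), so even gzip's fastest level often halves the transfer time on a saturated WAN link.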

Replication: The Bridge to the New World

Once the data is on the destination server (CoolVDS instance), you don't just switch DNS. You set up the new server as a Slave. This allows it to catch up on the transactions that occurred while you were moving the data.

First, prepare the backup on the destination side to ensure the InnoDB logs are consistent:

innobackupex --apply-log /data/mysql_temp

Check the xtrabackup_binlog_info file. It contains the exact binary log coordinates you need to start replication.

cat /data/mysql_temp/xtrabackup_binlog_info
# Output example:
# mysql-bin.000001  456789
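Re-typing those coordinates by hand at 03:00 invites typos. A small helper sketch (hypothetical, not part of XtraBackup; the password is left out deliberately so it never lands in a script) can generate the statement from that file:

```shell
#!/bin/sh
# Hypothetical helper: turn xtrabackup_binlog_info coordinates into a
# CHANGE MASTER statement. Usage: gen_change_master <info-file> <master-ip>
gen_change_master() {
    file=$(awk '{print $1}' "$1")   # binary log file name
    pos=$(awk '{print $2}' "$1")    # binary log position
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='repl_user', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$2" "$file" "$pos"
}

# Demo with a fake coordinates file:
printf 'mysql-bin.000001\t456789\n' > /tmp/binlog_info_demo
gen_change_master /tmp/binlog_info_demo 192.168.1.10
```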

Now, configure the new server to follow the old one:

CHANGE MASTER TO
  MASTER_HOST='192.168.1.10',
  MASTER_USER='repl_user',
  MASTER_PASSWORD='SecurePassword123',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=456789;

START SLAVE;

Pro Tip: Always set read_only = 1 on the new server until the exact moment of cutover. I have seen too many developers accidentally write to the slave, breaking replication and forcing a complete restart of the process.
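The safest place for that flag is the config file, so it survives a restart; a minimal fragment for the destination server:

```ini
# /etc/my.cnf on the destination: keep the slave read-only until cutover.
# Caveat: read_only does NOT stop accounts with the SUPER privilege,
# so make sure your application connects with an unprivileged user.
[mysqld]
read_only = 1
```

At cutover, a single SET GLOBAL read_only = 0 opens the floodgates.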

The I/O Bottleneck: Why Hardware Matters

You can have the most optimized queries in the world, but if your disk subsystem cannot keep up with IOPS (Input/Output Operations Per Second), your database will crawl. This is where most generic VPS providers fail. They put fifty customers on a single SATA RAID array. When one neighbor runs a backup, your database latency spikes.

In my.cnf, we often tune innodb_io_capacity to match the hardware. On a standard 7200RPM drive, you might set this to 100. On the enterprise SSD arrays we use at CoolVDS, you can crank this up significantly.

[mysqld]
# Optimize for high-performance storage
innodb_buffer_pool_size = 4G  # Set to 70-80% of available RAM
innodb_flush_log_at_trx_commit = 2  # 1 for ACID compliance, 2 for speed (risk of 1s data loss)
innodb_io_capacity = 2000 # Only for SSD/Flash storage
innodb_read_io_threads = 8
innodb_write_io_threads = 8

We see a lot of migrations fail not because of bad data, but because the destination server chokes during the catch-up phase. If the slave's SQL thread cannot apply writes faster than the master generates them, Seconds_Behind_Master will never reach zero. This is why low-latency disk access is non-negotiable.
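In practice I watch the catch-up phase with a small polling loop. A sketch, assuming a local mysql client with credentials in ~/.my.cnf (the loop itself is illustrative and commented out; the gate function is the part worth keeping):

```shell
#!/bin/sh
# Cutover gate: only proceed when the slave reports zero lag.
# Treat NULL (SQL thread stopped or broken) as NOT ready, never as zero.
lag_ok() {
    [ "$1" != "NULL" ] && [ "$1" -eq 0 ]
}

# The real polling loop would look like this:
#   while :; do
#       lag=$(mysql -e 'SHOW SLAVE STATUS\G' | awk '/Seconds_Behind_Master/ {print $2}')
#       lag_ok "$lag" && break
#       sleep 5
#   done

lag_ok 0    && echo "ready for cutover"
lag_ok NULL || echo "still catching up (or SQL thread stopped)"
```

The NULL check matters: Seconds_Behind_Master goes NULL when replication breaks, and a naive numeric comparison can misread that as "caught up".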

Data Sovereignty and Latency

For those of us operating in Norway, physical location is a double-edged sword. First, there is the physics. If your users are in Oslo and your database is in a data center in Texas, you are fighting the speed of light: every round trip costs roughly 150ms, and a TCP handshake burns a full round trip before any data moves. Hosting locally on a VPS in Norway cuts that round-trip time to under 10ms.

Second, there is the Personal Data Act (Personopplysningsloven). With the increasing scrutiny from Datatilsynet regarding where data is stored and processed, keeping your customer data within Norwegian borders is the safest play for compliance. Unlike US-based clouds subject to the Patriot Act, local hosting offers a layer of legal insulation.

The Final Cutover

When your Seconds_Behind_Master hits 0, you are ready.

  1. Stop the application (or switch to maintenance mode).
  2. Verify the slave is fully caught up and the master's binlog position has stopped moving.
  3. Promote the slave: STOP SLAVE; RESET SLAVE ALL; (MySQL 5.5.16+) to clear the old replication config, then SET GLOBAL read_only = 0;
  4. Point your application config to the new CoolVDS IP.

Feature        | Traditional Hosting    | CoolVDS Architecture
-------------- | ---------------------- | --------------------------
Virtualization | OpenVZ (Shared Kernel) | KVM (Full Isolation)
Storage        | SATA HDD (Shared IOPS) | Enterprise SSD/PCIe Flash
Network        | Best Effort            | Direct Peering to NIX

We built the CoolVDS platform because we were tired of debugging "noisy neighbor" issues on oversold nodes. By using KVM virtualization, we ensure that your RAM and CPU are actually yours. And by utilizing high-performance SSD storage, we eliminate the I/O wait times that kill database performance.

Don't let slow I/O kill your application's responsiveness. If you are planning a migration, you need a target that can handle the load. Deploy a test instance on CoolVDS today and benchmark the difference yourself.