Zero-Downtime Database Migration: Tactics for High-Traffic Norwegian Workloads

Moving a production database is like performing open-heart surgery while the patient is running a marathon. One wrong command, one miscalculated buffer pool setting, and you aren't just looking at downtime—you are looking at data corruption, angry stakeholders, and a potential inquiry from Datatilsynet.

I have spent the last decade migrating data across the Nordic region, from legacy bare metal in basement server rooms to modern virtualized infrastructure. The reality is simple: latency kills consistency. When you are syncing terabytes of transactional data between a legacy host and a new provider, physics is your enemy.

This guide ignores the "maintenance mode" approach. If you can afford to shut down your business for 8 hours to run mysqldump, you don't need this article. This is for the systems that cannot stop.

The Norwegian Context: Latency and Legality

Before touching a config file, look at your topology. In 2019, data sovereignty is no longer optional. With GDPR fully enforceable, ensuring your data remains within the EEA (European Economic Area), and preferably within Norway for local businesses, is critical.

Furthermore, latency to the Norwegian Internet Exchange (NIX) in Oslo dictates your replication lag. If you are migrating a high-write database from a server in Germany to a VPS in Norway, the Round Trip Time (RTT) will directly impact how fast your slave can acknowledge writes if you are using semi-synchronous replication.
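
Before committing to a topology, measure the round trip yourself rather than guessing. A quick sketch from the current Master, assuming the destination sits at the hypothetical address 10.0.0.5 and answers ICMP:

# Sample RTT to the destination for a minute
ping -c 60 10.0.0.5 | tail -n 2

# Or inspect per-hop latency and packet loss along the path
mtr --report -c 60 10.0.0.5

The more volatile that RTT is, the less attractive semi-synchronous replication becomes, since every commit on the Master then waits for at least one acknowledgement from the other end.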

The Strategy: Replication-Based Cutover

The only viable strategy for near-zero downtime is setting up a Master-Slave replication topology where the "Slave" is your new destination server. Once the slave catches up, you promote it to Master.

1. The Infrastructure Requirement

Here is where most migrations fail. The destination server (the Slave) must have higher I/O capacity than the Master during the catch-up phase. The Slave has to process the initial data load and apply the backlog of binary logs that accumulated during the transfer.

Pro Tip: Do not attempt this on standard spinning rust or cheap SATA SSDs shared with 50 other tenants. The IO_THREAD will bottleneck, and Seconds_Behind_Master will never hit zero. This is why we utilize KVM instances with direct-attached NVMe storage at CoolVDS. You need the IOPS headroom to process the backlog faster than the Master generates it.
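
If you want to verify that headroom before trusting it, a short fio run on the destination volume exposes a throttled backend immediately. A sketch, with the scratch directory, block size and runtime as assumptions:

# Random 16k writes with O_DIRECT, roughly mimicking InnoDB page flushes
mkdir -p /var/lib/mysql-scratch
fio --name=replica-catchup --directory=/var/lib/mysql-scratch \
    --rw=randwrite --bs=16k --direct=1 --ioengine=libaio \
    --iodepth=32 --numjobs=4 --size=2G --runtime=60 --time_based \
    --group_reporting

If the reported write IOPS land in the low hundreds, no amount of database tuning will make the catch-up phase converge.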

2. MySQL/MariaDB Configuration

Let's assume you are running MySQL 5.7 or 8.0 (GA). First, ensure your Master is configured to write binary logs.

Master Configuration (my.cnf):

[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
expire_logs_days = 7
max_binlog_size = 100M
# Essential for data integrity on crash
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
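
After restarting mysqld to apply the changes, confirm that binary logging is actually on and note the current coordinates. A quick check, assuming a Debian-style service name and root access via the local socket:

# Restart to pick up the new settings, then verify
systemctl restart mysql
mysql -e "SHOW VARIABLES LIKE 'log_bin'; SHOW MASTER STATUS;"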

Create the replication user. Restrict this strictly to the IP of your new CoolVDS instance (e.g., 10.0.0.5) to minimize the attack surface.

CREATE USER 'repl_user'@'10.0.0.5' IDENTIFIED BY 'StrongPassw0rd!';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'10.0.0.5';
FLUSH PRIVILEGES;
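
Before shipping any data, confirm that this user can actually reach the Master from the destination. From the new server, assuming the Master answers on 192.168.1.100 as in the CHANGE MASTER example further down:

# Verify the grant and the network path in one step
mysql -h 192.168.1.100 -u repl_user -p'StrongPassw0rd!' -e "SELECT @@hostname, @@server_id;"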

3. The Initial Sync (Non-Blocking)

Instead of a locking logical dump, use Percona XtraBackup (2.4 for MySQL 5.7, 8.0 for MySQL 8.0). It takes a hot, physical backup of InnoDB tables without blocking writes.

# On the Source (Master)
xtrabackup --backup --target-dir=/data/backups/ --user=root --password=YOURPASS

# Prepare the backup (apply transaction logs)
xtrabackup --prepare --target-dir=/data/backups/

Transfer this to your new server using rsync. This is often the longest step; a provider with generous bandwidth and good peering towards NIX helps here.
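
A transfer command along these lines does the job; the paths and destination address are assumptions:

# Push the prepared backup to the new server; --checksum compares file contents rather than timestamps on a re-run
rsync -avz --checksum --partial --progress /data/backups/ root@10.0.0.5:/data/backups/

With --partial, re-running the same command after an interrupted transfer resumes instead of starting over.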

4. Configuring the Destination (CoolVDS)

On the new server, you need to optimize for write-heavy catch-up. Temporarily relax durability settings only during the catch-up phase to speed up the process.

[mysqld]
server-id = 2
# Optimize for NVMe
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_flush_method = O_DIRECT
# Dedicated memory (assume 16GB RAM instance)
innodb_buffer_pool_size = 10G
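# Catch-up phase only: relax durability to speed up the apply, revert both to 1 before cutover
innodb_flush_log_at_trx_commit = 2
sync_binlog = 0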

Once you restore the data, point the Slave to the Master using the coordinates found in the xtrabackup_binlog_info file.
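
The restore itself is a matter of copying the prepared backup into the datadir and reading off the coordinates. A sketch, assuming the backup landed in /data/backups/ and the datadir is /var/lib/mysql (which must be empty before the copy-back):

# On the new server: stop MySQL, move the prepared backup into place, fix ownership
systemctl stop mysql
xtrabackup --copy-back --target-dir=/data/backups/
chown -R mysql:mysql /var/lib/mysql
systemctl start mysql

# Binary log file and position to plug into CHANGE MASTER TO
cat /data/backups/xtrabackup_binlog_info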

CHANGE MASTER TO
  MASTER_HOST='192.168.1.100',
  MASTER_USER='repl_user',
  MASTER_PASSWORD='StrongPassw0rd!',
  MASTER_LOG_FILE='mysql-bin.000005',
  MASTER_LOG_POS=45678;

START SLAVE;

The "Catch-Up" Phase and Hardware Reality

Monitor the status with SHOW SLAVE STATUS\G. Watch the Seconds_Behind_Master metric.
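
A simple way to keep an eye on it; the five-second interval is arbitrary:

# Refresh the key replication counters every 5 seconds
watch -n 5 "mysql -e 'SHOW SLAVE STATUS\G' | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'"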

If this number increases, your new disk cannot write fast enough. I recently debugged a failed migration for a client moving to a "budget" cloud provider. Their 4-core instance had plenty of CPU, but the storage backend was throttled at 300 IOPS. The database never caught up. We moved the workload to a CoolVDS NVMe instance, and the replication lag dropped from 4 hours to 0 seconds in under 20 minutes. Raw I/O throughput is the only variable you cannot optimize away in software.

PostgreSQL Considerations

For PostgreSQL 10 or 11 users, the logic is similar but the tools differ. You will rely on pg_basebackup for the initial copy and streaming of the Write-Ahead Log (WAL) for ongoing replication.

pg_hba.conf (Master):

host    replication     replicator      10.0.0.5/32            md5
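
With that entry in place (and the Master's configuration reloaded), the initial copy is taken from the new server with pg_basebackup. A sketch, with the data directory assumed to follow a Debian-style PostgreSQL 11 layout:

# On the new server: stream a full base backup plus the WAL needed to make it consistent
pg_basebackup -h 192.168.1.100 -U replicator -D /var/lib/postgresql/11/main \
    --wal-method=stream --checkpoint=fast --progress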

Recovery Configuration (recovery.conf on Slave):

standby_mode = 'on'
primary_conninfo = 'host=192.168.1.100 port=5432 user=replicator password=StrongPassw0rd!'
trigger_file = '/tmp/postgresql.trigger.5432'
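
Once the standby is running, the Master's view of the stream is the quickest health check. Run this on the Master as a superuser (or a role granted pg_monitor):

# One row per connected standby; replay_lsn trailing sent_lsn indicates lag
sudo -u postgres psql -x -c "SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;"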

The Cutover (The Scary Part)

Once lag is zero, execute the switch:

  1. Stop Writes: Set the application to read-only or block port 3306/5432 on the old Master firewall.
  2. Verify Consistency: Confirm the Slave has executed the final transactions (Seconds_Behind_Master is 0 and Exec_Master_Log_Pos has caught up to Read_Master_Log_Pos).
  3. Promote Slave:
    • MySQL: STOP SLAVE; RESET SLAVE ALL; then disable read_only (see the sketch after this list)
    • Postgres: touch /tmp/postgresql.trigger.5432
  4. Repoint App: Update your DNS or connection strings to point to the new CoolVDS IP.
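
A condensed version of steps 1 through 3 for the MySQL side; the read_only toggle assumes no application accounts hold the SUPER privilege and can write around it:

# 1. On the old Master: stop accepting writes
mysql -e "SET GLOBAL read_only = ON; SET GLOBAL super_read_only = ON;"

# 2. On the new server: confirm the final transactions have been applied
mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Seconds_Behind_Master|Read_Master_Log_Pos|Exec_Master_Log_Pos'

# 3. Promote: drop the replication configuration and open for writes
mysql -e "STOP SLAVE; RESET SLAVE ALL; SET GLOBAL read_only = OFF;"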

Conclusion

Migrations are about risk management. You mitigate risk with superior preparation and superior hardware. Don't let your infrastructure be the reason you're awake at 4 AM fixing corrupted tables. Use rsync with checksums, test your backups, and ensure your target environment has the NVMe performance required to handle the surge.

Ready to validate your migration plan? Deploy a high-performance NVMe instance on CoolVDS today and benchmark your replication speeds against the competition.