Surviving the Slashdot Effect: Building Bulletproof HA Clusters with DRBD and Heartbeat


It’s 3:00 AM on a Saturday. Your pager goes off. The database server just kernel panicked, and the e-commerce site you manage is serving 502 Bad Gateway errors to customers across Oslo. If you’ve been in the trenches long enough, you know this feeling. It’s the sound of money burning.

In 2011, relying on a single dedicated server is a gamble you can't afford to take. Hardware fails. Power supplies pop. Hard drives degrade. To sleep soundly, you need High Availability (HA). You need a system that heals itself.

Today, we’re going to look at how to build a redundant, battle-ready cluster using open-source tools standard on Linux: DRBD for storage replication and Heartbeat for resource management. This isn't theoretical; this is the exact architecture we use to keep mission-critical applications running at CoolVDS.

The Architecture: Moving Beyond Hardware RAID

Most admins slap a RAID 10 array on a server and call it "safe." That protects you if a disk dies, but what if the motherboard fries? You're down until you physically swap the hardware. That is unacceptable for a serious business.

The solution is network RAID. We use DRBD (Distributed Replicated Block Device), which behaves like RAID 1 spread across two machines: it mirrors a block device (like /dev/sdb) over the network to a second server in real time. With synchronous replication, if Node A dies, Node B already holds an identical copy of every write that was acknowledged.

The Setup

We will use two CoolVDS VPS instances running CentOS 5.6. The goal is an Active/Passive MySQL cluster.

  • Node 1 (Master): DRBD runs in primary mode. MySQL runs here and writes to the DRBD device.
  • Node 2 (Slave): DRBD runs in secondary mode and receives every write. MySQL is stopped.
  • Heartbeat: Monitors health. If Node 1 vanishes, it promotes Node 2 and starts MySQL automatically.

The Configuration: Getting Your Hands Dirty

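First, don't skimp on the network. DRBD needs low latency between the two nodes, because in synchronous replication mode every write waits for the peer to acknowledge it. Keep both nodes in the same datacenter, on a dedicated private link if you can. Splitting the pair between Norway and the US introduces 100ms+ latency on every write, which will kill your write performance. The same logic applies to your users: if they are in Norway, your servers need to be in Norway. This is why CoolVDS peers directly at NIX (Norwegian Internet Exchange) in Oslo: we keep that ping time negligible.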
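To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a single writer issuing one write at a time, and the 10 ms disk latency and 0.5 ms same-datacenter round trip are illustrative assumptions, not measurements. Real workloads queue and batch writes, so treat the result as a ceiling for serialized commits, not a benchmark.

```python
def max_sync_writes_per_sec(rtt_ms, disk_ms):
    """Ceiling on serialized writes/sec under synchronous replication.

    Each write is done only after the peer has written it and replied,
    so one write costs roughly one round trip plus one disk write.
    """
    return 1000.0 / (rtt_ms + disk_ms)


DISK_MS = 10.0  # assumed per-write disk latency (illustrative)

# Round-trip times: 0.5 ms is an assumption, 100 ms is the figure quoted above.
for label, rtt_ms in [("same datacenter", 0.5), ("Norway to US", 100.0)]:
    rate = max_sync_writes_per_sec(rtt_ms, DISK_MS)
    print("%-16s rtt=%6.1f ms  ~%.1f writes/sec" % (label, rtt_ms, rate))
```

Under these assumptions the cross-Atlantic pair manages roughly a tenth of the serialized write rate of the local pair, and no amount of disk tuning gets that back.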

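Here is a working starting point for /etc/drbd.conf. Note the protocol C directive: it makes replication synchronous, so a write is only reported complete once it has reached the disk on both nodes. Replace the shared secret with your own before you deploy.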

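resource mysql-data {
  # Synchronous replication: a write completes only when both nodes have it on disk.
  protocol C;
  handlers {
    # Runs if this node is primary, degraded, and its local data is inconsistent.
    pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh";
  }
  startup {
    # Seconds to wait for the peer at boot (normal / previously degraded cluster).
    wfc-timeout  15;
    degr-wfc-timeout 60;
  }
  disk {
    # On a local disk error, drop the backing device and keep serving from the peer.
    on-io-error   detach;
  }
  net {
    # Peer authentication. Use your own secret.
    cram-hmac-alg sha1;
    shared-secret "CoolVDS_Secret_2011";
  }
  # Each "on" name must match the output of `uname -n` on that node.
  on node1.coolvds.local {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on node2.coolvds.local {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}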
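Once both nodes are up, you want to confirm the mirror is connected and in sync before trusting it with failover. DRBD reports its state in /proc/drbd. The sketch below parses that file; the sample text is an illustration of the DRBD 8.3 layout, not captured output, and the helper names are made up for this example. If your DRBD version formats the file differently, adjust the pattern.

```python
import re

# Illustrative /proc/drbd contents in the DRBD 8.3 layout (not real output).
SAMPLE = """\
version: 8.3.8 (api:88/proto:86-94)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""

# Device lines look like " 0: cs:... ro:... ds:..."; older releases use st: for roles.
STATUS_RE = re.compile(
    r"^\s*(\d+):\s+cs:(\S+)\s+(?:ro|st):(\S+)\s+ds:(\S+)", re.MULTILINE)


def parse_drbd_status(text):
    """Return {minor: {"cs": ..., "ro": ..., "ds": ...}} for each device line."""
    status = {}
    for minor, cs, ro, ds in STATUS_RE.findall(text):
        status[int(minor)] = {"cs": cs, "ro": ro, "ds": ds}
    return status


def is_in_sync(dev):
    """True when the peers are connected and both disks are up to date."""
    return dev["cs"] == "Connected" and dev["ds"] == "UpToDate/UpToDate"


def read_proc_drbd(path="/proc/drbd"):
    """Read the live status file on a node that has DRBD loaded."""
    with open(path) as handle:
        return handle.read()


for minor, dev in sorted(parse_drbd_status(SAMPLE).items()):
    state = "OK" if is_in_sync(dev) else "DEGRADED"
    print("drbd%d: %s (cs=%s ro=%s ds=%s)"
          % (minor, state, dev["cs"], dev["ro"], dev["ds"]))
```

On a real node, pass `read_proc_drbd()` to `parse_drbd_status` instead of the sample, and alert on anything that is not in sync.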

Automating Failover with Heartbeat

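DRBD handles the data; Heartbeat handles everything else. It watches the peer, and when Node 1 stops responding it takes over the resources on Node 2. The key resource is a "Floating IP" that users connect to. Heartbeat moves this IP from Node 1 to Node 2 within seconds of declaring the peer dead. How fast that happens depends on the deadtime you set in /etc/ha.d/ha.cf.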

Inside /etc/ha.d/haresources:

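node1.coolvds.local IPaddr::192.168.1.100/24/eth0 drbddisk::mysql-data Filesystem::/dev/drbd0::/var/lib/mysql::ext3 mysqld

This single line tells the cluster: "Ideally run on Node 1. If active, bring up the VIP 192.168.1.100, promote the DRBD resource to primary, mount /dev/drbd0, and start the mysqld service." Resources start left to right and stop in reverse order. Note that drbddisk only promotes the resource; the Filesystem entry is what mounts it. The mount point /var/lib/mysql and the ext3 filesystem type are assumptions, so adjust them to your layout. Simple. Robust.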
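If the haresources syntax looks cryptic, it helps to pull it apart. The sketch below is a toy parser, not part of Heartbeat: it splits a line into the preferred node and its resources, then prints the order in which Heartbeat acquires them (left to right) and releases them (right to left). The example line includes a Filesystem entry; its mount point and filesystem type are assumptions.

```python
def parse_haresources(line):
    """Split a haresources line into (preferred_node, [(script, [args])]).

    Fields are whitespace-separated; arguments to a resource script are
    joined to its name with "::".
    """
    fields = line.split()
    node, resources = fields[0], []
    for field in fields[1:]:
        parts = field.split("::")
        resources.append((parts[0], parts[1:]))
    return node, resources


# Mount point and filesystem type below are assumptions for illustration.
LINE = ("node1.coolvds.local IPaddr::192.168.1.100/24/eth0 "
        "drbddisk::mysql-data "
        "Filesystem::/dev/drbd0::/var/lib/mysql::ext3 mysqld")

node, resources = parse_haresources(LINE)
start_order = [name for name, _args in resources]
stop_order = list(reversed(start_order))

print("preferred node:", node)
print("start order:   ", " -> ".join(start_order))
print("stop order:    ", " -> ".join(stop_order))
```

The ordering is the whole point: the disk must be promoted before it is mounted, and mounted before mysqld starts. On shutdown the chain unwinds in reverse so MySQL is stopped before its filesystem disappears.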

The Bottleneck: IOPS and the SSD Revolution

Clustering adds overhead. Every write to the database must traverse the network. If your underlying storage is slow, your database locks up. This is where most generic hosts fail.

Standard SATA 7200 RPM drives push about 80-100 IOPS (Input/Output Operations Per Second). In a heavy write scenario, that is not enough. You will see iowait spike in top, and your site will crawl.
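Where does a figure like that come from? A spinning disk has to seek to the right track and then wait, on average, half a rotation for the sector to come around. The sketch below does that arithmetic. The seek times are assumed typical values, not vendor specifications, and the model ignores caching and command queueing.

```python
def hdd_random_iops(rpm, avg_seek_ms):
    """Estimate random IOPS for one spinning disk.

    One random I/O costs roughly an average seek plus half a rotation.
    """
    rotational_ms = 0.5 * 60000.0 / rpm  # half a revolution, in milliseconds
    return 1000.0 / (avg_seek_ms + rotational_ms)


# Seek times are assumptions chosen for illustration.
for label, rpm, seek_ms in [("7200 RPM SATA", 7200, 8.5),
                            ("15000 RPM SAS", 15000, 3.5)]:
    print("%-14s ~%.0f IOPS" % (label, hdd_random_iops(rpm, seek_ms)))
```

With these assumptions a 7200 RPM drive lands at roughly 80 random IOPS, and even a 15000 RPM SAS drive stays under 200. The limit is mechanical, which is why tuning software only gets you so far.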

Pro Tip: Stop trying to tune innodb_flush_log_at_trx_commit to fix bad hardware. Relaxing it only trades durability for speed. The real fix is faster disks. At CoolVDS, we are pioneering the use of SSD storage for VPS hosting. The difference is not 20%; it is 2000%. We are seeing random read speeds that make traditional SAS arrays look like floppy disks.

Compliance and the "Datatilsynet" Factor

Technical architecture doesn't exist in a vacuum. If you handle Norwegian citizen data, you are bound by the Personopplysningsloven (Personal Data Act). Storing data outside the EEA can be a legal minefield.

By keeping your cluster physically located in our Oslo datacenter, you keep the data inside Norway, which makes it far easier to meet the Data Inspectorate's (Datatilsynet) requirements for data sovereignty. Plus, you get that sweet, low latency to your local users. It’s a win-win for the pragmatic CTO and the paranoid SysAdmin.

Don't Wait for the Crash

High availability used to require expensive SANs and enterprise licenses. With Linux, DRBD, and a solid platform like CoolVDS, you can build it yourself today.

Do you want to test this setup without buying hardware? Spin up two managed hosting instances on our new SSD platform. You can configure the private network and be replicating blocks in under 10 minutes.

Deploy your HA Cluster on CoolVDS now.
