
Surviving the Digg Effect: High-Availability Load Balancing with HAProxy on CentOS

Stop Praying, Start Balancing.

It starts with a trickle. Maybe you got linked on a popular forum, or perhaps your marketing team actually did their job for once. Suddenly, Nagios is sending you SMS alerts at 3:00 AM. Load average is 25.0. Apache has hit MaxClients. Your server is swapping so hard the disk heads are about to detach.

If you are running a high-traffic site on a single box in 2009, you are gambling. And the house always wins.

I learned this the hard way deploying a large vBulletin forum last month. We tried to scale vertically—threw more RAM at the problem, upgraded to dual-core CPUs—but the MySQL locks and Apache process overhead eventually choked the OS. The solution isn't a bigger server. It's more servers.

The Architecture: HAProxy as the Traffic Cop

Forget expensive hardware load balancers like F5 Big-IP. Unless you have a budget the size of Statoil, they are overkill. For the rest of us, there is HAProxy. It is lightweight, incredibly stable, and can push thousands of concurrent connections on modest hardware.

We are going to set up a Layer 4/7 load balancer using HAProxy 1.3 on CentOS 5.3. This will sit in front of two (or more) web servers (Web A and Web B).

Prerequisites

  • One Load Balancer Node: A small CoolVDS instance (256MB RAM is plenty for HAProxy).
  • Two Web Nodes: Where your PHP/Python apps live.
  • Private Networking: Critical. You don't want backend traffic traversing the public internet.

Pro Tip: Avoid OpenVZ for load balancers. In a shared-kernel environment, network stack limits (beancounters) can cause packet drops under high load. We strictly use Xen virtualization at CoolVDS to ensure your network buffers are actually yours.

Installation and Config

The stock repositories lag behind, if they carry HAProxy at all. Don't settle for an ancient 1.1 or 1.2 build from a third-party repo; we want 1.3 for the stability improvements. Compile it from source.

wget http://haproxy.1wt.eu/download/1.3/src/haproxy-1.3.22.tar.gz
tar xzvf haproxy-1.3.22.tar.gz
cd haproxy-1.3.22
make TARGET=linux26
sudo make install

Now, let's configure /etc/haproxy.cfg. We will use the roundrobin algorithm, which simply hands each new request to the next server in turn. It’s simple and effective for stateless web clusters.

global
    maxconn 4096
    user haproxy
    group haproxy
    daemon

defaults
    mode http
    retries 3
    option redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen webfarm 0.0.0.0:80
    mode http
    stats enable
    stats auth admin:secretpassword
    balance roundrobin
    option httpclose
    option forwardfor
    server web01 10.0.0.2:80 check
    server web02 10.0.0.3:80 check
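Before you point DNS at this thing, let HAProxy sanity-check the file, then start it. A sketch, assuming the binary landed in your PATH after `make install` and the config lives at /etc/haproxy.cfg:

```shell
# Check the config for syntax errors without starting anything
haproxy -c -f /etc/haproxy.cfg

# Start it (the "daemon" keyword in the global section backgrounds it)
haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid

# Later, reload config without dropping traffic: -sf tells the old
# process to finish its current sessions and then exit
haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
```

The `-sf` soft-reload trick is what lets you add a third web node at lunchtime without anyone noticing.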

Breaking Down the Config

option httpclose: This is vital. It tells HAProxy to close the connection to the server after the response. Without this, Apache keep-alives can fill up the connection slots on the load balancer.

option forwardfor: Since the web servers only see the load balancer's IP, this adds the X-Forwarded-For header so your logs show the real visitor IP.
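To actually get that header into your Apache logs, you need a custom log format on the web nodes. A sketch for Apache 2.2's httpd.conf (the `proxylog` nickname and log path are my own; adjust to taste):

```apache
# Log the real client IP from X-Forwarded-For instead of the peer IP (%h),
# which would always be the load balancer
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxylog
CustomLog /var/log/httpd/access_log proxylog
```

Without this, every line in your access log claims the load balancer's internal IP is your most loyal visitor.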

Why Infrastructure Matters

Software configuration is only half the battle. The underlying hardware defines your ceiling. When you split your database and web servers, latency becomes your enemy. Every millisecond of delay between the Load Balancer and the Web Node adds up.

This is where provider choice hits your bottom line. At CoolVDS, our Oslo datacenter is connected directly to the NIX (Norwegian Internet Exchange). Furthermore, our internal network uses gigabit switches, ensuring that the hop from the Load Balancer to the Web Node is practically instant.

Also, let's talk about storage. If your web nodes are logging heavily or serving static assets, slow disks will cause I/O wait, making the load balancer think the server is down. We use enterprise-grade 15k RPM SAS RAID10 arrays. They aren't cheap, but they don't choke when traffic spikes.
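You don't have to guess whether a web node is I/O-bound. Assuming the sysstat package is installed on the node (`yum install sysstat`), extended device statistics will tell you:

```shell
# Print extended disk stats every 5 seconds; a climbing %iowait, or an
# "await" (average request wait time, in ms) in the tens or hundreds,
# means the disks, not the CPU, are your bottleneck
iostat -x 5
```

If await spikes every time traffic does, no amount of HAProxy tuning will save you; fix the storage first.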

Compliance: The Norwegian Context

Hosting in Norway isn't just about latency; it's about the law. Under the Personopplysningsloven (Personal Data Act), you are responsible for where your user data lives. If you are serving Norwegian customers, keeping data within national borders simplifies compliance with the Data Protection Authority (Datatilsynet). Don't risk Safe Harbor complications by dumping everything on a US server.

The Verdict

One server is a single point of failure. Two servers is a cluster. HAProxy is the glue that makes it work. It’s not flashy, it doesn't have a GUI, and it requires you to know your way around vi. But it works.

Ready to stop sweating every time your traffic graphs go up? Spin up a Xen-based instance on CoolVDS today. We offer the raw performance and root access you need to build a real architecture, not just a shared hosting toy.