
Surviving the Slashdot Effect: High-Availability Load Balancing with HAProxy 1.4

It starts with a trickle. Then a link hits Reddit or Digg. Suddenly, your load average spikes to 25.0, Apache spawns child processes until it hits MaxClients, and your server starts swapping itself to death. If you are running a single-server setup, you are sitting on a ticking time bomb.

I’ve seen too many competent sysadmins try to tune `my.cnf` or tweak Apache prefork settings to squeeze more juice out of a single box. While optimization is noble, scalability is physics. Eventually, you run out of RAM.

The solution isn't a bigger server; it's more of them. Today, we are going to look at how to deploy HAProxy 1.4 (released just this February) to distribute traffic across multiple web nodes. We will focus on a setup that ensures stability, maintains session stickiness, and keeps your Norwegian users happy with low latency.

Why HAProxy?

In the enterprise world, managers love throwing money at F5 Big-IP hardware appliances. But for lean startups and dev teams, HAProxy is the industry standard for software load balancing. It is open-source, incredibly stable, and capable of handling tens of thousands of concurrent connections without eating your CPU.

Unlike Nginx, which is primarily a web server that can proxy, HAProxy is a dedicated TCP/HTTP load balancer. It strips the overhead.

The Architecture

We are going to move from a single point of failure to a redundant cluster:

  • Load Balancer (LB1): HAProxy running on a lightweight VPS.
  • Web Nodes (Web1, Web2): Apache/PHP backend servers.
  • Database: MySQL Master/Slave (outside the scope of this article, but assumed).

Configuring HAProxy 1.4

First, install HAProxy. On a standard CentOS 5 box, you might need the EPEL repository, or better yet, compile from source to get the latest 1.4 features.

wget http://haproxy.1wt.eu/download/1.4/src/haproxy-1.4.4.tar.gz
tar xzvf haproxy-1.4.4.tar.gz
cd haproxy-1.4.4
make TARGET=linux26
make install
haproxy -v    # verify the build reports version 1.4.4

Here is a battle-tested /etc/haproxy/haproxy.cfg configuration. This setup uses the roundrobin algorithm but adds cookie-based persistence—crucial for PHP applications where losing your session means logging the user out.

global
    log 127.0.0.1   local0
    maxconn 4096
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen webfarm 0.0.0.0:80
    mode http
    stats enable
    stats uri /haproxy?stats
    balance roundrobin
    cookie PHPSESSID prefix
    option httpclose
    option forwardfor
    server web1 192.168.1.10:80 cookie A check
    server web2 192.168.1.11:80 cookie B check
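One gotcha: the `log 127.0.0.1 local0` line in the global section assumes a syslog daemon listening on UDP port 514, which a stock CentOS install does not do out of the box. If you are running rsyslog, a minimal sketch looks like this (the log file path is my own choice, adjust to taste):

```
# /etc/rsyslog.conf -- accept syslog over UDP and route HAProxy's facility
$ModLoad imudp
$UDPServerRun 514
local0.*    /var/log/haproxy.log
```

Restart rsyslog after the change, or HAProxy's log lines will silently vanish.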

Pro Tip: notice the `option forwardfor` line? Without it, your Apache logs will show the load balancer's IP for every request, making analytics useless. This option inserts the X-Forwarded-For header so you can see the real visitor IP.
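With X-Forwarded-For in place, Apache still has to be told to log it. A minimal sketch for httpd.conf, based on the stock combined format (the `proxylog` nickname is my own):

```apache
# Log the X-Forwarded-For header instead of the load balancer's peer IP
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxylog
CustomLog logs/access_log proxylog
```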

The Importance of I/O and Latency

Software configuration is only half the battle. The underlying infrastructure dictates your ceiling. In Norway, latency to the Norwegian Internet Exchange (NIX) is a critical metric. If your datacenter routes traffic through Frankfurt to reach a user in Oslo, you are adding 30-40ms of unnecessary lag.

Furthermore, load balancers are I/O sensitive when logging high traffic volumes. While standard SATA drives are cheap, they bottleneck quickly under random write operations.

This is where CoolVDS differs from the budget providers overselling OpenVZ containers. We utilize enterprise-grade RAID-10 SAS storage (and are currently rolling out early-adopter SSD storage tiers) to ensure that disk wait times don't stall your request queue. When you are balancing 5,000 requests per second, "noisy neighbors" on a shared host can kill your throughput.

Legal Compliance in 2010

We are also seeing stricter enforcement from Datatilsynet regarding where personal data is stored. Under the Personal Data Act (Personopplysningsloven), ensuring your physical servers are located within the EEA (or specifically Norway) simplifies compliance significantly compared to hosting on US-based clouds.

Testing the Failover

Don't assume it works. Break it.

With the config above active, run a tail on your logs and stop Apache on web1:

[root@web1 ~]# /etc/init.d/httpd stop

Check the HAProxy stats page defined in your config (/haproxy?stats). You should see web1 turn red (DOWN), and all traffic immediately shift to web2. No downtime. No 503 errors for your customers. Just resilience.
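You can also script this check instead of eyeballing the web page: HAProxy 1.4 will serve the same stats as CSV if you append `;csv` to the stats URI (e.g. fetch it with curl). A minimal sketch of pulling a server's status out of one such line — the sample line below is fabricated for illustration, and the field position assumes the 1.4 CSV layout:

```shell
# In HAProxy's CSV stats export, field 18 is the server status (UP/DOWN).
# Fabricated sample line for server web1 in the webfarm backend:
line="webfarm,web1,0,0,0,0,2000,0,0,0,,0,,0,0,0,0,DOWN"
status=$(echo "$line" | cut -d, -f18)
echo "web1 is $status"
```

Wrap that in a cron job and you have a poor man's alerting system until you deploy proper monitoring.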

Next Steps

Scaling horizontally removes the single point of failure from your web tier, but remember that your load balancer itself eventually needs redundancy (using Heartbeat and a Floating IP). Start small.
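The classic recipe for that load balancer redundancy is Heartbeat in haresources mode: two HAProxy boxes share a floating IP, and when the active node dies, the standby claims the IP and keeps serving. A minimal /etc/ha.d/haresources sketch — the hostname and IP here are placeholders for your own:

```
# lb1 is the preferred node; it holds the floating IP and runs haproxy
lb1 192.168.1.100 haproxy
```

The same file must be identical on both nodes, and the haproxy init script must be present on each.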

If you need a sandbox to test this compilation and configuration, don't risk your production environment. VPS options in Norway are limited, and you need a provider that offers true KVM or Xen virtualization so you can run custom kernels.

Ready to build a cluster that doesn't sleep? Deploy a CoolVDS instance on RAID-10 SAS (or our early-adopter SSD tier) today and see the difference raw I/O makes.