Surviving the Slashdot Effect: High Availability with HAProxy on CentOS
It starts with a slow page load. Then, the connection times out. Finally, your monitoring system starts screaming because Apache has consumed every bit of RAM and Swap on your server. If you are running a single LAMP stack, you are sitting on a ticking time bomb. I've seen it happen too many times: a marketing campaign goes viral, or you get linked on a high-traffic site, and suddenly your revenue stream is dead because your server is thrashing.
Stop throwing money at RAM. It doesn’t scale linearly. The solution isn't a bigger server; it's smart architecture.
Today, we are going to set up HAProxy 1.4 (High Availability Proxy) to split traffic across multiple web nodes. This is the exact setup I recommend for clients who need to handle spikes without downtime.
Why HAProxy?
You could use Nginx as a balancer, but HAProxy is a dedicated scalpel for this surgery. It handles connection regulation better than anything else in the open-source world right now. It is purely event-driven, meaning it doesn't spawn a process per connection like Apache prefork does. It sips CPU while pushing thousands of concurrent connections.
The Architecture
We are going to move from a single point of failure to a redundant setup:
- Node 1 (LB): HAProxy (Public IP)
- Node 2 (Web A): Apache/PHP (Private Network)
- Node 3 (Web B): Apache/PHP (Private Network)
Using CoolVDS instances for this is ideal because we provide a distinct private network backend (vLAN), reducing latency between your balancer and your web heads to almost zero.
The Configuration
Assuming you are on CentOS 5.5 or the new Debian 6 (Squeeze), install HAProxy. Once installed, back up the default config and open /etc/haproxy/haproxy.cfg. We aren't going to use the default fluff.
Here is a battle-tested configuration for HTTP traffic:
global
log 127.0.0.1 local0
maxconn 4096
user haproxy
group haproxy
daemon
# Spreading checks prevents CPU spikes on the LB
spread-checks 5
defaults
log global
mode http
option httplog
option dontlognull
retries 3
option redispatch
maxconn 2000
contimeout 5000
clitimeout 50000
srvtimeout 50000
listen webfarm 0.0.0.0:80
mode http
stats enable
stats auth admin:password123
stats uri /haproxy?stats
balance roundrobin
option httpclose
option forwardfor
option httpchk HEAD /check.txt HTTP/1.0
server web01 192.168.1.10:80 check inter 2000 rise 2 fall 3
server web02 192.168.1.11:80 check inter 2000 rise 2 fall 3
Key Directives Explained
balance roundrobin: This distributes requests sequentially. Web01, then Web02, then Web01. Simple and effective for stateless apps.option httpchk: This is critical. HAProxy checks if a specific file exists. If Apache is up but PHP is broken, a TCP check might pass, but an HTTP check will fail, correctly taking the node out of rotation.option forwardfor: Since the web servers only see the Load Balancer's IP, this adds theX-Forwarded-Forheader so your logs show the real visitor IP.