Surviving the Digg Effect: High-Performance Load Balancing with HAProxy 1.3
It starts with a slow page load. Then the connection timeouts begin. Finally, your SSH session lags, and you realize your primary Apache server has hit MaxClients and is eating swap space like it's a free buffet. If you run a high-traffic site in Norway, you know the drill. The "Digg Effect" isn't just a buzzword; it's a server killer.
Most sysadmins try to solve this by throwing more RAM at the problem or tweaking httpd.conf until their eyes bleed. But in 2009, vertical scaling hits a wall, and it hits it hard. The answer isn't a bigger server; it's a smarter architecture.
Enter HAProxy. While Nginx is making waves as a web server, HAProxy remains the undisputed king of software load balancing. Here is how to use it to stop your servers from melting.
The Bottleneck: Why Apache Fails
The standard LAMP stack (Linux, Apache, MySQL, PHP) is robust, but Apache's prefork MPM is memory-hungry. Every client connection ties up a dedicated process for as long as it stays open. If you have 500 simultaneous users on slow connections (like 3G mobile data), Apache holds 500 heavy processes open, waiting for data. Your RAM vanishes.
HAProxy sits in front of your web servers. It buffers the connections, speaks to the slow clients, and only sends requests to Apache when the request is fully formed. It turns a concurrency problem into a simple pipeline problem.
Configuration: The "Battle-Tested" Setup
I recently deployed this setup for a Norwegian media outlet covering the election. We moved from a single crashing server to a pair of CoolVDS Xen instances fronted by HAProxy. The result? Zero downtime.
Here is a production-ready haproxy.cfg snippet compatible with version 1.3.17 (stable):
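global
    log 127.0.0.1 local0
    # Hard ceiling on concurrent connections for the whole process
    maxconn 4096
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    # If the chosen server is down, retry the request on another one
    option redispatch
    maxconn 2000
    # Timeouts are in milliseconds: connect, client idle, server idle
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen webfarm 0.0.0.0:80
    mode http
    # Built-in statistics page, served at /haproxy?stats
    stats enable
    stats uri /haproxy?stats
    balance roundrobin
    option httpclose
    option forwardfor
    # Session stickiness: HAProxy inserts a SERVERID cookie naming the backend
    cookie SERVERID insert indirect nocache
    # Health check every 2000 ms; 2 successes mark a server up, 5 failures mark it down
    server web01 10.0.0.1:80 cookie A check inter 2000 rise 2 fall 5
    server web02 10.0.0.2:80 cookie B check inter 2000 rise 2 fall 5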
Breaking Down the Config
- balance roundrobin: Distributes traffic equally. If you have one beefier server, use weight parameters.
- option httpclose: Critical for PHP. It tells HAProxy to close the connection to the backend server as soon as the transfer is done, freeing up that Apache slot immediately.
- option forwardfor: This adds the X-Forwarded-For header so Apache can log the real client IP (reference the header in your LogFormat) instead of the load balancer's IP.
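The weight parameter mentioned under balance roundrobin might look like the sketch below. It assumes web01 is the beefier machine and that a 2:1 split suits it. That ratio is a made-up value for illustration, so tune it against your own hardware.

```
# Goes inside the existing "listen webfarm" section, replacing the two
# server lines; everything else in the section stays as it is.
# Assumption: web01 can handle roughly twice the load of web02.
server web01 10.0.0.1:80 cookie A weight 2 check inter 2000 rise 2 fall 5
server web02 10.0.0.2:80 cookie B weight 1 check inter 2000 rise 2 fall 5
```

Weights only steer requests that arrive without a valid SERVERID cookie. Returning visitors stay pinned to the server named in their cookie, so a change in weights takes a while to show up in the traffic split.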
Pro Tip: Don't forget to adjust your sysctl.conf. Increase net.ipv4.ip_local_port_range to 1024 65000 to avoid running out of ephemeral ports during high load. The Linux default range (32768 to 61000) is too conservative for a load balancer, which opens a fresh outgoing connection to a backend for every request.
Hardware Matters: The I/O Reality
Software optimization can only save you so much. Even with HAProxy, if your backend database is thrashing on a slow hard drive, your site will feel sluggish. This is where the underlying infrastructure becomes paramount.
Many budget VPS providers in Europe are still overselling standard 7.2k RPM SATA drives. In a virtualized environment, "noisy neighbors" can steal your disk I/O, causing MySQL queries to pile up. This is the silent killer of performance.
This is why for serious deployments, we use CoolVDS. We utilize enterprise-grade 15k RPM SAS drives in RAID-10 arrays. While SSDs like the Intel X25-E are just starting to enter the enterprise market (and cost a fortune), a well-tuned SAS RAID-10 array offers the highest reliable IOPS available today for database workloads.
Data Sovereignty and Latency
If your primary audience is in Norway, hosting in the US or even Germany adds unnecessary latency. Packets have to travel through multiple hops. By hosting on CoolVDS, you are sitting directly on the infrastructure connected to NIX (Norwegian Internet Exchange). Ping times to Oslo are often in the single-digit milliseconds.
Furthermore, with the Norwegian Personal Data Act (Personopplysningsloven) and the EU Data Protection Directive (95/46/EC), keeping your customer data within national borders simplifies compliance significantly. You don't want to deal with the legal headache of Safe Harbor data transfers if you don't have to.
Final Verdict
You don't need a cluster of 20 physical servers to handle traffic spikes. You need a lightweight entry point. HAProxy 1.3 on a small VPS, distributing traffic to backend application servers, is the most cost-effective way to scale in 2009.
Stop letting MaxClients determine your uptime. Spin up a CoolVDS instance, install HAProxy via yum install haproxy, and watch your load averages drop.