Scaling Out: Surviving the Slashdot Effect with HAProxy on Linux
It starts with a creeping load average. You check top and see iowait climbing. Then connections start timing out. Your single LAMP server, which ran fine last week, just took a traffic spike—maybe you got featured on Digg or Slashdot—and now Apache is spawning child processes until the box runs out of RAM and swaps itself to death.
I see this every week. The instinct is to tune my.cnf or increase MaxClients in Apache. But software tuning can only take you so far when you are bound by the physical limits of a single chassis.
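Before you scale out, it helps to see why tuning alone hits a wall. Here's the back-of-the-envelope math, assuming roughly 25 MB of resident memory per prefork child—a ballpark figure, not gospel; measure your own with ps:

```shell
# Rough capacity math for Apache prefork: how many children fit in RAM?
# The 25 MB-per-child figure is an assumption; check yours with:
#   ps -ylC httpd --sort:rss
TOTAL_MB=2048      # total RAM on the VPS
RESERVED_MB=512    # headroom for the OS, MySQL, and page cache
PER_CHILD_MB=25    # typical RSS of one mod_php prefork child
echo $(( (TOTAL_MB - RESERVED_MB) / PER_CHILD_MB ))
```

On a 2 GB box that works out to about 61 children. Set MaxClients much above that and the next traffic spike pushes you into swap, no matter how clever your my.cnf is.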
The solution isn't a bigger server; it's more servers. Today, we are going to set up a software load balancer using HAProxy 1.3. It costs nothing compared to a hardware F5 BigIP, and when deployed on a clean network like the one we have at CoolVDS, it pushes packets faster than you can generate them.
Why HAProxy?
In the enterprise world, managers love spending $20,000 on hardware load balancers. In the real world of agile startups and lean DevOps, we use HAProxy. It is a single-threaded, event-driven engine that handles thousands of concurrent connections without eating your memory.
Unlike Apache, which uses a thread or process for every connection, HAProxy uses an event loop (similar to the emerging Nginx web server). This means it can sit in front of your web farm and distribute traffic with barely any latency.
Pro Tip: Never run your load balancer on the same physical disk or VPS as your database. The I/O contention will kill performance. On CoolVDS, we use Xen virtualization to ensure your memory and disk I/O are strictly isolated from neighbors, preventing the "noisy neighbor" syndrome common with Virtuozzo hosts.
The Architecture
We will configure a simple Layer 7 load balancer. We assume you have three VPS instances running CentOS 5:
- LB01: The Load Balancer (Public IP)
- WEB01: Apache Web Server (Private Network)
- WEB02: Apache Web Server (Private Network)
Configuration: The Meat and Potatoes
First, install HAProxy. On CentOS, it's available through the EPEL repository, or you can compile from source for the latest 1.3 stable release.
yum install haproxy
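If your repositories don't carry a recent 1.3 build, compiling from source only takes a minute. The version number below is an example—check the official HAProxy site for the current 1.3 stable tarball before downloading:

```shell
# Build HAProxy 1.3 from source (version number is an example;
# substitute the latest 1.3 stable release)
wget http://haproxy.1wt.eu/download/1.3/src/haproxy-1.3.15.tar.gz
tar xzf haproxy-1.3.15.tar.gz
cd haproxy-1.3.15
make TARGET=linux26        # the 2.6 kernel target enables epoll
make install               # installs to /usr/local/sbin/haproxy
```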
Now, let's look at /etc/haproxy/haproxy.cfg. Most defaults are garbage. Here is a battle-tested config I used recently for a high-traffic Norwegian news aggregator:
global
    log 127.0.0.1 local0
    maxconn 4096
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 5000         # 5s to establish a connection to a backend
    clitimeout 50000        # 50s client inactivity timeout
    srvtimeout 50000        # 50s server inactivity timeout

listen webfarm 0.0.0.0:80
    mode http
    stats enable
    stats auth admin:password   # change this before going live
    balance roundrobin
    option httpclose
    option forwardfor
    server web01 192.168.1.10:80 check
    server web02 192.168.1.11:80 check
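Before restarting, always validate the file—a syntax error on a production load balancer takes the whole site down with it. HAProxy has a check mode built in, and a soft-reload flag for picking up changes without dropping connections (the pidfile path below assumes your init script writes one there):

```shell
# Validate the config without touching the running process
haproxy -c -f /etc/haproxy/haproxy.cfg

# Soft reload: start a new process and tell the old one (-sf) to
# finish serving in-flight requests before exiting
haproxy -f /etc/haproxy/haproxy.cfg -sf $(cat /var/run/haproxy.pid)
```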
Breaking Down the Config
The balance roundrobin directive is key. It rotates requests evenly between WEB01 and WEB02. If WEB01 dies (Apache segfaults or the kernel panics), the check parameter notices—after a few failed health checks, run two seconds apart by default—and HAProxy stops sending traffic there. To your users, the failover is effectively invisible.
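One caveat: a bare check only verifies that the port accepts connections. Apache can be up while PHP is throwing errors on every request. Layer 7 health checks catch that. Here's a sketch using option httpchk—/check.txt is a hypothetical static file you'd drop in each server's docroot, and the inter/fall/rise values shown are the defaults made explicit:

```
option httpchk GET /check.txt
server web01 192.168.1.10:80 check inter 2000 fall 3 rise 2
server web02 192.168.1.11:80 check inter 2000 fall 3 rise 2
```

inter sets the check interval in milliseconds, fall how many failures pull a server out, and rise how many successes are needed before it's trusted again.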
The option forwardfor is critical. Without it, your web servers only ever see the IP address of the load balancer. With it, HAProxy adds an X-Forwarded-For header carrying the real client IP, so your access logs (and analytics) stay accurate.
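On the Apache side, you still have to tell your log format to read that header. A minimal sketch for httpd.conf—the format name "proxied" is arbitrary:

```
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b" proxied
CustomLog logs/access_log proxied
```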
Latency and Geography Matter
You can have the best configuration in the world, but if your pipes are clogged, it doesn't matter. Latency kills user experience.
For clients targeting the Nordic market, physics is the enemy. Hosting in the US means a minimum 120ms round-trip time (RTT). Hosting in Germany drops that to 30ms. But hosting in Norway, directly peered at NIX (Norwegian Internet Exchange) in Oslo, drops latency to single digits for local users.
This is why we built CoolVDS in Oslo. We don't just resell capacity; we optimize for the local routing table. Plus, keeping data inside Norwegian borders simplifies compliance with the Personopplysningsloven (Personal Data Act) and keeps the Datatilsynet happy. You don't want to explain to a client why their customer data is sitting on a server subject to the US Patriot Act.
Storage I/O: The Hidden Bottleneck
While HAProxy runs largely in RAM, your web servers need to read PHP files and your database needs to write records. Most budget VPS providers shove you onto SATA drives with 50 other users. When one user runs a backup, your site crawls.
We use enterprise-grade RAID-10 SAS arrays and are currently testing the new Intel X25-E SSDs for database hosting. High I/O storage is the only way to keep up with the request rate that HAProxy permits.
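You don't have to take a provider's word for it, either—iostat from the sysstat package (yum install sysstat) shows you directly whether storage is the bottleneck:

```shell
# Extended device stats, 5-second intervals, 3 samples.
# Sustained await in the tens of milliseconds, or %util pinned
# near 100, means the disks (not your code) are the limit.
iostat -x 5 3
```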
Final Thoughts
Don't wait for your server to crash during a marketing campaign. Spin up a small load balancer instance today. It gives you redundancy, scalability, and peace of mind.
If you need a test environment that mimics production hardware, grab a slice on CoolVDS. We offer strict resource guarantees, meaning your RAM is yours, and your CPU cycles aren't stolen by neighbors. Deploy a VPS in Norway today and see the difference low latency makes.