Scaling Past the Digg Effect: High-Availability Load Balancing with HAProxy 1.3

It starts with a trickle. Then you get featured on Digg or Slashdot. Suddenly, your Apache MaxClients limit is hit, your load average spikes to 50.0, and your server stops responding. If you are running a single-box setup in 2009, you are playing Russian roulette with your uptime.

I have seen seasoned systems administrators cry when their perfectly tuned httpd.conf fails under a flood of legitimate traffic that looks just like a DDoS. The solution isn't just "buy a bigger server." Vertical scaling hits a wall, usually an I/O wall. The solution is horizontal scaling, and for that, you need a load balancer that doesn't buckle under pressure.

Enter HAProxy. While hardware load balancers like F5 Big-IP cost more than a luxury car, HAProxy does the same job for free, often faster, on standard Linux hardware.

Why Apache mod_proxy Isn't Enough

Many admins try to use Apache's mod_proxy first. It works, but it's heavy. Apache dedicates a process or thread to every connection. If you have 5,000 slow clients, you have 5,000 Apache processes or threads eating RAM while they wait. HAProxy, on the other hand, is an event-driven engine. It can handle 10,000 concurrent connections without breaking a sweat, using a fraction of the memory.
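To make "event-driven" concrete, here is a minimal Python sketch of the idea, not HAProxy's actual implementation (HAProxy is written in C and uses its own pollers). One thread registers many sockets and only wakes up for the ones that have data ready. The function name serve_with_one_thread and the ping/pong messages are made up for illustration.

```python
import selectors
import socket


def serve_with_one_thread(n_clients):
    """Answer n_clients connections from a single thread using readiness events."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n_clients)]
    for i, (client, server) in enumerate(pairs):
        server.setblocking(False)
        sel.register(server, selectors.EVENT_READ, data=i)
        client.sendall(b"ping %d" % i)  # the "client" sends its request

    pending = n_clients
    while pending:
        # The loop only touches sockets that the OS reports as readable.
        for key, _events in sel.select(timeout=1.0):
            request = key.fileobj.recv(1024)
            key.fileobj.sendall(b"pong for " + request)
            sel.unregister(key.fileobj)
            pending -= 1

    replies = [client.recv(1024) for client, _server in pairs]
    for client, server in pairs:
        client.close()
        server.close()
    sel.close()
    return replies
```

Calling serve_with_one_thread(50) answers all 50 connections without spawning a single extra process or thread. An idle connection costs one registered socket rather than a whole worker.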

The Architecture: Keeping It Simple

For a robust setup targeting Norwegian users, latency matters. We want the termination point close to the NIX (Norwegian Internet Exchange) in Oslo. Here is the topology I deployed last week for a high-traffic media site:

  • Load Balancer: 1x CoolVDS Xen Instance (CentOS 5, HAProxy 1.3.15)
  • Web Cluster: 3x CoolVDS Web Nodes (Lighttpd + PHP-FastCGI)
  • Database: 1x Dedicated SQL Node (MySQL 5.0, RAID-10 SAS)

Configuring HAProxy 1.3 for Layer 7 Balancing

Layer 7 balancing allows us to inspect HTTP headers and make routing decisions. This is critical if you want to route static content to one set of servers and PHP requests to another, or simply balance based on cookies.
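As a rough illustration of that kind of routing decision, the Python sketch below picks a backend from the request path alone. It is a simplified model, not HAProxy code. The function name choose_backend is hypothetical, and the suffix list and backend names (static_farm, app_farm) are borrowed from the configuration in this article.

```python
from urllib.parse import urlsplit

# File extensions treated as static content (same list as the ACL in this article).
STATIC_SUFFIXES = (".jpg", ".gif", ".png", ".css", ".js")


def choose_backend(request_target):
    """Return a backend name for a request target such as '/img/a.png?v=2'."""
    # Only the path is inspected; the query string is ignored.
    path = urlsplit(request_target).path
    if path.endswith(STATIC_SUFFIXES):
        return "static_farm"
    return "app_farm"
```

For example, choose_backend("/style.css?v=3") goes to the static farm, while choose_backend("/download.php?file=a.jpg") stays on the application farm because the path itself ends in .php.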

Here is a battle-tested /etc/haproxy/haproxy.cfg configuration:

global
    log 127.0.0.1   local0
    maxconn 4096
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    retries 3
    option  redispatch
    maxconn 2000
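    # Timeouts are in milliseconds: 5 s to connect to a backend server,
    # 50 s of client or server inactivity before the session is closed.
    contimeout      5000
    clitimeout      50000
    srvtimeout      50000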

frontend http_in
    bind *:80
    # ACL to detect static content
    acl url_static path_end .jpg .gif .png .css .js
    use_backend static_farm if url_static
    default_backend app_farm

backend app_farm
    balance roundrobin
    cookie SERVERID insert indirect nocache
    option httpchk HEAD /health_check.php HTTP/1.0
    server web01 192.168.1.10:80 cookie A check inter 2000 rise 2 fall 5
    server web02 192.168.1.11:80 cookie B check inter 2000 rise 2 fall 5

backend static_farm
    balance source
    server static01 192.168.1.20:80 check

Pro Tip: Notice the cookie SERVERID insert directive. This gives you session persistence. If a user logs into your PHP application on web01, HAProxy injects a cookie so that their later requests keep going to web01. Without this, and with PHP sessions stored locally on each node, your users will get logged out every time the load balancer switches nodes.
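
The persistence logic can be modelled in a few lines of Python. This is a toy sketch under simplifying assumptions, not how HAProxy is implemented: the class name AppFarm and its route method are invented for illustration, and server health is ignored entirely.

```python
from itertools import cycle


class AppFarm:
    """Toy model of round-robin balancing with cookie-based persistence."""

    def __init__(self, servers):
        # servers maps a cookie value to a server name, e.g. {"A": "web01"}.
        self.servers = dict(servers)
        self._round_robin = cycle(self.servers.items())

    def route(self, cookies):
        """Return (server_name, cookie_value_to_set_or_None) for one request."""
        value = cookies.get("SERVERID")
        if value in self.servers:
            # Returning visitor: stick to the server named by the cookie.
            return self.servers[value], None
        # New visitor or unknown cookie: take the next server in rotation
        # and tell the caller which cookie value to hand back to the browser.
        value, name = next(self._round_robin)
        return name, value
```

With AppFarm({"A": "web01", "B": "web02"}), two cookieless requests land on web01 and web02 in turn, while a request carrying SERVERID=A always returns to web01. In the real configuration, option redispatch additionally lets HAProxy send a request elsewhere when the server named by the cookie is down, which this sketch does not model.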

The Hardware Reality: Virtualization vs. "Fake" VPS

In 2009, not all Virtual Private Servers are created equal. Many budget hosts use container-based tech like Virtuozzo where the kernel is shared. If your neighbor gets Slashdotted, your load balancer slows down because of context switching and resource contention.

This is why for critical infrastructure, I stick to Xen virtualization, which CoolVDS uses exclusively. Xen provides true isolation. The memory you are allocated is yours. The CPU cycles are guaranteed. When you are configuring maxconn 4096 in HAProxy, you need to know the underlying OS can actually handle the interrupt load.

Data Integrity and Norwegian Law

We are seeing stricter enforcement from Datatilsynet (The Norwegian Data Inspectorate) regarding the Personal Data Act (Personopplysningsloven). Hosting your load balancer and database outside the EEA can introduce legal headaches regarding data export.

By keeping your infrastructure in Oslo, you solve two problems:

  1. Compliance: Data stays within Norwegian jurisdiction, satisfying the strictest interpretations of the Personal Data Act.
  2. Latency: Round-trip time (RTT) from a user in Trondheim to a server in Texas is ~140ms. To Oslo? ~15ms. TCP handshakes are faster, and the site feels "snappier."
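
To see why RTT matters so much, consider a back-of-the-envelope model of a fresh HTTP connection: one round trip for the TCP handshake, then one more for the request and the first byte of the response. The Python sketch below encodes just that. The function name and the server_think_ms parameter are illustrative assumptions, and real pages involve many more round trips.

```python
def time_to_first_byte_ms(rtt_ms, server_think_ms=0.0):
    """Lower bound for the first response byte on a new TCP connection."""
    handshake = rtt_ms          # SYN out, SYN-ACK back
    request_response = rtt_ms   # request out, first response byte back
    return handshake + request_response + server_think_ms
```

With the figures above, time_to_first_byte_ms(140) gives 280 ms for Texas and time_to_first_byte_ms(15) gives 30 ms for Oslo, before the server has done any work at all.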

Monitoring Your Cluster

A load balancer is a black box if you don't watch it. HAProxy 1.3 includes a nifty stats page. Enable it by adding this to your config:

listen admin_stats
    bind *:8080
    stats uri /haproxy?stats
    stats realm Global\ Statistics
    # Placeholder credentials: change these before exposing the page.
    stats auth admin:securepassword

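This gives you a real-time dashboard of session rates, errors, and server health. If web01 dies, you'll see it turn red here once it has failed five consecutive health checks (roughly 10 seconds with the inter 2000 and fall 5 settings above), and HAProxy will automatically stop sending it traffic.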
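
The rise and fall parameters on the server lines decide when a node changes state. Here is a small Python sketch of that counting logic as I understand it: a simplified model, with HealthTracker and approx_detection_ms being names invented for this example.

```python
class HealthTracker:
    """Toy model of 'rise 2 fall 5': flip state only after a run of contrary checks."""

    def __init__(self, rise=2, fall=5, up=True):
        self.rise = rise
        self.fall = fall
        self.up = up
        self._streak = 0  # consecutive checks that contradict the current state

    def record(self, check_ok):
        """Feed one health check result and return the resulting state (True = up)."""
        if check_ok == self.up:
            self._streak = 0  # result agrees with the current state
        else:
            self._streak += 1
            threshold = self.fall if self.up else self.rise
            if self._streak >= threshold:
                self.up = not self.up
                self._streak = 0
        return self.up


def approx_detection_ms(inter_ms, fall):
    """Approximate time to mark a dead server down: fall checks, inter_ms apart."""
    return inter_ms * fall
```

A healthy server stays up through four failed checks and goes down on the fifth. It then needs two successful checks in a row to come back. approx_detection_ms(2000, 5) works out to 10000 ms, so lower inter or fall if ten seconds of failed requests is too long for your site.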

Conclusion

If you are serious about high availability, you need to decouple your traffic ingress from your application logic. HAProxy is the industry standard for a reason. It is predictable, stable, and incredibly efficient.

However, software is only as good as the platform it runs on. Avoid oversold hosting that steals your CPU cycles. For a setup like the one described above, I recommend starting with a CoolVDS Xen instance with reliable RAID-10 SAS storage. It gives you the raw I/O performance needed to handle logging and state management without the bottlenecks of shared filesystems.

Ready to stabilize your stack? Deploy a CoolVDS Xen instance today and configure your first load balancer in under 10 minutes.
