
Scaling Past the Single Server: A SysAdmin's Guide to HAProxy on CentOS 6

There is a specific kind of panic that sets in when `top` shows your load average hitting 50.0 and Apache processes are spawning faster than your swap partition can swallow them. I faced this last November with a client running a Magento store targeting the Norwegian market. They launched a campaign on VG.no, traffic hit, and their single "high performance" dedicated server from a budget provider crumpled instantly. It wasn't the CPU; it was the I/O wait and the Apache `MaxClients` limit.

The solution wasn't a bigger server. It was horizontal scaling. If you are still running your database, web server, and caching layer on one box in 2012, you are designing for failure. Today, we are going to look at the industry standard for stabilizing traffic: HAProxy.

Why HAProxy 1.4?

Hardware load balancers like F5 BIG-IP are great if you have the budget of a bank. For the rest of us, HAProxy (High Availability Proxy) is the weapon of choice. Unlike Apache, which spawns a thread or forks a process for every connection, HAProxy is a single-process, event-driven, non-blocking engine. This allows it to handle 10,000 concurrent connections without eating gigabytes of RAM.

In our architecture, we place HAProxy in front of two or more web servers (Real Servers). It accepts the traffic and distributes it based on the algorithms we define.

The Infrastructure Prerequisite: Low Latency

Before touching config files, understand the physics. Splitting your stack introduces network latency. If your Load Balancer is in a datacenter in Germany and your web nodes are in Oslo, your application will feel sluggish. You need your nodes on the same switch or at least the same datacenter.

Pro Tip: When provisioning VPS instances for a cluster, ensure they share a private network interface (eth1). At CoolVDS, our instances in the Oslo zone communicate over a private gigabit backend, meaning your load balancer talks to your web nodes with near-zero latency, keeping that user experience snappy. Do not route backend traffic over the public internet.
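
On CentOS 6, that private interface is just another ifcfg file. A minimal sketch, assuming your provider hands you a 10.10.0.0/24 backend (the same range used in the HAProxy config below):

# /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=10.10.0.1
NETMASK=255.255.255.0
ONBOOT=yes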

Installation on CentOS 6.2

We will assume you are using CentOS 6.2 (the current stable workhorse). The default repositories might have an older version, so let's ensure we get the latest stable 1.4 branch.
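
Depending on your mirror set, `haproxy` lives either in the base repos or in EPEL. If the `yum` command below comes up empty, enable EPEL first; the exact epel-release RPM filename changes over time, so check the mirror directory for the current one:

[root@lb01 ~]# rpm -Uvh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-7.noarch.rpm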

[root@lb01 ~]# yum install haproxy
[root@lb01 ~]# haproxy -v
HA-Proxy version 1.4.18 2011/09/16
Copyright 2000-2011 Willy Tarreau 

If `yum` gives you something ancient, compile it from source. But for this guide, the repository version is sufficient.
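
If you do go the source route, the build takes about a minute. The version below matches the repo build; grab whatever the current 1.4 tarball is from haproxy.1wt.eu:

[root@lb01 ~]# wget http://haproxy.1wt.eu/download/1.4/src/haproxy-1.4.18.tar.gz
[root@lb01 ~]# tar xzf haproxy-1.4.18.tar.gz && cd haproxy-1.4.18
[root@lb01 ~]# make TARGET=linux26
[root@lb01 ~]# make install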

The Configuration: /etc/haproxy/haproxy.cfg

Backup the default config immediately. We are building a fresh one. We need to define a frontend (where traffic hits) and a backend (where traffic goes).
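
One line does it; the `.orig` suffix is just a convention:

[root@lb01 ~]# cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.orig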

Here is a battle-tested configuration for a standard PHP application:

global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4096
    user        haproxy
    group       haproxy
    daemon
    # Spreading checks over time prevents spikes
    spread-checks 5
    # Admin socket for live stats and management
    stats socket /var/lib/haproxy/stats

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

frontend  main_http_incoming
    bind *:80
    default_backend             app_servers

backend app_servers
    balance     roundrobin
    # The check is vital. A HEAD request to index.php confirms PHP is actually rendering, not just that port 80 is open.
    option httpchk HEAD /index.php HTTP/1.0
    server  web01 10.10.0.2:80 check
    server  web02 10.10.0.3:80 check
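
That `stats socket` line in the global section pays off immediately: you can query live backend state straight from the shell. A quick sketch, assuming `socat` is installed (`yum install socat`):

[root@lb01 ~]# echo "show stat" | socat unix-connect:/var/lib/haproxy/stats stdio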

Key Directives Explained

  1. balance roundrobin: This distributes requests sequentially (Server A -> Server B -> Server A). For stateless apps, this is perfect. If you need sessions to stick (like a shopping cart), you might need `balance source` or `cookie` injection (see the sketch after this list), though storing sessions in Memcached is the cleaner 2012 approach.
  2. option httpchk: TCP checks are not enough. A server can accept a TCP connection on port 80 but still return a 500 Internal Server Error. `httpchk` ensures the application is actually responding with a 200 OK.
  3. option forwardfor: Since HAProxy makes the request to the backend, your Apache logs will show the Load Balancer's IP (e.g., 10.10.0.1) for every visitor. This option adds the `X-Forwarded-For` header so you can see the real client IP in your logs (see the LogFormat line after this list).
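
If you do end up needing stickiness at the balancer, here is a sketch of cookie injection against the same backend; the cookie name SRVID is arbitrary:

backend app_servers
    balance     roundrobin
    cookie SRVID insert indirect nocache
    option httpchk HEAD /index.php HTTP/1.0
    server  web01 10.10.0.2:80 cookie web01 check
    server  web02 10.10.0.3:80 cookie web02 check

And on the Apache side, swap `%h` for the forwarded header in your LogFormat so the real client IP lands in the access logs:

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined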

Handling the "Noise": Logging and I/O

One oversight I see constantly is logging. HAProxy generates a lot of log data. If you are pushing 500 requests per second, and you are writing every line to disk, a standard mechanical hard drive (HDD) will choke. I/O wait shoots up, and the system slows down.

This is where hardware choice matters. You cannot run a serious load balancer on shared, oversold storage.

The SSD Advantage

At CoolVDS, we utilize enterprise-grade SSDs (Solid State Drives) for our storage pools. In 2012, many providers still treat SSDs as a luxury add-on, but for a load balancer or a database, they are mandatory. The random write speeds of an SSD prevent your log files from becoming a bottleneck during a DDoS or a traffic spike.

If you are stuck on spinning rust, configure Rsyslog to write asynchronously or send logs to a remote log server to save your I/O for traffic handling.
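
On CentOS 6 that means two pieces of rsyslog configuration: a local UDP listener to receive what HAProxy sends to 127.0.0.1 local2, and a non-synced file action. A sketch, assuming the stock rsyslog 5; the 10.10.0.50 log host is hypothetical:

# /etc/rsyslog.d/haproxy.conf
$ModLoad imudp
$UDPServerAddress 127.0.0.1
$UDPServerRun 514

# The leading dash tells rsyslog not to sync after every write
local2.*    -/var/log/haproxy.log
# Or forward off-box entirely and spare the local disk:
# local2.*    @10.10.0.50:514

Then `service rsyslog restart` to pick it up.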

Validating the Config

Before restarting the service and dropping connections, always validate your syntax:

[root@lb01 ~]# haproxy -c -f /etc/haproxy/haproxy.cfg
Configuration file is valid

If valid, apply the changes:

[root@lb01 ~]# service haproxy reload
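
The reload is graceful: under the hood the init script typically re-executes HAProxy with `-sf`, which hands the listening sockets to the new process while the old one finishes its in-flight connections:

[root@lb01 ~]# haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)

While you are at it, make sure the service survives a reboot:

[root@lb01 ~]# chkconfig haproxy on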

A Note on SSL/HTTPS

You will notice the config above only handles port 80 (HTTP). HAProxy 1.4 does not support native SSL termination on the client side; you would need third-party patches, and they are not stable. The robust way to handle HTTPS in 2012 is to place Stunnel or Nginx in front of HAProxy to handle the decryption, then pass plain HTTP to HAProxy.
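
If you go the Nginx route, the terminator is a small vhost that decrypts and proxies to HAProxy on localhost. A minimal sketch, assuming certificates under /etc/nginx/ssl/ and HAProxy bound to port 80 on the same box (shop.example.com is a placeholder):

server {
    listen 443 ssl;
    server_name shop.example.com;

    ssl_certificate     /etc/nginx/ssl/shop.example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/shop.example.com.key;

    location / {
        # Hand the decrypted traffic to HAProxy
        proxy_pass http://127.0.0.1:80;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }
}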

Attempting to hack SSL directly into HAProxy 1.4 is usually more trouble than it is worth. Keep your architecture modular.

Summary

Moving from a single server to a cluster introduces complexity, but it buys you sleep. When one web node crashes, HAProxy detects it via `httpchk` and stops sending traffic there within seconds. Your users in Oslo or Trondheim never notice a thing.
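
"Within seconds" is tunable. Each `server` line accepts explicit check timing; a sketch with `inter` in milliseconds:

server  web01 10.10.0.2:80 check inter 2000 rise 2 fall 3

With these values a node is pulled after three failed checks (roughly six seconds) and re-admitted after two clean ones.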

However, software is only half the equation. A load balancer running on a noisy, over-contended host will add jitter to every single request. For predictable latency and the raw I/O throughput needed for high-traffic logging, you need clean, dedicated resources.

Ready to stabilize your stack? Deploy a CoolVDS SSD instance in our Oslo datacenter today and give your HAProxy the foundation it deserves. Don't wait for the crash.