Scaling Past Single-Server Limits: High Availability with HAProxy on CentOS 6

It is 3:00 AM. Your monitoring system is screaming. Your primary web server just locked up because a marketing campaign went viral, and the Apache child processes consumed every megabyte of RAM. You are now rebooting a server while losing transactions. If this sounds familiar, your architecture is wrong.

In 2012, relying on vertical scaling—just adding more RAM or CPU to a single box—is a dead end. The future is horizontal. The solution isn't a bigger server; it's smarter traffic distribution.

I have spent the last month migrating a high-traffic e-commerce client based here in Oslo from a monolithic dedicated server to a distributed VPS cluster. The tool of choice? HAProxy. Unlike Nginx, which is a fantastic web server, HAProxy is a dedicated TCP/HTTP load balancer. It strips out the complexity and focuses purely on shuffling packets with ruthless efficiency.

The Architecture: Removing the Bottleneck

A standard setup involves one Load Balancer (LB) distributing traffic to two or more Web Nodes. However, if the LB dies, the site dies. That is not High Availability (HA).

To build a truly bulletproof infrastructure on CoolVDS, we use a Floating IP (VIP) setup with Keepalived (VRRP). If the master LB fails, the backup takes over the IP address within seconds. No DNS propagation. No extended downtime.

Pro Tip: Many budget VPS providers block Multicast traffic, which breaks VRRP (the protocol Keepalived uses). CoolVDS network architecture is built with proper switching isolation, allowing VRRP packets to pass correctly between your instances in our Oslo datacenter.
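
Failover with VRRP is quick, but it is not literally zero. Here is a rough sketch of the arithmetic, assuming Keepalived runs VRRPv2 with the timing rules from RFC 3768: a backup declares the master dead after three missed advertisements plus a small skew derived from its own priority. The priority of 100 and the 1-second advertisement interval are illustrative assumptions, not values taken from this setup.

```python
def master_down_interval(priority, advert_int=1.0):
    """Seconds a VRRPv2 backup waits before taking over (RFC 3768 timing).

    priority   -- the backup's VRRP priority, 1..254 (assumed value)
    advert_int -- advertisement interval in seconds (assumed value)
    """
    # Higher-priority backups get a smaller skew, so they take over first.
    skew_time = (256 - priority) / 256.0
    return 3 * advert_int + skew_time


if __name__ == "__main__":
    print(master_down_interval(100))  # prints 3.609375
```

With these assumed values the backup waits roughly 3.6 seconds before claiming the VIP. A shorter advertisement interval shrinks that window at the cost of more VRRP chatter.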

Prerequisites

  • 2x CoolVDS Instances (CentOS 6.2) for Load Balancers.
  • 2x CoolVDS Instances for Web Servers (running Apache or Nginx).
  • Root access.

Step 1: Installing HAProxy 1.4

CentOS 6 repositories include HAProxy, but it is often an older build. I recommend enabling the EPEL repository or compiling from source if you need specific 1.4.x features. For this guide, the standard package is sufficient for 90% of use cases.

yum install haproxy
chkconfig haproxy on

Step 2: The Configuration

The default config is garbage. Back up /etc/haproxy/haproxy.cfg and create a new one. We are going to configure this for Layer 7 load balancing (HTTP mode), which allows us to route traffic based on cookies or headers.

Here is a production-ready configuration I use for handling traffic spikes:

global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    # Spreading checks helps avoid spiking CPU on backends simultaneously
    spread-checks 5
    # UNIX socket for querying runtime statistics
    stats socket /var/lib/haproxy/stats

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option                  http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

frontend  main *:80
    # Example ACL for static assets; it has no effect until a use_backend rule references it
    acl url_static       path_beg       -i /static /images /javascript /stylesheets
    acl url_static       path_end       -i .jpg .gif .png .css .js

    default_backend             app

backend app
    balance     roundrobin
    # Sticky sessions via cookie, essential for PHP sessions
    cookie      SERVERID insert indirect nocache
    option      httpchk GET /health_check.php
    server  web1 192.168.10.11:80 check cookie s1
    server  web2 192.168.10.12:80 check cookie s2

Understanding the Config

The balance roundrobin directive hands new requests to each server in turn. However, for applications utilizing PHP sessions (like Magento or Drupal), users must stay on the same server during their session. The cookie SERVERID insert directive handles this by setting a cookie in the client's browser that names the chosen server (s1 or s2). HAProxy sends every later request carrying that cookie back to the same node.
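
To make the routing decision concrete, here is a toy model in Python. It is a sketch of the idea only, not HAProxy's implementation: the StickyRoundRobin class and its route method are hypothetical names, and the model ignores health checks and weights.

```python
from itertools import cycle

# Cookie values and addresses taken from the backend section above.
SERVERS = {"s1": "192.168.10.11:80", "s2": "192.168.10.12:80"}


class StickyRoundRobin:
    """Toy model: round-robin for new visitors, cookie affinity afterwards."""

    def __init__(self, servers):
        self.servers = dict(servers)
        self._next = cycle(self.servers)  # endless s1, s2, s1, ...

    def route(self, cookies):
        """Return (server_id, set_cookie) for one request."""
        sid = cookies.get("SERVERID")
        if sid in self.servers:
            return sid, False        # returning visitor keeps their node
        return next(self._next), True  # new visitor: pick next, set cookie


if __name__ == "__main__":
    lb = StickyRoundRobin(SERVERS)
    print(lb.route({}))                   # ('s1', True)
    print(lb.route({}))                   # ('s2', True)
    print(lb.route({"SERVERID": "s2"}))   # ('s2', False)
```

In the real thing, a visitor whose cookie points at a dead server is sent elsewhere because option redispatch is enabled. The toy model leaves that out.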

The option httpchk is critical. HAProxy shouldn't just check if port 80 is open; it should check if the application is actually rendering. Create a file named health_check.php on your web nodes that returns HTTP 200 only if the local database connection is active, and an error status such as 503 otherwise. HAProxy treats any 2xx or 3xx response as healthy, so if the DB fails on Node 1, the check fails and HAProxy automatically pulls that node from rotation.
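
The logic inside health_check.php is tiny. Here it is sketched in Python so it can be read and tested in isolation; port it to PHP on the web nodes. The health_response function and the db_ping callable are hypothetical names, and how you actually ping your database is up to your stack.

```python
def health_response(db_ping):
    """Return (status_code, body) for the health check endpoint.

    db_ping is any callable that returns a truthy value when the local
    database answers and either returns falsy or raises when it does not.
    """
    try:
        healthy = bool(db_ping())
    except Exception:
        # A failed connection attempt counts as unhealthy, not as a crash.
        healthy = False
    if healthy:
        return 200, "OK"
    # HAProxy's httpchk only accepts 2xx/3xx, so 503 marks the node down.
    return 503, "DB unavailable"


if __name__ == "__main__":
    print(health_response(lambda: True))   # (200, 'OK')
    print(health_response(lambda: False))  # (503, 'DB unavailable')
```

The point of catching the exception is that a broken database should produce a clean 503 rather than a timeout or a half-rendered error page.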

Step 3: Kernel Tuning for High Concurrency

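Linux defaults are conservative. When you start pushing thousands of connections per second, you will run out of ephemeral ports. Edit /etc/sysctl.conf to widen the range and let the kernel reuse TIME_WAIT sockets for new outgoing connections. Avoid net.ipv4.tcp_tw_recycle on a public-facing load balancer: it silently drops connections from clients that share an IP address behind NAT.

# Allow binding to Non-Local IP (Essential for Keepalived VIP)
net.ipv4.ip_nonlocal_bind = 1

# Expand the local port range
net.ipv4.ip_local_port_range = 1024 65535

# Reuse TIME_WAIT sockets for new outgoing connections (safe with NAT clients, unlike tcp_tw_recycle)
net.ipv4.tcp_tw_reuse = 1
# Release orphaned FIN-WAIT-2 sockets sooner
net.ipv4.tcp_fin_timeout = 15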

Apply these changes with sysctl -p.
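
To see what widening the port range buys you, here is a small sketch that parses sysctl.conf-style text and counts the ephemeral ports available. The parse_sysctl and ephemeral_ports helpers are hypothetical, and the 32768 61000 default is my assumption about a stock CentOS 6 kernel, so check yours with sysctl net.ipv4.ip_local_port_range.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blanks and # or ; comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def ephemeral_ports(settings):
    """Number of local ports available for outgoing connections."""
    low, high = map(int, settings["net.ipv4.ip_local_port_range"].split())
    return high - low + 1


TUNED = """
# Expand the local port range
net.ipv4.ip_local_port_range = 1024 65535
"""
ASSUMED_DEFAULT = "net.ipv4.ip_local_port_range = 32768 61000"

if __name__ == "__main__":
    print(ephemeral_ports(parse_sysctl(ASSUMED_DEFAULT)))  # 28233
    print(ephemeral_ports(parse_sysctl(TUNED)))            # 64512
```

Under that assumption the tuned range gives the load balancer a bit more than twice as many source ports per backend address.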

The Storage Factor: Why I/O Matters

Load balancing solves CPU bottlenecks, but it exposes I/O weaknesses. If you split traffic to two web servers, but they are both reading from slow mechanical hard drives, you have just doubled your latency problem.

This is where infrastructure choice dictates success. In our tests, switching from standard SATA VPS providers to CoolVDS SSD instances reduced initial page load times by 400ms. We use pure Enterprise SSDs in RAID arrays, not cached SATA. When your HAProxy logs show a high Tr (the time the server took to start responding), slow disk I/O on the backend is a common culprit. A high Tw (time spent waiting in queue) tells you the servers are already saturated and requests are piling up in front of them.
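
With option httplog, HAProxy writes five timers into every log line as Tq/Tw/Tc/Tr/Tt (all in milliseconds). Here is a minimal sketch for pulling them out so you can track queue time (Tw) and server response time (Tr) separately. The sample line is made up to match this guide's frontend and backend names, and the regular expression assumes the stock HTTP log format with no custom captures in front of the timers.

```python
import re

# frontend, backend/server, then the five timers separated by slashes.
LOG_RE = re.compile(
    r"\] (?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<Tq>-?\d+)/(?P<Tw>-?\d+)/(?P<Tc>-?\d+)/(?P<Tr>-?\d+)/(?P<Tt>\+?\d+) "
)

# Made-up sample in the standard httplog layout.
SAMPLE_LINE = (
    "Feb  6 12:14:14 localhost haproxy[14389]: 10.0.1.2:33317 "
    "[06/Feb/2012:12:14:14.655] main app/web1 10/0/30/69/109 200 2750 "
    '- - ---- 1/1/1/1/0 0/0 "GET /index.php HTTP/1.1"'
)


def parse_timers(line):
    """Return the timers and server name from one log line, or None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    result = {name: int(match.group(name))
              for name in ("Tq", "Tw", "Tc", "Tr", "Tt")}
    result["server"] = match.group("server")
    return result


if __name__ == "__main__":
    print(parse_timers(SAMPLE_LINE))
```

A timer of -1 means that phase never completed, so filter those out before you average anything.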

Legal Nuances: The Norwegian Context

Hosting in Norway involves strict adherence to the Personal Data Act (Personopplysningsloven). Unlike hosting in the US under the Patriot Act, data here is protected. However, you must ensure your logging config in HAProxy respects user privacy.

With option httplog enabled, you are capturing client IP addresses. Ensure your log retention policy aligns with Datatilsynet guidelines. We generally recommend rotating logs and anonymizing anything older than 30 days unless you have a specific security requirement.
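
One common way to anonymize is to zero the last octet of every IPv4 address before the logs are archived. The sketch below shows that approach; it is an illustration, not a statement of what Datatilsynet requires, and anonymize_line is a hypothetical helper. It only handles IPv4 and will also rewrite any dotted quad that happens to appear in a URL.

```python
import re

# Four dot-separated groups of 1-3 digits; the first three are kept.
_IPV4 = re.compile(r"\b(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}\b")


def anonymize_line(line):
    """Replace the last octet of every IPv4 address in the line with 0."""
    return _IPV4.sub(r"\1.0", line)


if __name__ == "__main__":
    line = '203.0.113.77:33317 [06/Feb/2012:12:14:14.655] main app/web1 "GET / HTTP/1.1"'
    print(anonymize_line(line))
    # 203.0.113.0:33317 [06/Feb/2012:12:14:14.655] main app/web1 "GET / HTTP/1.1"
```

Run something like this from the same cron job that rotates the logs, so raw addresses never outlive your retention window.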

Monitoring Your Cluster

HAProxy includes a built-in monitoring dashboard. It is ugly, but functional. Add this to your configuration to enable it:

listen stats *:1936
    stats enable
    stats uri /
    stats hide-version
    stats auth admin:SuperSecretPassword

Now navigate to port 1936 on your LB IP. You will see real-time health status of every backend server. Green is good. Red means you need to check your Apache logs.
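
The same page is available as CSV if you append ;csv to the stats URI, which is handy for feeding your monitoring system. Here is a minimal sketch that finds servers reported as DOWN. The sample text is trimmed to three columns for readability (the real export has many more), and down_servers is a hypothetical helper.

```python
import csv
import io

# Trimmed sample; a real export has dozens of columns after these.
SAMPLE_CSV = (
    "# pxname,svname,status\n"
    "main,FRONTEND,OPEN\n"
    "app,web1,UP\n"
    "app,web2,DOWN\n"
    "app,BACKEND,UP\n"
)


def down_servers(csv_text):
    """Return (proxy, server) pairs whose status starts with DOWN."""
    text = csv_text.lstrip()
    if text.startswith("# "):
        text = text[2:]  # the header row is prefixed with "# "
    rows = csv.DictReader(io.StringIO(text))
    return [
        (row["pxname"], row["svname"])
        for row in rows
        # Skip the per-proxy FRONTEND/BACKEND summary rows.
        if row["svname"] not in ("FRONTEND", "BACKEND")
        and row["status"].startswith("DOWN")
    ]


if __name__ == "__main__":
    print(down_servers(SAMPLE_CSV))  # [('app', 'web2')]
```

Looking columns up by header name rather than position keeps the script working if a different HAProxy build adds fields to the export.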

Conclusion

Implementing HAProxy separates the amateurs from the professionals. It gives you the ability to perform maintenance on web servers during the day without downtime, simply by draining connections from the backend in the stats interface.
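
Draining can also be scripted through the stats socket instead of clicking through the web page. The sketch below makes two assumptions you must verify first: the socket is declared with level admin (the configuration in this guide does not do that), and your 1.4 build supports the set weight command. The helper names are hypothetical.

```python
import socket


def drain_command(backend, server):
    """Weight 0: the server stops receiving new non-persistent traffic."""
    return "set weight %s/%s 0\n" % (backend, server)


def restore_command(backend, server):
    """Return the server to its configured weight."""
    return "set weight %s/%s 100%%\n" % (backend, server)


def send_command(socket_path, command):
    """Send one command to the HAProxy stats socket and return the reply."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(socket_path)
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HAProxy closes the connection after replying
                break
            chunks.append(data)
        return b"".join(chunks).decode("ascii", "replace")
    finally:
        sock.close()


if __name__ == "__main__":
    # Path taken from the global section; needs root or the haproxy user.
    print(send_command("/var/lib/haproxy/stats", drain_command("app", "web1")))
```

Users who already hold a SERVERID cookie for the drained node keep landing on it until their session ends, which is exactly what you want before a daytime restart.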

Building this on reliable infrastructure is the final piece of the puzzle. You need low latency to the NIX (Norwegian Internet Exchange) and hardware that doesn't steal CPU cycles from your load balancer.

Ready to build your cluster? Deploy a CoolVDS SSD instance in Oslo today. Benchmarking takes 5 minutes, but the stability lasts forever.