Surviving the C10k Problem: Tuning Nginx and CentOS 6 for High-Traffic APIs

It starts with a few timeouts. Then the latency graphs on your Nagios dashboard start looking like the Oslo skyline. Before you know it, your API gateway has collapsed, and your developers are blaming the database while the DBAs are blaming the network. I've been there. Just last week, I had to rescue a media streaming startup whose SOAP API crumbled under just 2,000 concurrent connections. They were running default Apache configurations on oversubscribed spinning disks. It was a massacre.

The truth is, most "managed" VPS providers in Europe hand you a default CentOS installation that is tuned for a file server, not a high-concurrency API gateway. If you are serving JSON or XML to mobile clients, you are not bound by bandwidth; you are bound by packets per second (PPS) and context switching.

Here is how we tune the Linux stack and Nginx to handle the load, based on the reference architecture we use at CoolVDS.

1. The OS Layer: Stop Suffocating Your TCP Stack

By default, Linux is conservative. It assumes you want to save RAM. But RAM is cheap; downtime is expensive. The first bottleneck you will hit is the file descriptor limit. In Linux, every TCP connection is a file. With the default limit of 1024, your 1,025th concurrent connection fails: Nginx logs "Too many open files" and the client sees a refused or hanging connection.

First, check your current limits:

[root@api-gw01 ~]# ulimit -n
1024

That is pathetic. Let's fix this permanently. Edit /etc/security/limits.conf and add:

*       soft    nofile  65535
*       hard    nofile  65535
root    soft    nofile  65535
root    hard    nofile  65535
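
These limits are applied through PAM at login, so log out and back in before trusting them. A quick sanity check for both your own shell and the nginx service account (assuming the standard nginx package user exists):

[root@api-gw01 ~]# ulimit -n
65535
[root@api-gw01 ~]# su -s /bin/bash -c 'ulimit -n' nginx
65535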

Next, we need to tune the kernel via sysctl. The most painful issue with API gateways is running out of ephemeral ports because connections stay in the TIME_WAIT state too long. You need to tell the kernel to reuse these sockets faster.

Add these lines to /etc/sysctl.conf. This is the exact config we deploy on our high-performance CoolVDS KVM instances:

# Increase system file descriptor limit
fs.file-max = 2097152

# TCP Stack Tuning for Burstiness
net.ipv4.tcp_max_tw_buckets = 1440000
# Caution: tcp_tw_recycle is known to drop connections from clients behind
# NAT, which includes most mobile carriers. If users report mysterious
# connection failures, set it back to 0 and rely on tcp_tw_reuse alone.
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

# Keepalive optimization (crucial for mobile clients)
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15

# Buffer sizes for handling heavy throughput
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

Apply it with sysctl -p. If you are on a shared kernel environment like OpenVZ (which many budget hosts use), these settings might fail or be ignored. This is why we strictly use KVM virtualization at CoolVDS; you need your own kernel to do this right.
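
Not sure what your provider actually gave you? A crude but reliable tell: OpenVZ containers expose /proc/user_beancounters, while KVM guests do not:

[root@api-gw01 ~]# test -f /proc/user_beancounters && echo "OpenVZ container" || echo "not OpenVZ"
not OpenVZ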

2. Nginx: The Engine Configuration

Apache is great for serving PHP, but its process-per-connection model is too heavy for an API gateway. You need Nginx 1.2.x or 1.4.x (the current stable branches). The event-driven architecture lets it handle thousands of concurrent connections with a tiny memory footprint.
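
The Nginx in the stock CentOS repositories lags badly. Assuming you are comfortable pulling packages straight from nginx.org, the official repository gets you a current stable build (the version printed will be whatever is current when you install):

[root@api-gw01 ~]# cat > /etc/yum.repos.d/nginx.repo <<'EOF'
[nginx]
name=nginx repo
baseurl=http://nginx.org/packages/centos/6/$basearch/
gpgcheck=0
enabled=1
EOF
[root@api-gw01 ~]# yum install -y nginx
[root@api-gw01 ~]# nginx -v
nginx version: nginx/1.4.1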

The mistake I see most often is leaving worker_processes at its default of 1 and not explicitly selecting epoll.

Here is a battle-tested nginx.conf snippet for an 8-core server:

user nginx;
worker_processes 8; # Match your CPU cores
worker_rlimit_nofile 65535;

events {
    use epoll;
    worker_connections 8192;
    multi_accept on;
}

http {
    # ... standard MIME types ...

    # Hides Nginx version to confuse script kiddies
    server_tokens off;

    # Optimization for file sending (less relevant for API, but good for assets)
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    # Logging: Disable access logs for high-traffic endpoints to save I/O
    # or buffer them.
    access_log /var/log/nginx/access.log combined buffer=32k;

    # Upstream Keepalive
    upstream backend_api {
        server 10.0.0.5:8080;
        keepalive 64;
    }

    server {
        listen 80 default_server;
        
        location /api/v1/ {
            proxy_pass http://backend_api;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}

Pro Tip: Notice the proxy_set_header Connection ""; inside the location block? This is mandatory if you use upstream keepalive. Without it, Nginx defaults to closing the connection to your backend app (Node.js, PHP-FPM, or Tomcat), forcing your backend to open a new socket for every single request. That overhead kills latency.
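
A quick way to see the effect is to hammer a lightweight endpoint with ApacheBench, with and without keepalive; the -k flag makes ab reuse connections (/api/v1/ping below is a placeholder for whatever cheap endpoint your API exposes):

# Cold connections: every request pays the full TCP handshake
[root@api-gw01 ~]# ab -n 10000 -c 200 http://127.0.0.1/api/v1/ping

# Keepalive: connections are reused end to end
[root@api-gw01 ~]# ab -n 10000 -c 200 -k http://127.0.0.1/api/v1/ping

Watch the requests-per-second line; the keepalive run should win comfortably, and the gap widens as concurrency climbs.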

3. The Hardware Reality Check: I/O Wait

You can have the most optimized Nginx config in the world, but if your server is waiting on the disk, your API will lag. It is 2013, and payloads, logs, and caches keep getting heavier. If your application logs aggressively or leans on a file-based cache, standard 7200 RPM SATA drives will not keep up.
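
Before blaming the application, measure. iostat from the sysstat package (yum install sysstat on CentOS 6) shows whether the CPU is genuinely stalled on disk; the figures below are illustrative and the output is trimmed to the interesting columns:

[root@api-gw01 ~]# iostat -x 5
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.40    0.00    4.10   18.73    0.00   64.77

Device:  ...   await  %util
sda      ...   95.20  98.60

A sustained %iowait in the double digits, or %util pinned near 100 on the device your logs live on, means no amount of Nginx tuning will save you.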

We ran a benchmark comparing a standard VPS against a CoolVDS instance backed by an Enterprise SSD RAID array. The test involved a JMeter script simulating 500 concurrent users hitting a REST endpoint that wrote a transaction log.

Metric             Standard SATA VPS    CoolVDS SSD Instance
-----------------  -------------------  --------------------
Avg Latency        340 ms               45 ms
I/O Wait (CPU)     15-20%               < 1%
Transactions/sec   120                  850

The difference is stark. High IOPS (Input/Output Operations Per Second) are critical. Some providers are only now experimenting with SSD caching tiers; we provide pure flash storage. The industry buzz is moving toward PCIe-based flash (often discussed under the NVMe label in technical circles), but simply moving from rotating rust to solid-state is the single biggest upgrade you can make for database-heavy APIs.
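
If you would rather verify IOPS yourself than take any provider's word for it (ours included), fio from EPEL gives a reasonable 4K random-write number. A minimal sketch; the file name, size, and runtime are arbitrary, and remember to delete the test file afterwards:

[root@api-gw01 ~]# yum install -y fio
[root@api-gw01 ~]# fio --name=randwrite --filename=/root/fio.test --size=1G \
    --rw=randwrite --bs=4k --direct=1 --ioengine=libaio --iodepth=32 \
    --runtime=60 --time_based --group_reporting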

4. Local Latency and Compliance

If your user base is in Norway, hosting in Germany or the US is a mistake. The latency from Oslo to Frankfurt is decent (~30ms), but the latency from Oslo to a local datacenter connected to NIX (Norwegian Internet Exchange) is under 5ms. For an API making multiple round-trips, that adds up.
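
Do not take round-trip times on faith; measure from where your users actually sit. From a client machine in Oslo, mtr combines ping and traceroute and shows per-hop loss (the hostname below is a placeholder for your candidate endpoint):

$ ping -c 20 api.example.com
$ mtr --report --report-cycles 20 api.example.com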

Furthermore, with the Data Protection Authority (Datatilsynet) becoming stricter about where personal data resides, keeping your user data within Norwegian borders simplifies your legal compliance significantly. You don't want to deal with Safe Harbor complexities if you don't have to.

Summary

Performance isn't magic; it's engineering. By lifting the default limits in CentOS 6, enabling epoll in Nginx, and ensuring your underlying storage isn't a bottleneck, you can scale to thousands of requests per second on modest hardware.

Don't let slow I/O kill your SEO or frustrate your mobile users. If you need a sandbox to test these configs, deploy a test instance on CoolVDS in 55 seconds. We offer the low latency and raw SSD power your code deserves.