API Gateway Performance Tuning: Squeezing the Kernel for Sub-Millisecond Latency

Let's be honest: default Linux distributions are not designed for high-throughput API gateways. They are designed for general-purpose computing. If you deploy a standard CentOS 6 or Ubuntu 12.04 installation and expect it to handle 10,000 concurrent connections (C10k) without choking, you are delusional.

I recently audited a setup for a payment processor in Oslo. They were complaining about "network jitter" every time their traffic spiked above 500 requests per second. The culprit wasn't the network; it was the default file descriptor limits and a constipated TCP stack. They were running their API gateway on a legacy spinning-disk VPS with a provider that overcommitted RAM. The result? 500ms latency on a simple JSON handshake.

Today, we are going to fix that. We will strip away the safety rails and tune the Linux kernel and Nginx for raw speed. We aren't building a web server; we are building a packet cannon.

1. The Hardware Reality: Spindles are Dead

Before we touch a single config file, look at your storage. If your VPS provider is running on SAS 15k RPM drives, you have already lost. For an API gateway, I/O wait is the enemy. Logs, temporary files, and socket buffers need to move instantly.

At CoolVDS, we standardized on enterprise-grade SSDs and RAID-10 arrays for a reason. In our benchmarks, random write performance on SSDs is roughly 20-50x faster than traditional HDDs. When you have 5,000 API clients trying to open sockets simultaneously, that I/O throughput is the difference between a 200 OK and a timeout.
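Don't take our word for it: run a quick random-write test on your current VPS before blaming (or switching) providers. The sketch below assumes the fio package is installed; the job parameters (4k blocks, 1 GB test file, 60-second cap) are illustrative rather than canonical.

# Rough random-write benchmark with fio (illustrative parameters)
fio --name=randwrite-test \
    --filename=fio-test.tmp \
    --rw=randwrite \
    --bs=4k \
    --size=1G \
    --direct=1 \
    --runtime=60

# Remove the temporary test file afterwards
rm -f fio-test.tmp

On a properly provisioned SSD-backed instance you should see thousands to tens of thousands of random-write IOPS; on an oversold spinning-disk box you will be lucky to see a few hundred.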

2. Kernel Tuning: `sysctl.conf` is Your Best Friend

Linux is polite by default. It waits to close connections. It protects you from using too much RAM. For a high-performance gateway, we need it to be aggressive.

Open /etc/sysctl.conf. We need to adjust how the kernel handles TCP connections. Specifically, we need to allow the system to reuse sockets stuck in the TIME_WAIT state; otherwise you will run out of ephemeral ports during traffic bursts.

# /etc/sysctl.conf

# Increase system-wide file descriptor limit
fs.file-max = 2097152

# Reuse sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1
# Note: tcp_tw_recycle is dangerous behind NAT/Load Balancers, use with caution
net.ipv4.tcp_tw_recycle = 0

# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535

# Increase TCP buffer sizes for 10Gbps+ networks
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Max backlog for packet processing
net.core.netdev_max_backlog = 50000

# Max SYN backlog (half-open connections waiting for the final ACK of the handshake)
net.ipv4.tcp_max_syn_backlog = 30000

Apply these changes instantly:

sysctl -p
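To confirm the new values actually took effect (and aren't being overridden by another startup script), query them back, and keep an eye on the TIME_WAIT count during a traffic burst:

# Verify a few of the key values
sysctl net.ipv4.tcp_tw_reuse
sysctl net.ipv4.ip_local_port_range
sysctl fs.file-max

# Count sockets currently sitting in TIME_WAIT
netstat -ant | grep -c TIME_WAIT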

3. Nginx Configuration: Beyond the Defaults

Most people install Nginx via yum install nginx and walk away. That default config is meant for serving static HTML, not proxying thousands of API calls per second.

Here is a reference configuration for an API Gateway scenario using Nginx 1.4.x. We focus on keepalive connections to the upstream backend to avoid the overhead of opening a new TCP handshake for every single API request.

# /etc/nginx/nginx.conf
user nginx;
worker_processes auto; # Automatically detect cores
worker_rlimit_nofile 100000;

events {
    worker_connections 4096;
    use epoll; # Essential for Linux 2.6+
    multi_accept on;
}

http {
    # ... logs and mime types ...

    # OPTIMIZATION: Disable access logs for high-traffic API endpoints to save I/O
    # access_log off;

    # TCP optimization
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    # Keepalive to reduce CPU usage on handshakes
    keepalive_timeout 30;
    keepalive_requests 100000;

    upstream backend_api {
        server 10.0.0.5:8080;
        # IMPORTANT: Keepalive connections to backend
        keepalive 64;
    }

    server {
        listen 80;
        server_name api.yourservice.no;

        location / {
            proxy_pass http://backend_api;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            
            # Buffer tuning
            proxy_buffers 16 16k;
            proxy_buffer_size 32k;
        }
    }
}
Pro Tip: Check your current open file limits with ulimit -n. If it says 1024, your optimized config is useless. Ensure /etc/security/limits.conf allows the `nginx` user to open at least 65535 files.
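For reference, the limits.conf entries usually look like the lines below; adjust the number to your own worker_processes × worker_connections math, with headroom. The worker_rlimit_nofile directive shown earlier covers the worker processes themselves, but note that pam_limits only applies to PAM sessions, so on a CentOS 6-style init setup you may also need to raise the limit in the init script before nginx starts.

# /etc/security/limits.conf
# Allow the nginx user to open enough file descriptors
nginx soft nofile 65535
nginx hard nofile 65535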

4. The Virtualization Factor: KVM vs. Containers

In the hosting world, not all "clouds" are created equal. Many budget providers use OpenVZ or older container technologies where the kernel is shared among all tenants. This creates the "noisy neighbor" effect. If your neighbor's database starts thrashing the disk cache, your API latency spikes. You have no control over the kernel modules or deep `sysctl` parameters because you don't actually own the kernel.

This is why CoolVDS uses KVM (Kernel-based Virtual Machine). With KVM, you get your own dedicated kernel. You can load custom modules, tune the TCP stack exactly as I showed above, and rely on strict resource isolation. For an API gateway where consistency is key, full hardware virtualization is non-negotiable.
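Not sure what your current provider actually sold you? A couple of quick checks usually give it away. The sketch below assumes the virt-what package is available; the /proc/user_beancounters check works even without it, since OpenVZ-style containers expose that file to every guest.

# Identify the virtualization layer (requires the virt-what package)
virt-what

# OpenVZ/Virtuozzo containers expose this file inside the guest
ls /proc/user_beancounters 2>/dev/null && echo "Shared-kernel container detected"

# On KVM the virtual CPU usually shows up in the CPU model string
grep -m1 "model name" /proc/cpuinfo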

5. Local Context: Latency and Compliance

If your users are in Oslo, Bergen, or Trondheim, hosting in Frankfurt or London adds 20-40ms of unnecessary round-trip time (RTT). In the API world, that latency compounds on every request.
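Measure it yourself rather than taking anyone's word for it. ping gives you the raw network RTT, and curl's timing variables break a single request down into DNS lookup, TCP connect, and time to first byte (the hostname below is just the placeholder from the config above):

# Raw round-trip time to the gateway
ping -c 10 api.yourservice.no

# Per-request breakdown: DNS lookup, TCP connect, first byte, total
curl -o /dev/null -s -w "dns: %{time_namelookup}s  connect: %{time_connect}s  ttfb: %{time_starttransfer}s  total: %{time_total}s\n" \
    http://api.yourservice.no/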

Furthermore, with the Data Inspectorate (Datatilsynet) becoming increasingly vigilant about where Norwegian data lives (especially regarding the Personal Data Act), keeping your infrastructure on Norwegian soil is a smart legal hedge. CoolVDS infrastructure is physically located in Oslo, directly peered at NIX (Norwegian Internet Exchange). This ensures your API responses stay within the country, minimizing hops and latency.

Final Thoughts

Performance isn't magic; it's engineering. By moving to SSD-backed storage, tuning your Linux kernel for concurrency, and ensuring you have true virtualization isolation, you can handle traffic spikes that would crash a standard server.

Don't let slow I/O kill your application's responsiveness. Deploy a high-performance KVM instance on CoolVDS today and see the difference a tuned stack makes.