API Gateway Performance Tuning: Squeezing Microseconds out of Nginx and Kong (2019 Edition)

If your API Gateway adds more than 15 milliseconds to a request, you aren't managing traffic—you're obstructing it. In the era of microservices, the gateway is the single most critical choke point. I have seen "scalable" architectures crumble under load not because the application code was bad, but because the gateway was choking on file descriptors and context switches.

Most default configurations on Ubuntu 18.04 or CentOS 7 are designed for general-purpose computing, not for handling 50,000 concurrent connections. If you blindly `apt-get install nginx` and walk away, you are leaving 40% of your performance on the table. This guide is for the obsessive architects who look at `htop` and get angry when they see system CPU usage spiking due to inefficient interrupt handling.

The Hardware Reality: Why "Cloud" Often Fails

Before touching a single config file, we must address the infrastructure. You cannot tune your way out of noisy neighbor syndrome. In a standard public cloud environment, "vCPU" is often a marketing term for "a timeslice we might give you if no one else needs it."

For an API Gateway, inconsistent I/O latency is death. When Nginx writes access logs or buffers a payload to disk, it needs immediate NVMe access. If your hypervisor queues that I/O behind a neighbor's database backup, your p99 latency spikes from 20ms to 500ms. This is why at CoolVDS, we enforce strict KVM isolation and utilize local NVMe storage. We don't oversell the underlying metal because we know that `iowait` is the enemy of throughput.
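Don't take my word for it; measure it. If you suspect the storage layer, a quick sanity check (assuming the sysstat package is installed) is to watch iowait and device latency while the gateway is writing logs under load:

# Extended device stats every second: watch %iowait in the CPU summary
# and the await columns per device. Sustained spikes point at the disk, not your config.
iostat -x 1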

Step 1: The Linux Kernel is Your First Bottleneck

The Linux networking stack defaults are conservative. For a high-throughput gateway, we need to widen the TCP pipes. We are effectively telling the kernel: "Trust me, I can handle the flood."

First, check your current congestion control algorithm. In 2019, if you aren't using BBR (Bottleneck Bandwidth and RTT), you are living in the past.

sysctl net.ipv4.tcp_congestion_control

If it returns `cubic` or `reno`, you need to switch. BBR handles packet loss and latency variation much better, which is crucial for mobile clients connecting to your Norwegian servers over fluctuating 4G networks.
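Switching is a two-line addition, assuming your kernel is 4.9 or newer (Ubuntu 18.04's stock 4.15 qualifies; stock CentOS 7 is still on 3.10, so install a mainline kernel there first):

# Add to /etc/sysctl.conf (BBR is designed to be paired with the fq qdisc)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

Reload with `sysctl -p` and re-run the check above; it should now return `bbr`.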

Here is the battle-tested `sysctl.conf` configuration I deploy on every CoolVDS instance intended for gateway duties. This optimizes the TCP stack for massive concurrency:

# /etc/sysctl.conf - Optimized for High Concurrency API Gateways

# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 5000

# Reuse sockets in TIME_WAIT state for new connections (safe for outgoing connections)
net.ipv4.tcp_tw_reuse = 1

# Increase local port range to avoid exhaustion during heavy proxying
net.ipv4.ip_local_port_range = 1024 65535

# Increase TCP buffer sizes for 10Gbps+ networks
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Protect against SYN flood attacks (basic protection)
net.ipv4.tcp_syncookies = 1

# Enable TCP Fast Open (TFO) to reduce handshake latency
net.ipv4.tcp_fastopen = 3

Apply these with `sysctl -p`. These settings help prevent the dreaded "Connection reset by peer" errors during load spikes.

Step 2: File Descriptors (The Silent Limit)

Every connection to your gateway is a file. The default Linux limit is often 1024. This is laughable. You will hit this limit during a modest marketing campaign.

Check your current soft limit:

ulimit -n

If it says 1024, fix it immediately in `/etc/security/limits.conf`. You want at least 100,000 for a busy gateway.

* soft nofile 100000
* hard nofile 100000
root soft nofile 100000
root hard nofile 100000
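One caveat: `limits.conf` only applies to PAM login sessions. On Ubuntu 18.04 and CentOS 7, Nginx is started by systemd, which ignores those values, so raise the limit in a unit drop-in as well (a minimal sketch, assuming the stock `nginx.service` unit):

mkdir -p /etc/systemd/system/nginx.service.d
cat <<'EOF' > /etc/systemd/system/nginx.service.d/limits.conf
[Service]
LimitNOFILE=100000
EOF
systemctl daemon-reload
systemctl restart nginx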

Step 3: Nginx / OpenResty Optimization

Whether you are using raw Nginx or an OpenResty-based solution like Kong (which is excellent, by the way, especially version 1.0+), the underlying engine is Nginx. The most common mistake I see is neglecting upstream keepalives.

By default, Nginx opens a brand-new connection to your backend microservice for every single request it proxies. That means a full TCP handshake (SYN, SYN-ACK, ACK) every time, plus a TLS handshake if you encrypt internal traffic. This CPU burn is unnecessary.

Pro Tip: In Norway, strict GDPR interpretation by Datatilsynet often mandates encryption even within the private network (VLAN). This makes TLS handshakes expensive. Reusing connections is not just a performance tweak; it is a necessity to keep CPU usage sane.

Here is a properly tuned Nginx block for an API Gateway scenario:

worker_processes auto;
worker_rlimit_nofile 100000; # Must match system limits

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

http {
    # ... logs and other settings ...

    # Disable access logging for health checks (or other noisy paths) to save I/O
    map $request_uri $loggable {
        ~/health 0;
        default 1;
    }
    access_log /var/log/nginx/access.log combined if=$loggable;

    # Upstream configuration with Keepalive
    upstream backend_microservices {
        server 10.0.0.5:8080;
        server 10.0.0.6:8080;
        
        # VITAL: Keep idle connections open to the backend
        keepalive 64;
    }

    server {
        listen 443 ssl http2;
        server_name api.coolvds-client.no;
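        # ssl_certificate and ssl_certificate_key go here (paths omitted for brevity)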

        # Buffer optimizations
        client_body_buffer_size 128k;
        client_max_body_size 10m;
        
        # SSL Optimization (TLS 1.3 is essential in 2019)
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
        ssl_prefer_server_ciphers on;

        location / {
            proxy_pass http://backend_microservices;
            
            # required for keepalive to work
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}

Pay close attention to `proxy_set_header Connection "";`. By default, Nginx sends `Connection: close` to the upstream on every proxied request; clearing that header (together with `proxy_http_version 1.1`) is what lets the keepalive pool you just configured actually be reused.
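A quick way to confirm the pool is actually being used: count established connections from the gateway to the backends while `wrk` (see below) hammers the gateway. With keepalive working, the number stays roughly flat around the pool size instead of churning through thousands of short-lived sockets. A sketch, assuming the backend port from the config above:

# Connections from the gateway to the backends on port 8080
# (the first line of ss output is a header, so subtract one)
watch -n1 "ss -tn state established '( dport = :8080 )' | wc -l"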

Step 4: Benchmarking the Result

Do not assume it worked. Measure it. We use `wrk`, a modern HTTP benchmarking tool that can generate significant load from a single node.

Run this from a separate CoolVDS instance (inside the same LAN for throughput testing, or from an external node to test public latency):

wrk -t12 -c400 -d30s https://api.yourdomain.no/v1/status

Look at the Latency Distribution in the output. Your 99th percentile should be stable. If you see high variance, you likely have CPU steal on the host.
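To check for steal directly, watch the `st` column from `vmstat` (or `%st` in `top`) on the gateway while the benchmark runs:

# CPU stats every second for 30 seconds; the last column (st) is the
# percentage of time the hypervisor stole from this guest. It should be 0.
vmstat 1 30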

The Geographic Advantage

Physics is the ultimate bottleneck. Light travels at a fixed speed. If your users are in Oslo and your API gateway is in a generic datacenter in Frankfurt or Amsterdam, you are accepting a ~20-30ms round-trip penalty before the request even hits your server.

For Norwegian businesses, hosting locally is not just about patriotism; it's about latency. Connecting via NIX (Norwegian Internet Exchange) ensures that traffic between your users (on Telenor, Telia, or Altibox) and your server stays within the country. This keeps latency typically below 5ms.
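You can verify the routing yourself with a per-hop latency report from a client on a Norwegian ISP (assuming `mtr` is installed; substitute your own hostname):

# 20-cycle report; hops through Frankfurt or Amsterdam on the way to an
# "Oslo" server mean your provider is not peering at NIX.
mtr --report --report-cycles 20 api.yourdomain.no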

Comparison: Latency to Oslo

| Hosting Location | Avg Latency to Oslo | Data Sovereignty |
| --- | --- | --- |
| CoolVDS (Oslo) | 2-5 ms | High (in-country) |
| AWS (Stockholm) | 10-15 ms | Medium |
| DigitalOcean (Frankfurt) | 25-35 ms | Low |
| US East (Virginia) | 90-110 ms | None |

Final Thoughts

Performance tuning is an iterative process. You tune the kernel, then the application hits a limit. You tune the application, then the database chokes. However, building on a solid foundation is non-negotiable. Using a dedicated KVM slice with NVMe storage solves the hardware variable, allowing you to focus purely on software optimization.

Don't let your infrastructure be the reason your API feels sluggish. Spin up a CoolVDS instance today, apply these `sysctl` settings, and watch your latency drop.