API Gateway Tuning: Squeezing Microseconds out of NGINX and Kong in 2021

Stop Accepting Default Latency

Most engineers treat their API Gateway as a black box. They deploy a standard NGINX container or a default Kong helm chart, slap a load balancer in front of it, and wonder why their TTFB (Time To First Byte) sits at a sluggish 50ms before the request even hits the backend application. In high-frequency trading or real-time bidding, 50ms is an eternity. Even for a standard e-commerce site, it is the difference between conversion and abandonment.

I recently audited a setup for a logistics firm based in Oslo. They were running a Kubernetes cluster on a major US cloud provider's European region. Their API gateway was choking under 10k requests per second (RPS). The solution wasn't adding more nodes; it was fixing the atrocious default configurations that Linux and NGINX ship with. After tuning the kernel and migrating to a dedicated KVM slice with direct NVMe access, we dropped gateway overhead to sub-4ms.

Here is how you do it, using technologies available right now in early 2021.

1. The OS Layer: Linux Wasn't Built for This (By Default)

Out of the box, most Linux distributions (Ubuntu 20.04 LTS, CentOS 8) are tuned for general-purpose computing, not high-throughput packet forwarding. If you are running an API Gateway on CoolVDS, you have root access. Use it. The first bottleneck you will hit is the file descriptor limit. Everything in Linux is a file, including a TCP connection.

Check your current limits:

ulimit -n

If it returns 1024, your gateway will cap out instantly under load. You need to increase this permanently in /etc/security/limits.conf:

* soft nofile 65535
* hard nofile 65535
root soft nofile 65535
root hard nofile 65535
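
Note that limits.conf is only read for PAM login sessions. If NGINX or Kong runs as a systemd service (the default on Ubuntu 20.04 and CentOS 8), set the limit on the unit itself instead; a minimal sketch, assuming an nginx.service unit:

# /etc/systemd/system/nginx.service.d/limits.conf
[Service]
LimitNOFILE=65535

Run systemctl daemon-reload and restart the service, then verify with cat /proc/$(pidof -s nginx)/limits.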

Kernel TCP Stack Tuning

Next, we tackle the TCP stack. The defaults for backlog queues, ephemeral port ranges and SYN handling are too conservative for 2021 traffic levels. We need to let the kernel reuse sockets stuck in TIME_WAIT for new outbound connections and absorb bursts of incoming connections without dropping SYN packets.

Edit your /etc/sysctl.conf. This configuration is aggressive but necessary for high-performance gateways:

# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535

# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535

# Allow reusing TIME_WAIT sockets for new outbound connections (gateway to upstream)
net.ipv4.tcp_tw_reuse = 1

# Protect against SYN flood while handling legitimate bursts
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 4096

# Fast Open can reduce latency by one RTT (requires client support)
net.ipv4.tcp_fastopen = 3

# Congestion control - BBR is available in kernels 4.9 and newer
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

Apply these changes with sysctl -p. Note the use of BBR (Bottleneck Bandwidth and Round-trip propagation time). Google's BBR algorithm significantly improves throughput on networks with packet loss, which is inevitable over the public internet.
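
A quick sanity check after loading the settings, assuming your kernel ships the tcp_bbr module (kernels 4.9 and newer generally do):

# Load the new settings, then confirm BBR is actually in use
sysctl -p
sysctl net.ipv4.tcp_congestion_control
# Expected output: net.ipv4.tcp_congestion_control = bbr
lsmod | grep tcp_bbr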

2. NGINX Configuration: The Engine Room

Whether you use vanilla NGINX or OpenResty (the heart of Kong), the directives remain similar. The most common mistake I see is neglecting upstream keepalives. By default, NGINX opens a new connection to your backend service for every single request. That means a full TCP handshake (SYN, SYN-ACK, ACK) per request, plus a TLS handshake if you encrypt internal traffic.

This adds massive overhead. You must configure NGINX to keep connections open.

upstream backend_service {
    server 10.0.0.5:8080;
    # Keep 64 idle connections open to this upstream
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_service;
        
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        
        # Buffer settings to handle JSON payloads without disk I/O
        proxy_buffers 16 16k;
        proxy_buffer_size 32k;
    }
}
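
To confirm connections are actually being reused rather than churned, watch the socket table on the gateway while you load test; with the upstream address from the example above, you should see a small, stable pool of established connections instead of thousands of sockets cycling through TIME_WAIT:

# Established connections to the upstream (10.0.0.5:8080 from the example)
ss -tn state established '( dport = :8080 )'

# Sockets stuck in TIME_WAIT towards the same upstream
ss -tn state time-wait '( dport = :8080 )'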
Pro Tip: Disable access logging for high-traffic assets (like health checks or static images) or buffer the logs. Writing to disk for every 200 OK response is an IOPS killer. Use access_log off; or access_log /var/log/nginx/access.log combined buffer=32k flush=1m;.
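
If you still want logs for real traffic but not for probes, access_log also accepts an if= parameter. A minimal sketch, assuming your health check endpoint is /healthz:

# In the http block: flag requests that should not be logged
map $request_uri $loggable {
    default   1;
    /healthz  0;
}

# Buffered logging, skipped entirely when $loggable is 0
access_log /var/log/nginx/access.log combined buffer=32k flush=1m if=$loggable;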

3. The Hardware Reality: Why "Cloud" Often Fails

You can tune software all day, but you cannot tune away a noisy neighbor. In multi-tenant cloud environments,