API Gateway Performance Tuning: Squeezing Microseconds out of Nginx and Kong in 2022

Latency is the Silent Killer of Digital Revenue

If your API gateway adds more than 20ms of overhead, you are doing it wrong. In the high-stakes world of microservices, the gateway is the bouncer, the traffic cop, and the translator. If the bouncer is slow, the club remains empty.

I recently audited a fintech setup in Oslo. They were running a standard Nginx reverse proxy on a generic cloud provider. Their complaint? Random 502 errors during peak trading hours. The root cause wasn't the application code; it was a combination of default Linux kernel settings and a noisy neighbor stealing CPU cycles. We fixed it, but it took surgery on the OS and a migration to dedicated resources.

Here is how you tune your API Gateway for raw performance, specifically focusing on the stack available to us in mid-2022.

1. The Foundation: Kernel Tuning

Most Linux distributions, including the latest Ubuntu 20.04 LTS or Debian 11, ship with conservative networking defaults intended for desktop usage or light serving. For an API gateway handling thousands of concurrent connections, these defaults are suffocating.

You need to widen the TCP pipe. Open your /etc/sysctl.conf and apply these changes. This isn't theoretical; this is what keeps packets flowing when the load hits.

# /etc/sysctl.conf

# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535

# Reuse sockets in TIME_WAIT state for new outgoing connections
# (e.g., gateway-to-upstream traffic); critical for high-throughput API gateways
net.ipv4.tcp_tw_reuse = 1

# Increase ephemeral port range
net.ipv4.ip_local_port_range = 1024 65535

# Increase TCP buffer sizes for modern high-speed networks
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Enable TCP Fast Open (RFC 7413) to save round trips
net.ipv4.tcp_fastopen = 3

Run sysctl -p to apply. Without tcp_tw_reuse and the wider ephemeral port range, your gateway will exhaust outbound sockets to its upstreams during traffic spikes, leading to those dreaded connection timeouts.
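A quick way to confirm the values actually took effect after the reload; a minimal sketch, nothing here is host-specific:

# Spot-check the parameters that matter most on a busy gateway
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse net.ipv4.tcp_fastopen net.ipv4.ip_local_port_range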

2. Nginx & Kong: The Upstream Keepalive Trap

Whether you are using raw Nginx (1.21.x) or Kong Gateway (2.x), the most common performance mistake I see is the failure to enable upstream keepalives. By default, Nginx opens a new connection to your backend service for every single request. This involves a full TCP handshake (and potentially a TLS handshake) for every API call.

This adds massive latency and CPU load. Here is the correct configuration to maintain persistent connections to your microservices.

The Incorrect (Default) Way

upstream backend_api {
    server 10.0.0.5:8080;
}

server {
    location /api/ {
        proxy_pass http://backend_api;
    }
}

The Optimized Way

You must define the `keepalive` directive in the upstream block, switch the proxied connection to HTTP/1.1, and clear the Connection header.

upstream backend_api {
    server 10.0.0.5:8080;
    
    # Keep 64 idle connections open to the backend
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;
        
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        
        # Buffer tuning for JSON payloads
        proxy_buffers 16 16k;
        proxy_buffer_size 32k;
    }
}
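Before trusting the change, confirm that connections to the backend are actually being reused. A quick check from the gateway host, assuming the example backend at 10.0.0.5:8080:

# Count connection states toward the backend. With keepalive working you should
# see a small, stable pool of ESTAB sockets instead of thousands in TIME-WAIT.
ss -tan | grep '10.0.0.5:8080' | awk '{print $1}' | sort | uniq -c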

This change alone dropped internal latency from 35ms to 4ms in the Oslo project I mentioned earlier.
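If you run Kong Gateway 2.x rather than raw Nginx, you don't edit the upstream block by hand; the equivalent knobs live in kong.conf (or the matching KONG_* environment variables). A sketch only: the property names assume Kong 2.x, so verify them against your version's kong.conf.default before relying on them.

# Kong 2.x keepalive pool settings, set via environment instead of kong.conf.
# Values are illustrative; size the pool to match your backend capacity.
export KONG_UPSTREAM_KEEPALIVE_POOL_SIZE=128       # idle connections kept per pool
export KONG_UPSTREAM_KEEPALIVE_IDLE_TIMEOUT=60     # seconds before an idle connection closes
export KONG_UPSTREAM_KEEPALIVE_MAX_REQUESTS=10000  # recycle a connection after this many requests
kong restart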

3. SSL/TLS Offloading: The Cost of Encryption

In 2022, there is no excuse for not using TLS 1.3. It cuts the full handshake down to a single round trip and supports 0-RTT resumption. However, encryption still costs CPU cycles. If you are terminating TLS for 10,000 requests per second, generic virtual CPUs (vCPUs) often choke.

Pro Tip: Verify your OpenSSL version. You need OpenSSL 1.1.1 or newer, the first release with TLS 1.3 support. Run openssl version to check.
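While you are at it, check that your CPU actually exposes AES acceleration to the VM; without it, TLS termination at high request rates gets expensive. A quick probe, nothing here is provider-specific:

# AES-NI present? An empty result means software-only AES, which hurts at scale.
grep -m1 -o aes /proc/cpuinfo

# Rough throughput of the cipher family used in the config below
openssl speed -evp aes-128-gcm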

Ensure your cipher suites are prioritized for speed without sacrificing security. Here is a battle-tested SSL config for Nginx:

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers on;

# OCSP Stapling - saves clients a separate round trip to the CA's OCSP responder
ssl_stapling on;
ssl_stapling_verify on;
resolver 1.1.1.1 8.8.8.8 valid=300s;
resolver_timeout 5s;
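Once deployed, verify from the outside that TLS 1.3 is negotiated and that a stapled OCSP response is actually served. Replace api.example.com with your own hostname; it is just a placeholder:

# Force a TLS 1.3 handshake and request the stapled OCSP response
openssl s_client -connect api.example.com:443 -tls1_3 -status < /dev/null 2>/dev/null \
  | grep -E 'Protocol|OCSP'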

4. The Hardware Factor: Why CoolVDS Wins on I/O

You can tune your kernel and Nginx config all day, but software cannot fix hardware bottlenecks. API Gateways are surprisingly I/O heavy. They write extensive access logs, error logs, and often buffer large request bodies to disk if they exceed memory limits.

On budget hosting providers, you are often sitting on shared SATA SSDs or, worse, network-attached storage (e.g., Ceph) with noisy neighbors. When a neighbor spins up a heavy database query, your disk I/O wait (iowait) spikes and your API latency fluctuates with it.
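You can see this for yourself. On a box that is suffering, watch iowait and per-device write latency while traffic is flowing; a minimal sketch using the sysstat tools:

# iostat ships with the sysstat package on Debian/Ubuntu
sudo apt-get install -y sysstat

# -x shows extended stats: watch %iowait and the await columns
# for the volume your access logs live on (5 samples, 2 seconds apart)
iostat -x 2 5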

This is why we architected CoolVDS differently.

  • Local NVMe Storage: We don't use slow network storage for root volumes. Your logs hit local NVMe drives, ensuring write speeds that can handle debug logging even under load.
  • KVM Virtualization: Unlike containers (LXC/OpenVZ) where the kernel is shared, KVM provides true isolation. Your `sysctl` tuning actually works because you own the kernel parameters.
  • Data Sovereignty: Hosting in Norway keeps your data under GDPR and Norwegian jurisdiction. Keeping traffic local (e.g., peering at NIX in Oslo) not only satisfies Datatilsynet's requirements but also works with the laws of physics: light is fast, but distance still matters. See the estimated figures below.

Estimated Round Trip Time (RTT) to Oslo Users

Hosting Location          Approx. Latency    Impact
CoolVDS (Oslo)            2-5 ms             Instant
Frankfurt (AWS/Google)    25-35 ms           Noticeable
US East (N. Virginia)     90-110 ms          Sluggish
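These figures are easy to sanity-check. Probe each candidate region from a machine near your users; the target IP below is a documentation placeholder:

# 20 pings; the final line summarizes rtt min/avg/max/mdev in milliseconds
ping -c 20 203.0.113.10 | tail -1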

5. Benchmarking: Don't Guess, Measure

Before you deploy, stress test. ab (ApacheBench) is single-threaded and shows its age. Use wrk for a modern, multi-threaded load test that handles keep-alives correctly.

Installation on Ubuntu/Debian:

sudo apt-get install wrk

Run a test with 12 threads and 400 concurrent connections for 30 seconds; the --latency flag prints the percentile breakdown:

wrk -t12 -c400 -d30s --latency http://your-coolvds-ip/api/health

Look at the Latency Distribution. If your 99th percentile is over 100ms, go back to step 1.

Conclusion

Performance isn't magic; it's engineering. By tuning the Linux kernel, fixing your Nginx upstream configuration, and running on hardware that doesn't steal your cycles, you can push gateway overhead down toward the sub-millisecond range.

Don't let legacy hardware govern your application's speed. Deploy a high-performance NVMe KVM instance on CoolVDS today and see what your API is actually capable of.