API Gateway Performance Tuning: Squeezing Microseconds out of Nginx and Kong
Let’s be honest: if your API gateway adds more than 15ms to a request, it’s not a gateway; it’s a wall. I recently audited a fintech setup in Oslo where the development team was baffled. Their microservices were responding in single-digit milliseconds, yet the client-side latency was hovering around 200ms. The culprit wasn't the code. It wasn't the database. It was a default Nginx configuration running on a choked shared VPS that was stealing CPU cycles like a kleptomaniac.
In the high-stakes world of API management—whether you are running Nginx, Kong, or Tyk—latency is the enemy. And in September 2023, with user expectations for 'instant' interactions at an all-time high, you cannot afford to run defaults. This guide cuts through the noise and focuses on the raw Linux kernel and configuration tuning required to handle thousands of requests per second (RPS) without breaking a sweat.
The Hardware Reality: Why Steal Time Matters
Before we touch a single config file, we need to address the infrastructure. API gateways are CPU-bound, with the cost dominated by SSL termination and context switching. If you are hosting on a budget provider where CPU Steal Time (the `st` value in top) fluctuates, your tuning efforts are futile.
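You don't have to eyeball it; two stock Linux commands give you the number directly:

```bash
# One-shot snapshot: the "st" value in the %Cpu(s) line is steal time
top -bn1 | grep '%Cpu'

# Trend over time: the "st" column, sampled once per second for 30 seconds
vmstat 1 30
```

Anything persistently above a couple of percent on a gateway node is a red flag.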
For production gateways, we exclusively use CoolVDS KVM instances. Why? Because KVM provides strict resource isolation. Unlike container-based virtualization (OpenVZ/LXC), where a noisy neighbor can drain your entropy pool or CPU cache, a KVM slice on NVMe storage guarantees that the CPU cycles you pay for are the cycles you get. When you are terminating TLS 1.3 handshakes for 10,000 concurrent users, that consistency is the difference between a successful Black Friday and a timeout disaster.
Step 1: Kernel Tuning for High Concurrency
Linux defaults are designed for general-purpose computing, not high-throughput packet shuffling. We need to open up the TCP stack.
Edit your /etc/sysctl.conf. These settings optimize the OS for handling massive numbers of ephemeral connections.
```
# Increase the maximum number of open file descriptors
fs.file-max = 2097152

# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535

# Reuse sockets in TIME_WAIT state for new outbound connections
net.ipv4.tcp_tw_reuse = 1

# Increase local port range to avoid exhaustion
net.ipv4.ip_local_port_range = 1024 65535

# BBR congestion control (kernel 4.9+)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
```

Apply these with `sysctl -p`. The BBR congestion control is particularly effective for users connecting from mobile networks with variable latency, a common scenario across the Nordic region.
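Before moving on, confirm the kernel actually accepted the settings. A quick sanity check (the tcp_bbr module and these sysctl keys are standard, but whether BBR is available depends on how your kernel was built):

```bash
# Load BBR if it is built as a module, then confirm the kernel offers it
sudo modprobe tcp_bbr
sysctl net.ipv4.tcp_available_congestion_control

# Verify the values you just applied are live
sysctl net.ipv4.tcp_congestion_control net.core.default_qdisc net.core.somaxconn
```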
Pro Tip: Always check your file descriptor limits. Even with sysctl configured, the user running Nginx must have high limits. Check `/etc/security/limits.conf` and ensure `nofile` is set to at least 65535.
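As a rough sketch, assuming Nginx runs as the nginx user (adjust the account name to your setup):

```bash
# Raise the per-user limit via PAM
cat <<'EOF' | sudo tee -a /etc/security/limits.conf
nginx  soft  nofile  65535
nginx  hard  nofile  65535
EOF

# Verify the effective limit for that user
sudo -u nginx bash -c 'ulimit -n'
```

Keep in mind that services launched by systemd bypass PAM limits, so set `LimitNOFILE=65535` in the unit file as well, and mirror it with `worker_rlimit_nofile 65535;` in nginx.conf.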
Step 2: Nginx / Kong Configuration Deep Dive
Most gateways are built on top of Nginx (Kong, APISIX). The single biggest mistake I see is neglecting upstream keepalives. By default, Nginx closes the connection to the upstream service after every request. This forces a new TCP handshake (and potentially SSL handshake) for every single API call.
The Fix: Upstream Keepalive
In your nginx.conf or the relevant template in Kong:
```
upstream backend_microservice {
    server 10.0.0.5:8080;
    # The secret sauce: keep up to 64 idle connections open per worker
    keepalive 64;
}
```

And inside your location block (or server block), you must also switch the proxied connection to HTTP/1.1 and clear the Connection header; without this, Nginx speaks HTTP/1.0 to the upstream and closes the connection after every request, so the keepalive pool is never used:
```
location /api/v1/ {
    proxy_pass http://backend_microservice;
    # Required for upstream keepalive to work
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}
```

This change alone dropped internal latency from 45ms to 8ms in the project I mentioned earlier. It reduces CPU load on both the gateway and the microservices.
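A quick way to confirm the pool is actually being reused (10.0.0.5:8080 is just the example upstream from the block above): watch the number of established gateway-to-upstream sockets while load is running. With keepalive in effect it should settle near the pool size instead of churning through ephemeral ports.

```bash
# Count established connections from the gateway to the example upstream.
# A steady value near 64 (the keepalive pool size) means connections are reused;
# a large, constantly changing number means you are still handshaking per request.
watch -n1 'ss -tn state established dst 10.0.0.5:8080 | tail -n +2 | wc -l'
```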
Step 3: SSL/TLS Optimization
Encryption is expensive. If you aren't careful, the handshake will eat your CPU. Since we are in 2023, you should be prioritizing TLS 1.3, which shaves a full round trip off the handshake and drops weak legacy ciphers, while keeping TLS 1.2 only for clients that can't do better.
```
ssl_protocols TLSv1.2 TLSv1.3;
# TLS 1.2 cipher list; TLS 1.3 cipher suites are negotiated separately by OpenSSL
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
ssl_prefer_server_ciphers on;

# Cache SSL sessions to avoid a full handshake on every reconnect
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;

# Disable session tickets unless you rotate the ticket keys; static keys undermine Forward Secrecy
ssl_session_tickets off;
```

Using `shared:SSL:50m` allows the worker processes to share the session cache, accommodating roughly 200,000 sessions. On CoolVDS NVMe instances, the I/O speed ensures that even if you are logging SSL errors or access logs aggressively, the disk write speed won't block the worker threads.
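To see what the handshake actually costs from a client's point of view, curl's built-in timing variables are enough. A rough check (substitute your own endpoint for the placeholder URL):

```bash
# time_appconnect - time_connect is approximately the TLS setup cost;
# with TLS 1.3 and session caching it should be a small fraction of the total
curl -so /dev/null \
  -w 'TCP connect: %{time_connect}s  TLS done: %{time_appconnect}s  Total: %{time_total}s\n' \
  https://your-gateway-api.com/endpoint
```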
Benchmarking: Prove It
Don't guess. Measure. We use wrk to load test the gateway. Here is a command to simulate 400 connections for 30 seconds:
```bash
wrk -t12 -c400 -d30s --latency https://your-gateway-api.com/endpoint
```

| Optimization Level | Requests/Sec (RPS) | Avg Latency | 99th Percentile |
|---|---|---|---|
| Default Config | 2,400 | 145ms | 450ms |
| Sysctl + Keepalive | 8,500 | 35ms | 90ms |
| CoolVDS (Dedicated CPU) | 12,200 | 12ms | 25ms |
The jump in stability at the 99th percentile when switching to dedicated resources (CoolVDS) is why serious DevOps engineers avoid standard shared hosting for gateways.
Local Context: The Norwegian Edge
For those of us operating out of Norway, latency to the NIX (Norwegian Internet Exchange) is paramount. Hosting your API gateway in Frankfurt when your users are in Oslo introduces unnecessary round-trip time (RTT). Data residency is also a massive factor. With the strict interpretation of GDPR and Schrems II by Datatilsynet, keeping your termination points and data logs within Norwegian or Northern European jurisdiction isn't just a technical preference; it's a legal safeguard.
By deploying on servers physically located in the region, you slash latency by 20-30ms compared to US-centralized cloud providers. It’s physics.
Conclusion
Performance isn't magic. It's a combination of efficient code, tuned kernels, and appropriate hardware. You can write the fastest Go or Rust microservice in the world, but if your API gateway is misconfigured or running on noisy hardware, you are throttling your own success.
Stop fighting with steal time and unstable I/O. Deploy your optimized gateway on a CoolVDS NVMe instance today and watch your latency drop.