API Gateway Performance Tuning: Squeezing Microseconds out of Nginx and Kong
Let’s be honest: if your API gateway adds more than 15ms to a request, it’s not a gateway; it’s a wall. I recently audited a fintech setup in Oslo where the development team was baffled. Their microservices were responding in single-digit milliseconds, yet the client-side latency was hovering around 200ms. The culprit wasn't the code. It wasn't the database. It was a default Nginx configuration running on a choked shared VPS that was stealing CPU cycles like a kleptomaniac.
In the high-stakes world of API management—whether you are running Nginx, Kong, or Tyk—latency is the enemy. And in September 2023, with user expectations for 'instant' interactions at an all-time high, you cannot afford to run defaults. This guide cuts through the noise and focuses on the raw Linux kernel and configuration tuning required to handle thousands of requests per second (RPS) without breaking a sweat.
The Hardware Reality: Why Steal Time Matters
Before we touch a single config file, we need to address the infrastructure. API gateways are CPU-bound, with the cost dominated by SSL termination and context switching. If you are hosting on a budget provider where CPU Steal Time (the `st` value in top) fluctuates, your tuning efforts are futile.
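You don't have to eyeball it; two stock Linux commands give you the number directly:

```bash
# One-shot snapshot: the "st" value in the %Cpu(s) line is steal time
top -bn1 | grep '%Cpu'

# Trend over time: the "st" column, sampled once per second for 30 seconds
vmstat 1 30
```

Anything persistently above a couple of percent on a gateway node is a red flag.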
For production gateways, we exclusively use CoolVDS KVM instances. Why? Because KVM provides strict resource isolation. Unlike container-based virtualization (OpenVZ/LXC), where a noisy neighbor can drain your entropy pool or CPU cache, a KVM slice on NVMe storage guarantees that the CPU cycles you pay for are the cycles you get. When you are terminating TLS 1.3 handshakes for 10,000 concurrent users, that consistency is the difference between a successful Black Friday and a timeout disaster.
Step 1: Kernel Tuning for High Concurrency
Linux defaults are designed for general-purpose computing, not high-throughput packet shuffling. We need to open up the TCP stack.
Edit your /etc/sysctl.conf. These settings optimize the OS for handling massive numbers of ephemeral connections.
```
# Increase the maximum number of open file descriptors
fs.file-max = 2097152

# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535

# Reuse sockets in TIME_WAIT state for new outbound connections
net.ipv4.tcp_tw_reuse = 1

# Increase local port range to avoid exhaustion
net.ipv4.ip_local_port_range = 1024 65535

# BBR congestion control (kernel 4.9+)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
```

Apply these with `sysctl -p`. The BBR congestion control is particularly effective for users connecting from mobile networks with variable latency, a common scenario across the Nordic region.
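Before moving on, confirm the kernel actually accepted the settings. A quick sanity check (the tcp_bbr module and these sysctl keys are standard, but whether BBR is available depends on how your kernel was built):

```bash
# Load BBR if it is built as a module, then confirm the kernel offers it
sudo modprobe tcp_bbr
sysctl net.ipv4.tcp_available_congestion_control

# Verify the values you just applied are live
sysctl net.ipv4.tcp_congestion_control net.core.default_qdisc net.core.somaxconn
```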
Pro Tip: Always check your file descriptor limits. Even with sysctl configured, the user running Nginx must have high limits. Check `/etc/security/limits.conf` and ensure `nofile` is set to at least 65535.
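As a rough sketch, assuming Nginx runs as the nginx user (adjust the account name to your setup):

```bash
# Raise the per-user limit via PAM
cat <<'EOF' | sudo tee -a /etc/security/limits.conf
nginx  soft  nofile  65535
nginx  hard  nofile  65535
EOF

# Verify the effective limit for that user
sudo -u nginx bash -c 'ulimit -n'
```

Keep in mind that services launched by systemd bypass PAM limits, so set `LimitNOFILE=65535` in the unit file as well, and mirror it with `worker_rlimit_nofile 65535;` in nginx.conf.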
Step 2: Nginx / Kong Configuration Deep Dive
Most gateways are built on top of Nginx (Kong, APISIX). The single biggest mistake I see is neglecting upstream keepalives. By default, Nginx closes the connection to the upstream service after every request. This forces a new TCP handshake (and potentially SSL handshake) for every single API call.
The Fix: Upstream Keepalive
In your nginx.conf or the relevant template in Kong:
```
upstream backend_microservice {
    server 10.0.0.5:8080;
    # The secret sauce: keep up to 64 idle connections open per worker
    keepalive 64;
}
```

And inside your location block (or server block), you must also switch the proxied connection to HTTP/1.1 and clear the Connection header; without this, Nginx speaks HTTP/1.0 to the upstream and closes the connection after every request, so the keepalive pool is never used:
```
location /api/v1/ {
    proxy_pass http://backend_microservice;
    # Required for upstream keepalive to work
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}
```

This change alone dropped internal latency from 45ms to 8ms in the project I mentioned earlier. It reduces CPU load on both the gateway and the microservices.
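A quick way to confirm the pool is actually being reused (10.0.0.5:8080 is just the example upstream from the block above): watch the number of established gateway-to-upstream sockets while load is running. With keepalive in effect it should settle near the pool size instead of churning through ephemeral ports.

```bash
# Count established connections from the gateway to the example upstream.
# A steady value near 64 (the keepalive pool size) means connections are reused;
# a large, constantly changing number means you are still handshaking per request.
watch -n1 'ss -tn state established dst 10.0.0.5:8080 | tail -n +2 | wc -l'
```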
Step 3: SSL/TLS Optimization
Encryption is expensive. If you aren't careful, the handshake will eat your CPU. Since we are in 2023, you should be prioritizing TLS 1.3, which shaves a full round trip off the handshake and drops weak legacy ciphers, while keeping TLS 1.2 only for clients that can't do better.
```
ssl_protocols TLSv1.2 TLSv1.3;
# TLS 1.2 cipher list; TLS 1.3 cipher suites are negotiated separately by OpenSSL
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
ssl_prefer_server_ciphers on;

# Cache SSL sessions to avoid a full handshake on every reconnect
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;

# Disable session tickets unless you rotate the ticket keys; static keys undermine Forward Secrecy
ssl_session_tickets off;
```

Using `shared:SSL:50m` allows the worker processes to share the session cache, accommodating roughly 200,000 sessions. On CoolVDS NVMe instances, the I/O speed ensures that even if you are logging SSL errors or access logs aggressively, the disk write speed won't block the worker threads.
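To see what the handshake actually costs from a client's point of view, curl's built-in timing variables are enough. A rough check (substitute your own endpoint for the placeholder URL):

```bash
# time_appconnect - time_connect is approximately the TLS setup cost;
# with TLS 1.3 and session caching it should be a small fraction of the total
curl -so /dev/null \
  -w 'TCP connect: %{time_connect}s  TLS done: %{time_appconnect}s  Total: %{time_total}s\n' \
  https://your-gateway-api.com/endpoint
```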
Benchmarking: Prove It
Don't guess. Measure. We use wrk to load test the gateway. Here is a command to simulate 400 connections for 30 seconds:
```bash
wrk -t12 -c400 -d30s --latency https://your-gateway-api.com/endpoint
```

| Optimization Level | Requests/Sec (RPS) | Avg Latency | 99th Percentile |
|---|---|---|---|
| Default Config | 2,400 | 145ms | 450ms |
| Sysctl + Keepalive | 8,500 | 35ms | 90ms |
| CoolVDS (Dedicated CPU) | 12,200 | 12ms | 25ms |
The jump in stability at the 99th percentile when switching to dedicated resources (CoolVDS) is why serious DevOps engineers avoid standard shared hosting for gateways.
Local Context: The Norwegian Edge
For those of us operating out of Norway, latency to the NIX (Norwegian Internet Exchange) is paramount. Hosting your API gateway in Frankfurt when your users are in Oslo introduces unnecessary round-trip time (RTT). Data residency is also a massive factor. With the strict interpretation of GDPR and Schrems II by Datatilsynet, keeping your termination points and data logs within Norwegian or Northern European jurisdiction isn't just a technical preference; it's a legal safeguard.
By deploying on servers physically located in the region, you slash latency by 20-30ms compared to US-centralized cloud providers. It’s physics.
Conclusion
Performance isn't magic. It's a combination of efficient code, tuned kernels, and appropriate hardware. You can write the fastest Go or Rust microservice in the world, but if your API gateway is misconfigured or running on noisy hardware, you are throttling your own success.
Stop fighting with steal time and unstable I/O. Deploy your optimized gateway on a CoolVDS NVMe instance today and watch your latency drop.