API Gateway Performance Tuning: Squeezing Every Millisecond Out of Nginx
Let’s be honest: your API isn't slow because your code is bad. Okay, maybe your code is bad, but more often than not, your infrastructure is choking it. I've spent the last decade debugging production environments where the developers swore their Go microservices were blazing fast, yet the client was seeing 500ms latency on a simple GET request. The culprit? Usually a misconfigured gateway sitting on top of a noisy, oversold VPS.
In the Norwegian market, where users expect near-instant responses—latency to NIX (Norwegian Internet Exchange) in Oslo should be under 5ms—sluggishness is unforgivable. If you are serving traffic to Oslo, Bergen, or Trondheim, and your gateway adds 50ms of overhead, you are failing.
Today, we are going to look at how to tune an Nginx-based API gateway (whether you are using raw Nginx, OpenResty, or Kong) for maximum throughput on Linux. This isn't theoretical. This is the config stack I used to survive Black Friday 2018.
1. The "War Story": Connection Churn is the Enemy
Last year, I audited a setup for a local e-commerce client expecting high traffic. They had a Kubernetes cluster (v1.13 at the time) handling the logic, and an Nginx ingress controller acting as the gateway. They were getting 502 Bad Gateway errors at only 2,000 requests per second.
The issue wasn't CPU. It was TCP ephemeral port exhaustion. They were opening a new connection to the upstream backend for every single request. The Linux kernel simply couldn't recycle TIME_WAIT sockets fast enough.
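You can watch this happen in real time on the gateway box. A quick check with ss from iproute2 (present on any stock Ubuntu or CentOS install) shows how many sockets are stuck waiting to be recycled:
# Count sockets currently sitting in TIME_WAIT
ss -tan state time-wait | wc -l
# Or use the summary view and look at the "timewait" figure
ss -s
If that number is climbing toward the size of your ephemeral port range, you have found your 502s.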
The Fix: Upstream Keepalives
By default, Nginx talks to upstreams using HTTP/1.0 and closes the connection. You need to explicitly enable HTTP/1.1 and keepalives for backend traffic.
upstream backend_api {
server 10.0.0.5:8080;
server 10.0.0.6:8080;
# Cache up to 64 idle connections to this upstream, per worker process.
keepalive 64;
}
server {
location /api/ {
proxy_pass http://backend_api;
# Required for keepalive to work
proxy_http_version 1.1;
proxy_set_header Connection "";
}
}
After applying this, the load dropped by 40%, and latency smoothed out immediately. If you run on CoolVDS, our internal network throughput handles persistent connections effortlessly, but you still need to tell Nginx to use them.
2. Kernel Tuning: Don't Let Linux Throttle You
Stock Ubuntu 18.04 or CentOS 7 settings are designed for general-purpose computing, not high-performance packet shuffling. You need to modify /etc/sysctl.conf to allow more open files, a deeper connection backlog, and faster reuse of sockets stuck in TIME_WAIT.
Here is the baseline configuration I deploy on every gateway node:
# /etc/sysctl.conf
# Increase system-wide file descriptors
fs.file-max = 2097152
# Increase the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
# Allow reuse of TIME_WAIT sockets for new outbound connections
net.ipv4.tcp_tw_reuse = 1
# Increase port range for outgoing connections
net.ipv4.ip_local_port_range = 1024 65535
# BBR Congestion Control (Available in Kernel 4.9+)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
Pro Tip: Always check that BBR is actually enabled by running sysctl net.ipv4.tcp_congestion_control. In 2019, BBR is the single best upgrade you can make for clients connecting over mobile networks (4G), which are common in rural Norway. It handles packet loss far better than CUBIC.
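To load the new values and confirm the kernel actually accepted them (the tcp_bbr module ships with stock kernels from 4.9 onwards, but it never hurts to verify):
# Apply /etc/sysctl.conf without rebooting
sysctl -p
# Should print: net.ipv4.tcp_congestion_control = bbr
sysctl net.ipv4.tcp_congestion_control
# Confirm the module is loaded
lsmod | grep bbr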
3. TLS 1.3 is Mandatory Now
OpenSSL 1.1.1 was released last year, bringing full support for TLS 1.3. If you haven't upgraded your SSL termination yet, you are wasting round-trips. TLS 1.3 reduces the handshake from two round-trips to one (or zero with 0-RTT, though be careful with replay attacks there).
Update your Nginx SSL block to prioritize the new protocol:
ssl_protocols TLSv1.2 TLSv1.3;
# With OpenSSL 1.1.1 the TLS 1.3 suites are enabled by default and are not
# controlled by ssl_ciphers, which only applies to TLS 1.2 and older clients.
ssl_ciphers EECDH+AESGCM:EDH+AESGCM;
ssl_prefer_server_ciphers on;
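Once Nginx is reloaded, verify the handshake from any machine with OpenSSL 1.1.1 installed (api.example.com is a placeholder; use your own endpoint):
# Force a TLS 1.3 handshake against the gateway
openssl s_client -connect api.example.com:443 -tls1_3
# A successful negotiation prints "Protocol  : TLSv1.3" and a TLS_AES_* or
# TLS_CHACHA20_* cipher suite in the session details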
4. The Hardware Reality: NVMe vs. "SSD"
Software tuning only gets you so far. I’ve seen developers spend weeks optimizing Lua scripts in Kong, only to realize their database read latency fluctuates wildly because their hosting provider put them on a crowded node with standard SATA SSDs.
In a virtualized environment, I/O Wait is the silent killer. When your neighbor on the server decides to run a massive backup or compile a kernel, your API gateway stutters. You can see this in top:
%Cpu(s): 15.2 us, 4.3 sy, 0.0 ni, 45.1 id, 35.2 wa, 0.0 hi, 0.2 si, 0.0 st
See that 35.2 wa? That’s 35% of the time your CPU is sitting idle, waiting for the disk. In a high-concurrency API gateway, this causes request queues to pile up, leading to timeouts.
This is why we built CoolVDS on KVM with pure NVMe storage. NVMe queues are massive (64k queues with 64k commands each) compared to AHCI (1 queue with 32 commands). We don't use the term "high performance" lightly; we use it because the hardware dictates it. KVM ensures your RAM is yours, and NVMe ensures your disk I/O isn't fighting for air.
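Don't take anyone's word for it, including ours. A rough fio random-read test (a sketch only: the job name and 1 GB test file are arbitrary, and the file is created in the current directory) will tell you what your current disk can actually do:
# 4k random reads with direct I/O, bypassing the page cache
fio --name=iotest --rw=randread --bs=4k --size=1G --direct=1 \
    --ioengine=libaio --runtime=30 --time_based --group_reporting
Compare the reported IOPS against the table further down; a SATA node under contention will also show wildly inconsistent latency percentiles.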
5. Worker Processes and CPU Affinity
Finally, ensure Nginx knows how to use your CPU cores. The auto setting is usually fine, but binding workers to specific cores can reduce context switching on very high-load systems.
worker_processes auto;
worker_cpu_affinity auto;
events {
worker_connections 65535;
use epoll;
multi_accept on;
}
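One caveat: worker_connections 65535 only works if each worker is allowed to open that many file descriptors. The fs.file-max bump from section 2 raises the system-wide ceiling, but Nginx needs its own limit raised in the main context too (a sketch; adjust the number to your traffic, remembering every proxied request holds both a client and an upstream socket):
# Main context (outside events/http). Give workers fd headroom above
# worker_connections, since proxying doubles the socket count per request.
worker_rlimit_nofile 131072;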
Comparison: Standard VPS vs. Optimized CoolVDS Instance
| Metric | Standard VPS (SATA SSD) | CoolVDS (NVMe + KVM) |
|---|---|---|
| Disk Read IOPS | ~5,000 | ~20,000+ |
| I/O Wait (High Load) | 10-40% | < 1% |
| Time to First Byte (TTFB) | 45ms - 120ms | 15ms - 25ms |
| Noisy Neighbor Impact | High | Minimal (KVM Isolation) |
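Those TTFB figures are straightforward to reproduce. A minimal wrk run against a cheap health-check endpoint (the URL, thread count, and connection count here are just examples; size them to your instance):
# 4 threads, 200 open connections, 30 seconds, with latency percentiles
wrk -t4 -c200 -d30s --latency https://api.example.com/api/health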
Compliance Note: GDPR and Logs
Since we are operating in Norway (or dealing with EU citizens), remember that your API logs likely contain PII (IP addresses). Under GDPR, you cannot hoard this data indefinitely. Configure Nginx to rotate logs daily and retain them only for as long as strictly necessary (e.g., 14 days for security auditing).
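A minimal logrotate policy that matches a 14-day retention window (a sketch assuming a standard package install of Nginx with logs under /var/log/nginx):
# /etc/logrotate.d/nginx
/var/log/nginx/*.log {
    daily
    rotate 14
    missingok
    notifempty
    compress
    delaycompress
    sharedscripts
    postrotate
        # Signal Nginx to reopen its log files after rotation
        [ -f /var/run/nginx.pid ] && kill -USR1 $(cat /var/run/nginx.pid)
    endscript
}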
Using a Norwegian provider like CoolVDS also puts you on the right side of the Schrems II case currently working its way through the CJEU (no ruling yet, but the direction of travel on data transfers is clear). Keeping data within the EEA is the safest bet for any CTO in 2019.
Conclusion
Performance isn't an accident. It is the result of deliberate configuration and superior hardware. By tuning your kernel, enabling upstream keepalives, and adopting TLS 1.3, you can handle significantly more traffic with fewer resources.
However, you cannot tune your way out of bad hardware. If your current host is stealing your CPU cycles or choking your I/O, it’s time to move.
Stop letting latency kill your user experience. Deploy a high-performance, NVMe-backed KVM instance on CoolVDS today and see the difference in `wrk` benchmarks immediately.