API Gateway Performance Tuning: Squeezing Milliseconds Out of Nginx on Linux
Your API isn't slow. Your infrastructure is lying to you. I recently audited a fintech setup in Oslo where the development team spent three weeks refactoring Go microservices to shave off 10ms of processing time. It didn't matter. Their ingress controller was adding 200ms of latency during peak loads because they were hitting connection tracking limits and suffering from massive CPU steal time on a generic public cloud provider.
If you are running high-throughput workloads, whether on Kong, Nginx, or HAProxy, default Linux distributions are configured for compatibility, not performance. They are tuned for a file server from 2010, not an API gateway handling 50k requests per second in 2022. Here is how we fix the stack, from the kernel up, while keeping Datatilsynet happy.
1. The Kernel: Fixing the TCP Stack
Before touching your gateway configuration, you must address the OS. The default backlog settings will silently drop connections when traffic spikes, and the default ephemeral port range will starve the proxy of outbound sockets to its upstreams. I see this constantly: `dmesg` full of "possible SYN flooding on port 443". That is usually not a DDoS; it is your legitimate users hitting a wall.
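You can confirm whether you are really overflowing the accept queue (rather than being attacked) from the kernel's own counters. A quick check, assuming the standard `net-tools` and `iproute2` utilities are installed:
# Accept-queue overflows and dropped SYNs since boot (climbing numbers = backlog problem)
netstat -s | grep -iE "overflowed|SYNs to LISTEN"
# Per-socket view: Send-Q shows the configured backlog, Recv-Q the current queue depth
ss -lnt '( sport = :443 )'
If those counters rise during peak traffic rather than in random bursts, it is your backlog, not an attacker.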
Edit your `/etc/sysctl.conf`. These settings assume you have at least 4GB RAM and a modern kernel (4.19+).
# Maximize the backlog to prevent packet drops during bursts
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
# Widen the ephemeral port range to allow more outgoing connections (critical for proxying)
net.ipv4.ip_local_port_range = 1024 65535
# Reduce TIME_WAIT state to recycle sockets faster
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
# Increase buffer sizes for high-bandwidth links (10Gbps+)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Enable BBR Congestion Control (check kernel support first)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
Apply this with `sysctl -p`. The `tcp_tw_reuse` flag is safer than `tcp_tw_recycle` (which broke NAT and was removed in kernel 4.12) and is essential for gateways proxying requests to backend services.
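BBR only works if the `tcp_bbr` module is available in your kernel, so verify before you rely on it. A quick sanity check around the apply step:
# Load the module and list the congestion control algorithms the kernel offers
modprobe tcp_bbr
sysctl net.ipv4.tcp_available_congestion_control
# Apply /etc/sysctl.conf, then confirm the active algorithm and qdisc
sysctl -p
sysctl net.ipv4.tcp_congestion_control net.core.default_qdisc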
2. The Gateway: Nginx Configuration for Concurrency
Most default Nginx configs cap worker_connections at 768 or 1024. If you are handling 5,000 concurrent users, Nginx will simply stop accepting new connections once that limit is hit. Furthermore, SSL termination is CPU-expensive; if you aren't caching SSL sessions, you are burning cycles on full handshakes unnecessarily.
Here is a production-hardened snippet for nginx.conf used in high-traffic deployments:
worker_processes auto;
worker_rlimit_nofile 65535;
events {
    worker_connections 16384;
    use epoll;
    multi_accept on;
}
http {
    # ... logs and mime types ...

    # OPTIMIZATION: Keepalive connections to upstream reduce handshake overhead
    upstream backend_api {
        server 10.0.0.5:8080;
        keepalive 64;
    }

    # OPTIMIZATION: Buffer tuning to prevent disk I/O for small payloads
    client_body_buffer_size 128k;
    client_max_body_size 10m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 8k;
    output_buffers 1 32k;
    postpone_output 1460;

    # OPTIMIZATION: SSL Session Caching
    ssl_session_cache shared:SSL:10m;   # Holds approx 40k sessions
    ssl_session_timeout 10m;
    ssl_buffer_size 4k;                 # Lower buffer size reduces Time To First Byte (TTFB)
}
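One caveat the snippet above does not show: the upstream `keepalive` directive only takes effect if proxied requests use HTTP/1.1 and clear the Connection header. A minimal sketch of the matching server block to place inside `http { }` (the `/api/` path and the TLS certificate lines are placeholders for your own setup):
server {
    listen 443 ssl http2;
    # ssl_certificate and ssl_certificate_key go here

    location /api/ {
        proxy_pass http://backend_api;
        # Both lines are required for upstream keepalive to actually be used
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}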
Pro Tip: Monitor your open file descriptors. Even if Nginx is tuned, if the system-wide `ulimit -n` is 1024, Nginx will start throwing "Too many open files" errors under load. Raise it to 65535 via `LimitNOFILE=` in the systemd service file or `ulimit -n 65535` in `/etc/security/limits.conf`, as shown below.
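Note that for a systemd-managed Nginx, `/etc/security/limits.conf` is ignored (it only applies to PAM login sessions), so the unit file is what actually counts. A sketch using the standard drop-in override mechanism; the pid file path may differ on your distro:
# /etc/systemd/system/nginx.service.d/override.conf (create it with: systemctl edit nginx)
[Service]
LimitNOFILE=65535

# Reload units, restart, and confirm the limit the master process actually received
systemctl daemon-reload && systemctl restart nginx
grep "open files" /proc/$(cat /run/nginx.pid)/limits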
3. The Hardware Reality: Why Virtualization Matters
You can have the most optimized Nginx config in the world, but if your underlying host is overcommitting resources, you are fighting a losing battle. This is the "Steal Time" metric in `top` (marked as `%st`). If it sits consistently above a few percent, your neighbor is stealing your CPU cycles.
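You don't need a monitoring agent to see this. `vmstat` ships with virtually every distro and `mpstat` comes with the sysstat package; watch them while you run a load test:
# "st" is the right-most CPU column: time stolen by the hypervisor
vmstat 1 10
# Per-core view; watch the %steal column
mpstat -P ALL 1 5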
In a containerized world, we often forget that I/O Wait is a killer. API Gateways log heavily. If you are on a standard spinning disk or a network-throttled SSD (common in budget VPS), writing logs blocks the worker process. The request hangs until the disk acknowledges the write.
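Whatever disk you are on, you can at least take log writes off the hot path by letting Nginx buffer them in memory and flush periodically. A sketch for the `http` or `server` block, assuming a log format named `main`:
# Buffer access log writes; flush every 5s or whenever 64k accumulates
access_log /var/log/nginx/access.log main buffer=64k flush=5s;
# Keep the error log unbuffered so you don't lose context on a crash
error_log /var/log/nginx/error.log warn;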
| Resource | Budget Cloud VPS | CoolVDS Architecture | Impact on API |
|---|---|---|---|
| CPU | Shared/Burstable (High Steal Time) | Dedicated/KVM (No Steal Time) | Consistent latency vs. random spikes |
| Storage | Network Storage (SATA/SSD mix) | Local NVMe | Log writing doesn't block requests |
| Network | Shared 1Gbps Uplink | Dedicated Uplink per Node | No packet loss during neighbor's DDoS |
This is why we built CoolVDS on KVM with local NVMe storage. For an API Gateway, disk latency correlates directly to response latency. When we migrated a customer from a generic cloud provider to our NVMe-backed instances in Oslo, their p99 latency dropped from 340ms to 45ms without changing a single line of code.
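If you would rather verify this than take our word for it, measure the latency of the volume your gateway logs to. `ioping` is the simplest tool for the job; the path below is just an example:
# Latency of small I/O on the log volume (add -W to test writes instead of reads)
ioping -c 20 /var/log/nginx/
Single-digit microsecond-to-millisecond results are what local NVMe looks like; tens of milliseconds means your logs are fighting the network storage backend.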
4. Compliance: The Norwegian Context
Since the Schrems II ruling, sending personal data (PII) across the Atlantic has become a legal minefield. If your API gateway logs IP addresses or user IDs and pushes them to a US-owned cloud region, you are increasing your risk profile. Hosting locally in Norway isn't just about physics (latency); it's about sovereignty.
Using a VPS in Oslo keeps the round-trip time (RTT) for Norwegian users typically under 10ms, while a request from Oslo to Frankfurt usually takes 25-30ms. That adds up when your API does multiple round trips. CoolVDS infrastructure is physically located here, ensuring both GDPR compliance and the lowest possible RTT for the Nordic market.
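RTT claims are easy to check from your own office or from a probe in your target market; `mtr` gives you round-trip time and per-hop loss in one report (the hostname below is a placeholder):
# 100 probes in report mode; run it against both your Oslo and Frankfurt candidates
mtr --report --report-cycles 100 api.example.no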
Final Verification
After tuning, run a benchmark. Do not use `ab` (Apache Bench); it is single-threaded and will bottleneck before your gateway does. Use `wrk` to simulate realistic load:
# Simulating 12 threads, 400 connections, for 30 seconds
wrk -t12 -c400 -d30s http://your-coolvds-ip/api/health
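Add the `--latency` flag to print the full latency distribution instead of just averages; the 99% line is the number that reflects what your slowest users actually experience:
# Same load shape, but report latency percentiles (50/75/90/99%)
wrk -t12 -c400 -d30s --latency http://your-coolvds-ip/api/health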
If you aren't seeing the throughput you expect, check the hardware. Don't let slow I/O kill your performance. Deploy a test instance on CoolVDS in 55 seconds and see the difference raw NVMe power makes.