Squeezing Every Millisecond: High-Performance API Proxy Tuning with Nginx & KVM
It is November 2014, and if your API response time exceeds 200ms, you are essentially invisible to mobile users on 3G networks. We aren't building static corporate brochures anymore; we are building dynamic, JSON-heavy backends for iOS and Android apps. I recently audited a client in Oslo who was baffled as to why their Magento REST API was timing out during traffic spikes, despite the server having 32GB of RAM. The culprit wasn't the code; it was the default Linux TCP stack and a poorly configured reverse proxy.
Default configurations are designed for compatibility, not performance. To handle thousands of concurrent connections (the classic C10k problem), you need to get your hands dirty with file descriptors, worker processes, and kernel flags. Here is how we tune the edge layer for raw speed while staying compliant with Datatilsynet requirements by keeping data strictly on Norwegian soil.
1. The Foundation: Worker Processes and File Descriptors
Nginx is event-driven, which makes it superior to Apache's prefork model for an API gateway. However, out of the box, it is often throttled. The first limit you will hit is the number of open file descriptors. Linux defaults this to 1024. For a high-load API, this is laughable.
First, check your limits:
ulimit -n
If it returns 1024, edit /etc/security/limits.conf to raise the ceiling for the nginx user:
nginx soft nofile 65535
nginx hard nofile 65535
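The limits.conf change only applies to new sessions, so restart Nginx and verify what the master process actually received (the worker_rlimit_nofile directive below is the belt-and-braces fix if the init system bypasses PAM limits). A quick check, assuming the stock pid file at /var/run/nginx.pid:
# Restart so the new limits are picked up, then inspect the master process
service nginx restart
cat /proc/$(cat /var/run/nginx.pid)/limits | grep "open files"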
Next, configure nginx.conf to use these descriptors and bind workers to CPU cores to avoid needless context switching. In a virtualized environment like KVM, CPU affinity is critical; a pinning sketch follows the block below.
user nginx;
worker_processes auto;
# Raise the open-file limit for worker processes to match limits.conf
worker_rlimit_nofile 65535;

events {
    # Per-worker ceiling; total capacity = worker_processes * worker_connections
    worker_connections 8096;
    # Accept all pending connections on each wake-up instead of one at a time
    multi_accept on;
    # epoll is the efficient event mechanism on modern Linux kernels
    use epoll;
}
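The block above raises the descriptor and connection ceilings; the CPU pinning mentioned earlier needs one extra directive. Current Nginx releases have no automatic affinity option, so the bitmasks are written by hand and worker_processes must be set explicitly to match. A minimal sketch assuming a 4-vCPU KVM guest (one mask per worker, rightmost bit = CPU 0):
# Pin each of four workers to its own vCPU
worker_processes 4;
worker_cpu_affinity 0001 0010 0100 1000;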
2. Upstream Keepalive: The Hidden Killer
Most sysadmins configure Keepalive for the client-facing side but forget the upstream (backend) side. Without upstream keepalive, Nginx opens a new TCP connection to your backend (PHP-FPM, Node.js, Python) for every single API request. This adds the full TCP 3-way handshake latency to every call.
For an API gateway, you must reuse connections to the backend application. Here is the correct configuration pattern:
http {
    upstream backend_api {
        server 127.0.0.1:8080;
        # Keeps 32 idle connections open per worker
        keepalive 32;
    }

    server {
        location /api/ {
            proxy_pass http://backend_api;
            # HTTP 1.1 is required for keepalive
            proxy_http_version 1.1;
            # Clear the connection header to allow persistence
            proxy_set_header Connection "";
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
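To confirm the reuse is actually happening, watch the connection count from Nginx to the upstream under load; with keepalive working it should plateau at roughly workers x 32 instead of climbing with every request. A quick check, assuming the backend listens on 127.0.0.1:8080 as above:
# Count ESTABLISHED connections to the backend (skip the header line)
ss -tn state established '( dport = :8080 )' | tail -n +2 | wc -l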
Pro Tip: If you are terminating SSL at Nginx, the handshake overhead is even higher. Always enable OCSP stapling and use a shared SSL session cache to reduce CPU load on the handshake.
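A minimal sketch of both inside the server block; the certificate chain path and resolver below are placeholders for your own environment:
# Shared TLS session cache across all workers (10 MB holds roughly 40,000 sessions)
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
# OCSP stapling: Nginx fetches and caches the OCSP response so clients don't have to
ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /etc/nginx/ssl/ca-chain.pem;
resolver 8.8.8.8 valid=300s;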
3. Kernel Tuning: The Sysctl Layer
Linux defaults are conservative. With high churn on short-lived API connections, closed sockets pile up in the TIME_WAIT state and you eventually exhaust the ephemeral port range for outbound connections to your backend. You need to tell the kernel it is okay to reuse these sockets faster.
Add the following to /etc/sysctl.conf and run sysctl -p:
# Reduce the time a connection stays in TIME_WAIT
net.ipv4.tcp_fin_timeout = 30
# Allow reusing TIME_WAIT sockets for new outbound connections
net.ipv4.tcp_tw_reuse = 1
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65000
# Raise the accept queue (listen backlog) limit
net.core.somaxconn = 4096
# Raise the half-open (SYN) queue limit
net.ipv4.tcp_max_syn_backlog = 4096
Warning: Do not enable tcp_tw_recycle if your clients connect through NAT, as it silently drops their connections. tcp_tw_reuse is the safer bet on 2014-era kernels.
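Two quick checks tell you whether the tuning is paying off: the timewait count reported by ss should stop ballooning, and the kernel should stop recording listen-queue overflows.
# Socket summary: watch the timewait figure under load
ss -s
# A growing "listen queue overflowed" counter means the accept queue is still too small
netstat -s | grep -i listen
One caveat: Nginx passes a backlog of 511 to listen() on Linux by default, so to actually benefit from the raised somaxconn you also need backlog=4096 on the relevant listen directive.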
4. Storage I/O: The Hard Limit
You can tune software all day, but if your disk I/O latency is high, your database queries will pile up and your API gateway will eventually time out waiting for the backend. In traditional spinning-rust (HDD) setups, IOPS (Input/Output Operations Per Second) is the bottleneck.
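Before blaming the hypervisor or the application stack, measure what the volume actually delivers. A minimal random-read test with fio; the file size and runtime are arbitrary test values, so run it off-peak and never against a busy production disk:
# 4K random reads with direct I/O; the summary reports raw IOPS
fio --name=randread --rw=randread --bs=4k --direct=1 \
    --ioengine=libaio --iodepth=32 --size=1G \
    --runtime=60 --time_based --group_reporting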
This is where the underlying virtualization technology matters. We see many hosting providers overselling resources using OpenVZ containers, where