API Gateway Performance Tuning: Squeezing Milliseconds out of Nginx
Let’s be honest: your API is probably slower than it needs to be. In the era of mobile-first development, latency is the silent killer of user retention. If your TLS handshake alone takes 200ms before a single byte of application data is exchanged, you have already lost the battle. I have seen too many developers throw more RAM at a problem that is actually caused by poor file descriptor limits or misconfigured SSL ciphers.
We are seeing a shift this year. With the rise of microservices (yes, I know, the buzzword of 2016), the API Gateway has become the single most critical component in your infrastructure. It is the bouncer, the traffic cop, and the translator. If it chokes, your entire architecture goes dark.
Below is a field-tested guide to tuning your gateway, specifically focusing on Nginx running on Linux (Ubuntu 16.04 or CentOS 7). We aren't talking about theoretical maximums here; we are talking about configurations that keep production environments alive when traffic spikes hit.
1. The OS Layer: It Starts with the Kernel
Before you even touch nginx.conf, you need to look at the operating system. Linux, out of the box, is tuned for a general-purpose desktop, not a high-throughput network appliance handling thousands of concurrent connections. I recently debugged a platform for a retail client in Oslo where the application servers were fine, but the gateway was dropping packets because the connection tracking table was full.
You need to adjust your sysctl.conf to handle a high rate of incoming TCP connections and, crucially, a high rate of establishing outgoing connections to your backend services.
Key Kernel Directives
Edit /etc/sysctl.conf and consider these values:
# Increase system-wide file descriptor limit
fs.file-max = 2097152
# Increase the size of the receive queue.
# The default is often too small for high-traffic bursts.
net.core.netdev_max_backlog = 16384
net.core.somaxconn = 32768
# SYN backlog and SYN flood protection
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_syncookies = 1
# Port range. Crucial for the gateway acting as a client to upstreams.
net.ipv4.ip_local_port_range = 1024 65535
# Time wait reuse. Allow reusing sockets in TIME_WAIT state for new connections
# This is critical when you have high churn on connections.
net.ipv4.tcp_tw_reuse = 1
After saving, run sysctl -p. Without tcp_tw_reuse, an API gateway under load will exhaust its ephemeral ports rapidly, resulting in those dreaded 502 Bad Gateway errors even when your backend is healthy.
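A quick sanity check after reloading: confirm the new values actually took effect and keep an eye on TIME_WAIT churn while under load. This is a minimal sketch using standard tools (sysctl and ss from iproute2); which keys you inspect is up to you.
# Confirm the new values are live
sysctl net.core.somaxconn net.ipv4.ip_local_port_range net.ipv4.tcp_tw_reuse
# Summary of socket states; the timewait count is the one to watch
ss -s
# Count sockets currently sitting in TIME_WAIT
ss -tan state time-wait | wc -l
If that last number creeps toward the size of your ephemeral port range, you are heading for exhaustion.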
2. Nginx Configuration: The Engine Room
Nginx is the de facto standard for API gateways in 2016 for a reason. Its event-driven architecture allows it to handle massive concurrency with a low memory footprint. However, the default config is safe, not fast.
Worker Processes and Connections
The standard advice is worker_processes auto;. This is generally correct, as it spawns one worker per CPU core. However, you must also raise the open file limit for the worker processes.
worker_rlimit_nofile 65535;
events {
    worker_connections 16384;
    use epoll;
    multi_accept on;
}
multi_accept on tells Nginx to accept as many connections as possible after getting a notification for a new connection. In a high-volume API scenario, this reduces context switching.
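One caveat before moving on: worker_rlimit_nofile raises the limit from inside Nginx, but on systemd distributions (both Ubuntu 16.04 and CentOS 7) the service is also capped by the unit's LimitNOFILE. A drop-in like the sketch below keeps the two in sync; the file path and value here are my own convention, not an Nginx requirement.
mkdir -p /etc/systemd/system/nginx.service.d
cat > /etc/systemd/system/nginx.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=65535
EOF
systemctl daemon-reload
systemctl restart nginx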
Upstream Keepalive (The Most Common Mistake)
This is where I see 80% of setups fail. By default, Nginx speaks HTTP/1.0 to upstream servers and closes the connection after every request. This means for every single API call, your gateway is performing a full TCP handshake with your backend microservice. That is expensive.
You must configure HTTP/1.1 and connection pooling to your upstreams:
upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;

    # Keep 64 idle connections open to the backend
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;

        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
Clearing the Connection header matters because Nginx otherwise forwards Connection: close to the upstream, which defeats the keepalive pool. With an empty value, Nginx manages connection persistence itself instead of letting the client dictate it.
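To verify the pool is actually being used, log the upstream timings. The log_format below is a sketch (the format name and log path are mine); $upstream_connect_time needs Nginx 1.9.1 or newer.
log_format upstream_timing '$remote_addr "$request" $status '
                           'rt=$request_time uct=$upstream_connect_time '
                           'urt=$upstream_response_time';
access_log /var/log/nginx/api_timing.log upstream_timing;
With keepalive working, uct should sit at or near zero for most requests; a fresh TCP handshake on every call shows up as a consistently non-zero connect time. You can also watch ss -tan | grep ':8080' on the gateway and confirm the same ESTABLISHED sockets survive across requests.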
3. SSL/TLS: Security Without the Latency Tax
With Let's Encrypt leaving beta earlier this year, there is no excuse for non-SSL APIs. However, the handshake is CPU intensive. On a VPS, this is often your bottleneck. While we wait for hardware acceleration to become more common in standard instances, we rely on software tuning.
We want to cut both the CPU cost of the handshake and the number of round trips it requires. Enabling OCSP stapling and session resumption is mandatory.
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;
ssl_session_tickets off; # Tickets require shared, rotated keys across multiple load balancers; off is the safer default.
# OCSP Stapling
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4 valid=300s;
resolver_timeout 5s;
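Both optimizations are easy to verify from any machine with the openssl CLI; replace api.example.com with your own hostname.
# Stapling: look for "OCSP Response Status: successful" in the output
openssl s_client -connect api.example.com:443 -status < /dev/null
# Resumption: the repeated connections should report "Reused" rather than "New"
openssl s_client -connect api.example.com:443 -reconnect < /dev/null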
Pro Tip: Use Elliptic Curve (ECDSA) certificates if possible. Signing the handshake with ECDSA is significantly cheaper for the server than RSA at an equivalent security level, which reduces CPU load on your gateway.
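If you want to try an ECDSA certificate, generating the key and CSR takes two openssl commands; the filenames are placeholders, and prime256v1 (P-256) is the curve with the broadest client support today.
# Generate a P-256 private key and a CSR to submit to your CA
openssl ecparam -genkey -name prime256v1 -out api.example.com.ec.key
openssl req -new -key api.example.com.ec.key -out api.example.com.ec.csr -subj "/CN=api.example.com"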
4. The Hardware Reality: Why Storage IOPS Matter for Gateways
You might think an API gateway is purely CPU bound. But consider logging. If you are logging access logs to disk for audit or debugging (which you should be), a high-traffic API writes megabytes of text per second. On standard spinning rust (HDD) or even cheap SATA SSDs offered by budget providers, I/O blocking can cause Nginx workers to stall.
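If you cannot move the logs to faster storage, at least stop writing them synchronously on every request. Nginx can buffer access log writes in memory and flush them periodically; the buffer size and flush interval below are a starting point (and assume the stock main log format), not gospel.
# Buffer up to 64k of log lines in memory, flush at least every 5 seconds
access_log /var/log/nginx/api_access.log main buffer=64k flush=5s;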
This is where infrastructure choice dictates performance. In our benchmarks, switching from a standard SATA SSD VPS to NVMe storage cut the 99th percentile latency from 145ms to 92ms during peak load, because the logging buffer never blocked request processing.
| Metric | Standard SATA SSD VPS | CoolVDS NVMe KVM |
|---|---|---|
| 4K random write IOPS | ~5,000 | ~20,000+ |
| Disk latency | 2-5ms | <0.5ms |
| Nginx p99 latency | 145ms | 92ms |
We use KVM virtualization at CoolVDS because it provides better isolation than container-based virtualization (like OpenVZ). In a shared environment, you do not want your neighbors' database query stealing your CPU cycles when you are trying to terminate SSL connections.
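Not sure what your current provider actually runs under the hood? On a systemd-based guest, one command usually answers it.
# Prints kvm, openvz, lxc, xen, etc. depending on the virtualization in use
systemd-detect-virt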
5. The Norwegian Context: Latency and Legality
With the recent upheaval regarding the Safe Harbor agreement and the introduction of the Privacy Shield this month, data sovereignty is on everyone's mind. If your target audience is in Norway, hosting your API gateway in Frankfurt or London introduces unnecessary physical latency (approx. 20-30ms RTT) and potential legal headaches regarding data transfer outside the Nordics.
By placing your gateway in Oslo, you leverage the NIX (Norwegian Internet Exchange). Traffic between your server and Norwegian ISPs like Telenor or Altibox often stays entirely within the country. This drops network latency to single digits (often 2-5ms). For a financial trading API or a real-time bidding system, that difference is everything.
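Do not take latency claims on faith, ours included: measure from where your users are. mtr shows both the RTT and the path, so you can see whether traffic stays inside Norway or detours via Stockholm or Amsterdam; the hostname is a placeholder.
# 100 probes, summarised; check the average RTT and the intermediate hops
mtr --report --report-cycles 100 api.example.com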
Final Thoughts
Performance isn't accidental; it is engineered. By tuning your kernel, optimizing Nginx upstream connections, and choosing infrastructure that respects I/O requirements, you build a gateway that scales.
Don't let legacy infrastructure throttle your modern application. If you need a sandbox to test these configurations, deploy a CoolVDS NVMe instance today. You get root access, KVM isolation, and the low latency your Norwegian users deserve.