Squeezing Milliseconds: Tuning Nginx & OpenResty for High-Throughput API Gateways
Most developers treat their API gateway like a black box. They `apt-get install nginx`, slap on a `proxy_pass` directive, and call it a day. Then the marketing team launches a campaign, traffic spikes, and suddenly your mobile app users in Oslo are staring at a spinner because your gateway is choking on SSL handshakes.
If you are running default configs in production, you are negligent. Period. I have seen perfectly good microservices architectures crumble not because the application logic was slow, but because the gateway ran out of ephemeral ports or spent 40% of its CPU time on context switching.
This isn't about premature optimization; it's about survival. In September 2016, with HTTP/2 finally gaining traction and mobile networks demanding lower latency, every millisecond of overhead matters. Here is how we tune the stack—from the kernel up to the application layer—to handle the load.
1. The Foundation: Kernel Tuning (sysctl)
Linux is tuned for general-purpose computing by default, not for being a high-performance packet cannon. When your API gateway hits 10,000 concurrent connections, the default TCP stack settings on Ubuntu 16.04 will betray you. You will see `TIME_WAIT` buckets filling up and connections being dropped.
We need to modify /etc/sysctl.conf to allow for faster socket recycling and larger buffers. Do not blindly copy-paste this; understand that we are trading RAM for network stability. On a CoolVDS instance with dedicated RAM, this is a safe trade. On oversold shared hosting? You might crash the VPS.
# /etc/sysctl.conf
# Increase system-wide file descriptors
fs.file-max = 2097152
# Widen the port range for outgoing connections
net.ipv4.ip_local_port_range = 10000 65000
# Allow reuse of sockets in TIME_WAIT state for new outgoing connections
net.ipv4.tcp_tw_reuse = 1
# Increase the maximum backlog of incoming connections
net.core.somaxconn = 32768
net.ipv4.tcp_max_syn_backlog = 8192
# Increase TCP buffer sizes for high-bandwidth links
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
Apply these changes with `sysctl -p`. If you monitor your servers with Zabbix or Nagios, watch the memory usage after applying this. It will go up, but your throughput will stabilize.
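A quick sanity check after applying, assuming the standard iproute2 tooling is installed:
# Confirm the new values are live
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse
# Count sockets stuck in TIME_WAIT; compare before and after tuning under load
ss -tan state time-wait | wc -l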
2. Nginx & OpenResty: Breaking the Defaults
Whether you are using vanilla Nginx 1.10 or the increasingly popular OpenResty (which bundles LuaJIT for dynamic routing), the worker configuration is critical. The old advice of setting `worker_processes` to 4 is lazy. Set it to `auto` so it maps 1:1 with your CPU cores.
Connection Processing
The `worker_connections` limit is the most common ceiling hit by junior admins. The theoretical max clients = `worker_processes` * `worker_connections`. If you are proxying, divide that by 2 (one connection to client, one to backend).
events {
    worker_connections 10240;
    use epoll;
    multi_accept on;
}
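For reference, here is a minimal sketch of the main-context directives the events block above assumes; the file descriptor limit is an illustrative value, so size it against your fs.file-max budget:
# Top of nginx.conf (main context)
worker_processes auto;
# Raise the per-worker FD limit so worker_connections can actually be reached
worker_rlimit_nofile 65535;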
Keepalive to Upstreams
This is the silent killer. By default, Nginx tears down the TCP connection to your backend (Node.js, Go, PHP-FPM) after every request. This adds the overhead of a TCP handshake to every single internal API call. You must use an `upstream` block with keepalive.
upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;
    # Keep up to 64 idle connections open to the backend (per worker process)
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
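A simple way to verify the pool is doing its job, using the first backend address from the upstream block above: count established connections from the gateway to the backend while a load test runs. With keepalive in place the number stays flat; without it, it churns and TIME_WAIT piles up on the gateway.
# Established connections from the gateway to one backend
ss -tn state established dst 10.0.0.5 | wc -l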
Pro Tip: If you are hosting on CoolVDS, our internal network connects directly to the Norwegian Internet Exchange (NIX). While our internal latency is negligible, saving the TCP handshake overhead on your backend communication can still reduce average response times by 10-20ms per request.
3. SSL/TLS: The CPU Hog
Encryption is expensive. If you are terminating SSL at the gateway (which you should be), your CPU is doing heavy lifting. With the recent push for HTTP/2, we can improve multiplexing, but we need to ensure the crypto doesn't bottleneck us.
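Enabling HTTP/2 in Nginx 1.10 is a one-line change on the listen directive; the certificate paths below are placeholders:
server {
    # 'http2' requires 'ssl' on the same listen line (available since Nginx 1.9.5)
    listen 443 ssl http2;
    ssl_certificate     /etc/nginx/certs/gateway.pem;   # placeholder
    ssl_certificate_key /etc/nginx/certs/gateway.key;   # placeholder
}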
Ensure you are using OpenSSL 1.0.2 or later. AES-NI hardware acceleration has been picked up automatically since 1.0.1, and 1.0.2 adds ALPN support, which Chrome now requires for HTTP/2 after dropping NPN earlier this year. Check your version:
openssl version
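To confirm AES-NI is actually in play, check the CPU flags and compare the plain AES benchmark against the EVP path, which is the one that uses the hardware instructions; a several-fold gap means acceleration is active:
# Does the CPU advertise AES-NI?
grep -m1 -o aes /proc/cpuinfo
# Software AES vs. the hardware-accelerated EVP path
openssl speed aes-128-cbc
openssl speed -evp aes-128-cbc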
In your Nginx config, tune the SSL session cache so clients don't have to renegotiate the full handshake constantly. This is vital for mobile clients on flaky 3G/4G networks.
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;
ssl_session_tickets off;
# Modern cipher suite (Sept 2016 recommendation)
ssl_protocols TLSv1.2;
ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256';
ssl_prefer_server_ciphers on;
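You can verify session resumption from any machine with the OpenSSL CLI; the hostname below is a placeholder. The -reconnect flag completes a full handshake and then reconnects five times reusing the session, and the resumed connections should be reported as "Reused" in the output:
openssl s_client -connect api.example.com:443 -reconnect < /dev/null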
4. Benchmarking the Difference
Don't take my word for it. We use `wrk`, a modern HTTP benchmarking tool that can generate significant load from a single multi-core host. Here is a comparison of a standard setup vs. a tuned CoolVDS instance.
Test Scenario: Simple JSON API response, 1kB payload, HTTPS.
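The numbers below came from runs along these lines; the thread and connection counts and the endpoint are illustrative, so adjust them to your own API and generate the load from a separate host:
# 8 threads, 400 open connections, 60 seconds, with latency percentiles
wrk -t8 -c400 -d60s --latency https://api.example.com/v1/ping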
| Metric | Standard VPS (Untuned) | CoolVDS NVMe (Tuned) |
|---|---|---|
| Requests/sec | 2,400 | 14,500 |
| Latency (99%) | 145ms | 12ms |
| Socket Errors | 350 | 0 |
The difference isn't just configuration; it's the underlying I/O. When Nginx writes logs or buffers large requests to disk, standard spinning platters (HDD) can block the worker while it waits on the write. We use KVM virtualization on pure NVMe storage at CoolVDS. This means when your logs flush to disk, your API doesn't stall.
5. The Local Angle: Why Location Matters
We are seeing more European companies move data back within the EEA borders following the collapse of Safe Harbor and the rise of the Privacy Shield framework this year. Hosting your API gateway in the US while your customers are in Norway or Germany adds 80-120 ms of round-trip latency before your backend even sees the request; physical distance and transatlantic routing set a floor no amount of tuning can remove.
Furthermore, the Norwegian Data Protection Authority (Datatilsynet) is becoming increasingly strict about where personal data is processed. Running your gateway on a provider with physical presence in Oslo ensures you aren't just faster, but you're on safer legal ground regarding data sovereignty.
Conclusion
Performance tuning is an iterative process. Start with the kernel, move to Nginx connection pooling, and ensure your SSL settings allow for session resumption. But remember: software tuning cannot fix bad hardware. No amount of `sysctl` magic will make a noisy neighbor on a shared platform go away.
If you need consistent I/O performance and a network stack that lets you touch the metal, stop fighting with budget hosts.
Spin up a KVM instance on CoolVDS today. Test your latency from Oslo, and see what real hardware does for your API throughput.