API Gateway Performance Tuning: Squeezing Every Millisecond Out of Nginx
It is 3:00 AM on a Tuesday. Your monitoring dashboard (probably Prometheus or Zabbix if you know what you're doing) starts screaming. Your microservices are fine. Your database load is nominal. Yet, your average response time just spiked from 45ms to 800ms. Clients are seeing 502 Bad Gateway errors.
Welcome to the bottleneck nobody thinks about until it breaks: The API Gateway.
In the last few months, I've audited infrastructure for three major Nordic fintech startups. The pattern is always the same. They build beautiful, containerized applications on Kubernetes v1.16, but they run their ingress or API gateway on default configurations. That is like putting a limiter on a Ferrari engine. In Norway, where we pride ourselves on infrastructure stability and connectivity via NIX (Norwegian Internet Exchange), accepting default latency is professional negligence.
Here is how we fix it. We are going to look at the Linux kernel, Nginx configurations, and why your underlying hardware (specifically storage) is the invisible killer.
1. The "File Descriptor" Trap
The most common error I see in /var/log/nginx/error.log isn't a syntax error. It is:
worker_connections are not enough
or
24: Too many open files
By default, many Linux distros limit the number of file descriptors a user can open to 1024. For a high-traffic API gateway proxying thousands of concurrent connections, this is laughable. Every incoming connection is a file. Every upstream connection to your backend is a file. You hit that limit instantly.
The Fix
First, verify your current limits:
ulimit -n
If it says 1024, you need to edit /etc/security/limits.conf immediately:
nginx soft nofile 65535
nginx hard nofile 65535
root soft nofile 65535
root hard nofile 65535
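One caveat worth flagging: if Nginx is started by systemd rather than from a login shell, the PAM limits in /etc/security/limits.conf generally do not apply to the service. In that case, a drop-in for the unit is the reliable way to raise the limit. A minimal sketch, assuming the stock nginx.service unit name:
# /etc/systemd/system/nginx.service.d/override.conf
[Service]
LimitNOFILE=65535
Run systemctl daemon-reload and restart Nginx for the drop-in to take effect.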
Then, update your nginx.conf to actually utilize these descriptors. The worker_rlimit_nofile directive allows Nginx to override the shell limit.
user www-data;
worker_processes auto;
worker_rlimit_nofile 65535;
events {
worker_connections 10240;
multi_accept on;
use epoll;
}
Pro Tip: Do not just set worker_processes to a random number. Setting it to auto maps it to your CPU cores. On a CoolVDS High-Performance instance, we map these directly to physical cores via KVM, avoiding the "CPU steal" often seen in budget OpenVZ containers.
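To confirm the new limit actually reached the running master process after a reload, check its limits directly in /proc. A quick sketch; the PID file path may differ between distros:
grep "Max open files" /proc/$(cat /run/nginx.pid)/limits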
2. Optimizing the TCP Stack (sysctl tuning)
The default Linux network stack is tuned for general-purpose workloads, not for handling 50,000 ephemeral TCP connections per second. When serving as an API gateway, your server creates a new TCP connection for every upstream request unless you configure keepalives (more on that later). This leads to port exhaustion.
You need to tune the kernel to recycle TIME_WAIT sockets faster. Add this to /etc/sysctl.conf:
# Allow reuse of sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 10240 65535
# Max number of packets in the receive queue
net.core.netdev_max_backlog = 5000
# Increase the maximum number of open file descriptors system-wide
fs.file-max = 2097152
# TCP Hardening and Optimization
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_syncookies = 1
Apply these changes with sysctl -p. This configuration is essential if you are pushing heavy traffic through a Norway-based VPS node, ensuring your packets traverse the network stack efficiently.
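To verify the values stuck, read them back and keep an eye on the socket summary while traffic flows; a healthy gateway should not show tens of thousands of sockets stuck in TIME_WAIT. A quick sketch:
# Read back the tuned values
sysctl net.ipv4.tcp_tw_reuse net.ipv4.ip_local_port_range net.core.netdev_max_backlog
# Socket state summary, including the timewait count
ss -s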
3. Keepalive Connections: The Latency Killer
SSL handshakes are expensive. Establishing a TCP connection is expensive. If your API gateway opens a new connection to your backend microservice for every single request, you are adding 50ms+ of unnecessary latency per call. For a composite API call that hits five internal services, you just added 250ms of wait time.
Configure your upstream blocks to keep connections open:
upstream backend_service {
server 10.0.0.5:8080;
# Keep 64 idle connections open to this upstream
keepalive 64;
}
server {
location /api/v1/ {
proxy_pass http://backend_service;
# Required for keepalive to work
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
By clearing the Connection header, you stop Nginx from sending its default Connection: close to the backend on every proxied request, allowing it to reuse the upstream socket instead of tearing it down each time.
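You can sanity-check that reuse is actually happening by counting established sockets to the upstream under load; with keepalive working, the count should plateau around your keepalive value instead of churning. A sketch assuming the 10.0.0.5:8080 upstream from the example above (the -H flag, which suppresses the header line, needs a reasonably recent iproute2):
# Established connections from the gateway to the upstream
ss -Htn state established dst 10.0.0.5 | wc -l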
4. The Hardware Reality: NVMe vs. Spinning Rust
You can tune software all day, but if your I/O is blocked, your gateway will stall. API Gateways log heavily. Access logs, error logs, audit trails. If you are writing 5,000 log lines per second to a standard SATA SSD (or worse, a spinning HDD), your iowait will skyrocket.
When the disk blocks, the worker process blocks. When the worker blocks, requests queue up. Latency spikes.
| Storage Type | Avg Read/Write Speed | IOPS (Approx) | Impact on API Gateway |
|---|---|---|---|
| HDD (7200 RPM) | 80-160 MB/s | ~100 | Critical Failure under load. |
| SATA SSD | 500-550 MB/s | ~80,000 | Acceptable for medium loads. |
| NVMe (CoolVDS Standard) | 3,500+ MB/s | ~500,000+ | Zero blocking. Instant logging. |
We built CoolVDS on pure NVMe storage arrays precisely for this reason. In a managed hosting environment where stability is paramount, removing the I/O bottleneck allows the CPU to focus entirely on routing traffic and terminating SSL.
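On the software side, you can also take pressure off the disk by letting Nginx buffer access-log writes in memory and flush them periodically instead of issuing a write per request. A minimal sketch; the buffer and flush values are illustrative, so tune them to your traffic:
# Batch access-log writes: flush when the 64k buffer fills or every 5 seconds
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;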
5. Buffer Sizes: Don't Touch Disk
If a request body is larger than your buffer, Nginx writes it to a temporary file on disk. Even with NVMe, writing to disk is slower than RAM. You want to keep payloads in memory.
client_body_buffer_size 128k;
client_max_body_size 10m;
client_header_buffer_size 1k;
large_client_header_buffers 4 4k;
Ensure client_body_buffer_size covers the majority of your POST payloads. If you are handling large image uploads, this strategy changes, but for standard JSON REST APIs, keep it in RAM.
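Nginx tells you when the buffer is too small: it logs a warning each time a request body spills to a temporary file. Counting those warnings is a quick way to validate your sizing (a sketch; the log path assumes the defaults above):
grep -c "a client request body is buffered to a temporary file" /var/log/nginx/error.log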
6. Local Context: Data Sovereignty and Latency
In 2020, with GDPR firmly enforced and data privacy concerns growing (especially given the scrutiny of US-based cloud providers), where your gateway sits matters. Deploying your API Gateway in Frankfurt when your users are in Oslo introduces unnecessary round-trip time (RTT).
Latency from Oslo to Frankfurt is roughly 15-20ms. Latency from Oslo to a CoolVDS datacenter in Norway is often <2ms. For a high-frequency trading app or a real-time bidding system, that difference is the entire game. Furthermore, storing and processing logs within Norwegian borders satisfies Datatilsynet's strict interpretations of data sovereignty.
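Measuring this is trivial, so do not take anyone's word for it, including mine. From a client location, check the round trip to your gateway's endpoint (api.example.no is a placeholder for your own hostname):
# Average RTT over 20 probes
ping -c 20 api.example.no | tail -1
# Per-hop latency along the path
mtr --report --report-cycles 20 api.example.no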
Conclusion
Performance isn't about one "magic switch." It is the sum of a tuned kernel, an optimized Nginx configuration, and hardware that doesn't choke on writes. Don't let default settings cripple your application.
If you are ready to stop fighting with iowait and want to see what your API is actually capable of, you need a foundation built for speed. DDoS protection, low latency, and raw NVMe power aren't optional features for us.
Spin up a high-performance KVM instance on CoolVDS today. Experience the difference raw compute makes.