API Gateway Performance Tuning: Squeezing Microseconds out of Nginx on Linux
If your API gateway adds more than 20ms of latency to a request, you are failing. I don't care if you are routing traffic for a monolithic Magento store or a microservices mesh running on Kubernetes 1.12; the gateway is the front door. If the door is stuck, it doesn't matter how nice the furniture inside is.
Most developers throw a default Nginx reverse proxy in front of their application, enable SSL, and call it a day. Then Black Friday hits. The connection table fills up, CPU usage spikes due to TLS handshakes, and suddenly your Time to First Byte (TTFB) from Oslo looks like the traffic is routing through a satellite in orbit. We saw this exact scenario last month with a client attempting to handle 10,000 concurrent WebSocket connections on a standard cloud instance. The CPU wasn't the bottleneck—kernel interrupts and I/O wait were.
Here is how we fixed it, and how you can tune your stack for sub-millisecond overhead using technologies available right now in late 2018.
1. The Kernel is the Limit
Before packets even hit Nginx, the Linux kernel determines their fate. Default distributions like Ubuntu 18.04 LTS are tuned for general-purpose desktop or light server usage, not high-concurrency packet switching.
The first silent killer is the connection tracking table. If you are using iptables or just standard NAT, the kernel tracks every connection. When nf_conntrack fills up, the kernel starts dropping packets silently. You won't see this in your application logs; you'll just see timeouts.
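You can spot this cliff before you drive off it. A quick check (assuming a stock Ubuntu 18.04 kernel with the conntrack module loaded):

# How full is the tracking table right now?
sysctl net.netfilter.nf_conntrack_count
sysctl net.netfilter.nf_conntrack_max
# The kernel does log the drops, just not where you usually look
dmesg | grep 'nf_conntrack: table full'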
Edit your /etc/sysctl.conf. We need to raise the connection backlogs, widen the ephemeral port range, and allow faster reuse of sockets stuck in the TIME_WAIT state.
# /etc/sysctl.conf
# Increase system-wide file descriptor limits
fs.file-max = 2097152
# Allow faster reuse of sockets in TIME_WAIT
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_reuse = 1
# Protect against SYN flood and allow more pending connections
net.ipv4.tcp_max_syn_backlog = 65535
net.core.somaxconn = 65535
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# Raise the conntrack table size (costs RAM: each entry uses kernel memory)
net.netfilter.nf_conntrack_max = 524288
Apply these with sysctl -p. If you are running on a virtualization platform that doesn't allow kernel tuning (like some budget OpenVZ containers), you are dead in the water. This is why we exclusively deploy on KVM at CoolVDS; you need full kernel authority to run a serious gateway.
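One caveat: raising somaxconn alone does not change what Nginx actually requests, because Nginx asks for a listen backlog of 511 by default; you also need backlog= on the listen directive to use the extra headroom. To see the effective backlog, check the Send-Q column that ss reports for sockets in the LISTEN state:

# Effective accept-queue depth for the HTTPS listener
ss -ltn 'sport = :443'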
2. Nginx: Stop Closing Connections
The most expensive part of an HTTP request is the setup (TCP handshake + TLS handshake). If your Nginx gateway opens a new connection to your backend (upstream) application for every single request, you are burning CPU cycles unnecessarily.
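You can see how much of a cold request is pure setup with curl's timing variables; time_appconnect marks the end of the TLS handshake and time_starttransfer is the TTFB (point it at whatever cheap endpoint your API exposes):

curl -so /dev/null https://api.norway-fintech.no/ \
  -w 'tcp: %{time_connect}s  tls: %{time_appconnect}s  ttfb: %{time_starttransfer}s\n'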
You must enable keepalives to the upstream. Many configurations I audit miss this completely.
# nginx.conf inside http block
upstream backend_api {
    server 127.0.0.1:8080;
    # CRITICAL: maintain 64 idle connections to the backend
    keepalive 64;
}

server {
    listen 443 ssl http2;
    server_name api.norway-fintech.no;

    # TLS Optimization for 2018 standards
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256';
    ssl_prefer_server_ciphers on;

    # Session Cache is vital for performance
    ssl_session_cache shared:SSL:50m;
    ssl_session_timeout 1d;

    location / {
        proxy_pass http://backend_api;
        # required to use the 'keepalive' directive in upstream
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header X-Real-IP $remote_addr;
    }
}
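With this in place you should see a small, stable pool of established connections from Nginx to the backend instead of a constant churn of short-lived sockets. A rough way to eyeball it, assuming the backend really does listen on 127.0.0.1:8080 as above (the counts include a header line, so treat them as approximate):

# Established upstream connections should hover around the keepalive value
ss -tn state established '( dport = :8080 )' | wc -l
# A large, growing number here means keepalive is not taking effect
ss -tn state time-wait '( dport = :8080 )' | wc -l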
Pro Tip: Ensure you are using HTTP/2 (the http2 flag on the listen directive). It multiplexes requests over a single connection, drastically reducing latency for modern clients. HTTP/2 support has been part of stable Nginx since version 1.9.5.
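A one-line sanity check that the negotiation actually happens (requires a curl built with HTTP/2 support, which recent distro packages are):

curl -sI --http2 -o /dev/null -w '%{http_version}\n' https://api.norway-fintech.no/
# Prints "2" when HTTP/2 was negotiated, "1.1" otherwise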
3. The Hardware I/O Bottleneck
Logging is the hidden performance tax. Every request generates an access log entry. On a high-traffic API, writing those entries to a spinning HDD (or a network-throttled SATA SSD) creates iowait stalls: Nginx workers block while waiting for the disk to acknowledge the write.
You have two choices:
- Buffer the logs (risk losing data on crash).
- Get faster storage.
Buffering helps, but eventually, you flush to disk. This is where the underlying infrastructure of your VPS provider becomes the critical path. Standard cloud volumes often cap IOPS (Input/Output Operations Per Second). If you hit your IOPS limit, your API latency skyrockets regardless of how much CPU you have.
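If you accept the first trade-off, Nginx can buffer access log writes in memory and flush them in batches; a minimal sketch, with buffer and flush values that are starting points rather than gospel:

# nginx.conf, inside the http block
access_log /var/log/nginx/api_access.log combined buffer=64k flush=5s;

Remember that whatever sits in that buffer when the process dies is gone, which is exactly the trade-off listed above.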
We built CoolVDS on local NVMe storage for this specific reason. NVMe interfaces directly with the PCIe bus, bypassing the legacy SATA controller bottlenecks. In our benchmarks involving an ELK stack (Elasticsearch, Logstash, Kibana) ingesting logs, NVMe drives handled 6x the write throughput of standard SSDs before latency started to degrade.
Comparison: Latency under Load (10k req/sec)
| Storage Type | Avg Latency | 99th Percentile (p99) |
|---|---|---|
| Standard SATA SSD (Shared) | 12ms | 145ms |
| CoolVDS NVMe (Dedicated) | 2ms | 8ms |
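Your numbers will differ, so measure your own volume rather than trusting anyone's table. A minimal fio run that probes the small-write latency ceiling (fio is packaged on Ubuntu 18.04; it writes a scratch file in the current directory, so run it somewhere with 512 MB to spare):

fio --name=logwrite --rw=randwrite --bs=4k --size=512m \
    --ioengine=libaio --direct=1 --iodepth=4 --runtime=30 --time_based

Look at the clat (completion latency) percentiles in the output, not just the average; the p99 is what your slowest users feel.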
4. Local Geo-Latency and Compliance
If your users are in Oslo, Bergen, or Trondheim, hosting in Frankfurt or London adds a physical speed of light penalty (approx 20-30ms RTT). For real-time trading or interactive APIs, that lag is noticeable.
Furthermore, with the enforcement of GDPR earlier this year (May 2018), data sovereignty is a massive headache. Hosting data physically within Norway or the Nordic region simplifies compliance with the Datatilsynet (Norwegian Data Protection Authority) regulations.
By peering directly at NIX (Norwegian Internet Exchange), we reduce the hops between your API and the local ISPs (Telenor, Telia). Fewer hops mean less jitter.
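If you want to put a number on it, run a report-mode mtr from a machine on a Norwegian ISP towards your gateway and look at the hop count and the StDev column, which is the jitter:

mtr -rw -c 100 api.norway-fintech.no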
5. File Descriptor Limits
The final piece of the puzzle is the operating system's safety limits. By default, a user can only open 1024 files, and in Linux a socket is a file. A proxying gateway can hold two descriptors per in-flight request (one to the client, one to the upstream), so your fancy Nginx server handles at most a few hundred to roughly 1000 concurrent users before the error log fills with "Too many open files".
Update /etc/security/limits.conf:
# /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535
root soft nofile 65535
root hard nofile 65535
And raise the same limit for the worker processes inside your Nginx configuration (main context):
worker_rlimit_nofile 65535;
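To confirm the workers actually picked up the new limit after a restart, read it straight out of /proc (pgrep -f matches the worker process name):

# Max open files for the first Nginx worker process
cat /proc/$(pgrep -f 'nginx: worker' | head -1)/limits | grep 'open files'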
Conclusion
Performance isn't an accident. It's a deliberate architecture choice involving the kernel, the web server configuration, and crucially, the hardware underneath. You can have the best nginx.conf in the world, but if your host is stealing CPU cycles or throttling your I/O, you will never reach sub-millisecond performance.
Don't let legacy infrastructure throttle your growth. For your next high-performance project, test the difference raw NVMe power makes.
Deploy a high-performance NVMe KVM instance on CoolVDS in Oslo today.