API Gateway Performance Tuning: Squeezing Every Millisecond Out of Nginx on Linux
Let’s get straight to the point: default Linux configurations are designed for general-purpose computing, not for handling 50,000 concurrent API requests per second. If you are running a microservices architecture in 2021 without tuning your edge gateway, you are essentially driving a sports car in first gear.
I recently audited a fintech setup in Oslo where the development team was baffled. They had deployed a robust Kubernetes cluster, but their ingress latency was spiking unpredictably. The code was fine. The database queries were optimized. The culprit? A default Nginx configuration sitting on top of a noisy-neighbor public cloud instance where CPU steal (%st) was quietly eating the cycles they were paying for.
In the Nordic market, where internet speeds are among the fastest in the world, users notice latency. If your server in Oslo takes 200ms just to handshake because of poor TCP window scaling or saturated file descriptors, you aren't just losing packets; you're bleeding credibility. Here is how we fix it, referencing the stack we run on CoolVDS production nodes.
1. The Foundation: Kernel-Level TCP Tuning
Before you even touch your application layer (be it Kong, Tyk, or raw Nginx), you must fix the OS. Ubuntu 20.04 LTS is our standard, but the defaults are too conservative for an API gateway.
The first bottleneck you will hit is the limit on open files. Every connection consumes a file descriptor, and the default limit is usually 1024. That is a joke for an API gateway.
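Before changing anything, check what you are actually working with. This assumes an nginx master process is already running from the distro package; adjust the process name if yours differs:
# Soft limit for your current shell
ulimit -n
# Limits actually applied to the running Nginx master process
grep "Max open files" /proc/$(pgrep -o nginx)/limits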
Increase File Descriptors
Edit /etc/security/limits.conf to allow your user (and root) to open more connections:
* soft nofile 65535
* hard nofile 65535
root soft nofile 65535
root hard nofile 65535
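One caveat: services launched by systemd, which is how the Ubuntu nginx package runs, do not read limits.conf at all. If you want the limit raised for the service itself rather than just login sessions, add a drop-in as well. A minimal sketch, assuming the stock nginx.service unit name:
sudo mkdir -p /etc/systemd/system/nginx.service.d
sudo tee /etc/systemd/system/nginx.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=65535
EOF
sudo systemctl daemon-reload
sudo systemctl restart nginx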
Sysctl Optimizations
Next, we tune the networking stack. We need to handle TIME_WAIT states efficiently and maximize the backlog for burst traffic. Add these to /etc/sysctl.conf:
# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
# Allow reuse of sockets in TIME_WAIT for new outbound connections (essential for high-throughput proxying)
net.ipv4.tcp_tw_reuse = 1
# Increase ephemeral port range to allow more outbound connections to upstream services
net.ipv4.ip_local_port_range = 1024 65535
# Increase TCP buffer sizes for modern high-speed networks (10Gbps+)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Protection against SYN floods
net.ipv4.tcp_syncookies = 1
Apply these changes with sysctl -p. If you are on a restrictive VPS provider, some of these might fail because you share a kernel. This is why we use KVM at CoolVDS; you get your own kernel, so you can tune these parameters without asking for permission.
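It is worth confirming the values actually landed, and keeping an eye on listen-queue drops once real traffic hits. A quick check (nstat ships with iproute2 on Ubuntu 20.04):
# Confirm the new values are live
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse net.ipv4.ip_local_port_range
# Listen-queue overflows and drops; these counters should stay flat under load
nstat -az TcpExtListenOverflows TcpExtListenDrops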
2. Nginx Configuration: Beyond the Basics
Most tutorials tell you to set worker_processes auto; and call it a day. That is insufficient for a high-load gateway.
Worker Connections & Epoll
You need to ensure Nginx can utilize the file descriptors we unlocked in the kernel. Here is a snippet from a production-ready nginx.conf used for a high-traffic e-commerce API:
user www-data;
worker_processes auto;
# Matches the system limit we set earlier
worker_rlimit_nofile 65535;
events {
# Efficient connection processing method for Linux
use epoll;
# Allow a worker to accept all new connections at once
multi_accept on;
# Number of connections per worker
worker_connections 4096;
}
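Rough capacity math: each worker holds up to 4,096 simultaneous connections, and every proxied request consumes two of them (client side plus upstream side), so four workers top out around 8,000 in-flight requests before worker_connections needs raising. To confirm the file descriptor limit actually reached the workers after a reload:
# Each worker should report 65535 for "Max open files"
for pid in $(pgrep -f "nginx: worker"); do
    grep "Max open files" /proc/$pid/limits
done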
Keepalive to Upstreams
This is the most common mistake I see. Nginx terminates SSL from the client, but then opens a fresh TCP connection to your backend (Node.js, Go, Python) for every single request, adding unnecessary handshake overhead. Use the upstream block with keepalive to keep those connections open.
upstream backend_api {
server 127.0.0.1:8080;
# Keep 64 idle connections open to the backend
keepalive 64;
}
server {
location /api/ {
proxy_pass http://backend_api;
# Required for keepalive to work
proxy_http_version 1.1;
proxy_set_header Connection "";
# Buffer tuning
proxy_buffers 16 4k;
proxy_buffer_size 2k;
}
}
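To verify the keepalive pool is doing its job, watch the sockets between Nginx and the backend while you send traffic; port 8080 here matches the upstream block above. Established connections should hover around the pool size, and TIME_WAIT counts should stay low instead of climbing with every burst:
# Established connections to the backend
watch -n1 'ss -Htn state established "( dport = :8080 )" | wc -l'
# TIME_WAIT sockets to the backend; a large number means connections
# are still being opened and closed per request
ss -Htn state time-wait "( dport = :8080 )" | wc -l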
3. The Hardware Factor: NVMe and CPU Steal
Software tuning only gets you so far. In 2021, the physical hardware underlying your virtual machine is the hard ceiling.
Storage I/O Latency
API gateways generate logs: access logs, error logs, audit trails. If you are writing 5,000 log lines a second to a spinning HDD or a cheap SATA SSD, I/O wait climbs and your Nginx workers stall waiting for the disk instead of serving requests.
Pro Tip: If you cannot disable logging for compliance reasons, ensure your hosting provider uses NVMe storage. In our benchmarks, NVMe drives reduce I/O wait time by nearly 90% compared to standard SSDs during heavy log rotation. Alternatively, ship access logs to syslog/rsyslog, or buffer them in memory, so that log writes never block the event loop.
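Before blaming the disk, measure it. iostat comes from the sysstat package (apt install sysstat) and vmstat ships with procps; watch await and %util in the former, and the wa column in the latter:
# Per-device latency (await, in ms) and utilization, sampled every second
iostat -x 1 5
# System-wide view; "wa" is CPU time spent waiting on I/O
vmstat 1 5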
The "Noisy Neighbor" Problem
In shared hosting or budget VPS environments, providers often oversell CPU cores. You might think you have 4 vCPUs, but if the neighbor VM decides to mine crypto or compile a massive Rust project, your API latency spikes. Check this with top and look at the st (steal) value.
If steal consistently sits above 0.5%, migrate immediately. At CoolVDS, we strictly limit overselling and use KVM isolation to ensure that the cycles you pay for are the cycles you get.
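A quick way to quantify steal over time, rather than eyeballing a single top refresh:
# Sample once per second for 30 seconds; watch the "st" column
vmstat 1 30
# One-shot reading from top (the value just before "st")
top -bn1 | grep "Cpu(s)"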
4. TLS/SSL Offloading Performance
Encryption is heavy. With Let's Encrypt and standard HTTPS, your CPU is doing heavy math for every handshake. Ensure you are using modern ciphers that take advantage of hardware acceleration (AES-NI).
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers on;
# Cache SSL sessions to avoid full handshakes on repeat visits
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
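Two quick checks: whether your VM exposes the aes CPU flag at all (this depends on the hypervisor's CPU model; without it, every handshake falls back to software AES), and how fast OpenSSL actually runs the cipher used above:
# AES-NI visible inside the guest?
grep -m1 -o aes /proc/cpuinfo
# Raw throughput of the AES-128-GCM cipher from the config above
openssl speed -evp aes-128-gcm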
5. Compliance and Location: The Norwegian Advantage
Performance isn't just about speed; it's about availability and legality. Since the Schrems II ruling last year (2020), transferring personal data outside the EEA has become a legal minefield. Using US-owned cloud giants introduces complex transfer impact assessments.
Hosting your API Gateway physically in Norway offers two distinct advantages:
- Legal Clarity: Data stays within the EEA/Norway, satisfying Datatilsynet requirements and keeping GDPR compliance straightforward.
- Latency: If your user base is Scandinavian, the round-trip time (RTT) to a server in Oslo is often 5-10ms, compared to 30-40ms to Frankfurt or Amsterdam.
Benchmarks: What to Expect
We ran a simple load test using wrk against a standard CoolVDS NVMe instance (4 vCPU, 8GB RAM) versus a generic cloud competitor.
| Metric | CoolVDS (Optimized) | Competitor (Standard) |
|---|---|---|
| Requests/sec | 24,500 | 11,200 |
| Latency (99th percentile) | 12ms | 145ms |
| Disk Write Speed | 1.2 GB/s (NVMe) | 350 MB/s (SSD) |
The difference isn't magic. It's the result of dedicated hardware resources, NVMe storage, and a kernel that isn't choking on default settings.
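If you want to run a comparable test against your own gateway, the wrk invocation looks roughly like this; the thread count, connection count, duration, and URL are illustrative, not the exact parameters from the run above:
# 4 threads, 400 open connections, 60 seconds, with latency percentiles
wrk -t4 -c400 -d60s --latency https://api.example.com/api/health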
When you are building the infrastructure for your next project, don't let the gateway be the bottleneck. Start with a solid foundation.
Ready to test these configs? Deploy a high-performance KVM instance in Oslo on CoolVDS today and see the difference raw NVMe power makes.