API Gateway Latency is the New Downtime: Tuning Nginx & Traefik for Millisecond Precision
If your API response time averages over 100ms within Europe, your architecture is broken. Harsh? Maybe. Accurate? Absolutely.
In the high-frequency trading floors of Oslo or the real-time logistics hubs of Hamburg, latency isn't an inconvenience. It is a business metric that directly correlates with revenue loss. I have seen perfectly written Go microservices choked to death because the API Gateway sitting in front of them was running on default configurations. Or worse, running on a noisy, oversold VPS where CPU steal time spiked every time a neighbor decided to run a backup.
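Both failure modes are measurable before you touch a single config file. A quick sketch using only standard tools (the URL is a placeholder for your own endpoint): curl's timing variables show where the milliseconds go, and the "st" column in vmstat exposes CPU steal.
# End-to-end latency breakdown (swap in your own endpoint)
curl -o /dev/null -s -w "connect=%{time_connect}s tls=%{time_appconnect}s total=%{time_total}s\n" https://api.example.com/health
# Watch the "st" column; values consistently above 0 mean a noisy neighbor
vmstat 1 5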
This is not a guide for beginners. We are going to look at how to strip away overhead, tune the Linux network stack, and configure Nginx and Traefik for raw, unadulterated speed. We are writing this in March 2025, when HTTP/3 is standard and TLS 1.3 is non-negotiable.
The "Thundering Herd" and the Kernel Bottleneck
Before we touch the application layer, we must address the OS. Most Linux distributions ship with generic settings designed for desktop compatibility, not high-throughput packet switching. When you hit 10,000 concurrent connections, the defaults fail.
I recall a project for a Norwegian fintech company integrating with PSD2 APIs. They were baffled by random 502 Bad Gateway errors during traffic spikes, despite low CPU usage. The culprit? The kernel dropped SYN packets because the backlog queue was full.
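You can confirm this failure mode directly from the kernel's own counters before changing anything. On most distributions (assuming net-tools or iproute2 is installed), non-zero and growing numbers here mean the listen queue is overflowing:
# Cumulative drops at the listen queue
netstat -s | grep -i -E "listen|SYNs"
# Per-socket view: on LISTEN sockets, Recv-Q is the current accept queue
# and Send-Q is its configured maximum
ss -lnt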
Essential Kernel Tuning for 2025
Edit your /etc/sysctl.conf. These settings optimize the TCP stack for high connection rates and rapid recycling of sockets.
# Increase the maximum number of open files
fs.file-max = 2097152
# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
# Reuse sockets in TIME_WAIT state for new outgoing connections
# (gateway-to-upstream traffic only; requires tcp_timestamps, on by default)
net.ipv4.tcp_tw_reuse = 1
# Increase TCP buffer sizes for modern high-bandwidth links (10Gbps+)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Protect against SYN flood while allowing legitimate spikes
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_syncookies = 1
Apply this with sysctl -p. If your hosting provider doesn't allow you to modify kernel parameters, leave. You are on a container, not a real VPS. At CoolVDS, we use KVM virtualization specifically so you have full control over your kernel space. You cannot tune a race car if the hood is welded shut.
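After reloading, verify that the values actually took effect; inside a restricted container the kernel will silently ignore some of them. Also remember that fs.file-max is a system-wide ceiling: the gateway process needs its own descriptor limit raised (for example via LimitNOFILE= in a systemd override), or it will hit the per-process default long before the kernel's. A quick check, assuming an Nginx master process is running:
# Confirm the new values are live
sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog net.ipv4.tcp_tw_reuse
# Per-process descriptor limit of the running gateway
grep "open files" /proc/$(pgrep -o nginx)/limits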
Nginx: The Old Guard, Still the Fastest
Nginx remains the king of static serving and reverse proxying, but its default config is conservative. For an API Gateway, the goal is to keep connections to the upstream (your backend services) open. Handshaking is expensive. Don't do it for every request.
The Upstream Keepalive Configuration
Most tutorials miss this. You must explicitly configure the upstream block to keep connections alive.
http {
    # ... basic settings ...

    upstream backend_api {
        server 10.0.0.5:8080;
        server 10.0.0.6:8080;

        # Cache up to 64 idle upstream connections per worker process
        keepalive 64;
    }

    server {
        location /api/ {
            proxy_pass http://backend_api;

            # Required for upstream keepalive: HTTP/1.1 with an empty Connection header
            proxy_http_version 1.1;
            proxy_set_header Connection "";

            # Standard proxy headers
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

            # Buffer tuning
            proxy_buffers 16 16k;
            proxy_buffer_size 32k;
        }
    }
}
Without proxy_http_version 1.1 and clearing the Connection header, Nginx defaults to HTTP/1.0 close behavior, forcing a new TCP handshake for every single API call. In a microservices environment, this adds milliseconds of latency that compound quickly.
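To prove keepalive is actually working, watch the upstream connection table under load. A small sketch, assuming the upstream addresses from the block above: the count should plateau around workers x keepalive instead of climbing with request rate and churning through ephemeral ports.
# Established connections to the upstream should stay roughly constant
watch -n1 "ss -tn state established '( dport = :8080 )' | wc -l"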
Traefik: The Modern Contender
Traefik is fantastic for dynamic service discovery, especially with Docker. However, being Go-based, it behaves differently regarding memory and garbage collection. In 2025, Traefik v3.x is mature, but it still needs help under load.
If you see latency spikes in a sawtooth pattern, it is likely the Go Garbage Collector (GC) pausing execution. You can tune this via the GOGC environment variable.
Pro Tip: By default, GOGC is 100. Raising it to 200 or 400 tells the Go runtime to let the heap grow significantly larger before triggering the next collection cycle. This trades RAM for fewer GC pauses and more stable tail latency.
# Docker Compose example for CoolVDS NVMe instances
services:
  traefik:
    image: traefik:v3.2
    environment:
      - GOGC=200
    command:
      - "--providers.docker=true"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.websecure.http3"
    labels:
      # Traefik v3 configures request buffering as middleware, not a static
      # transport flag; attach "api-buffer@docker" to the routers that need it
      - "traefik.http.middlewares.api-buffer.buffering.maxRequestBodyBytes=10485760"
      - "traefik.http.middlewares.api-buffer.buffering.memRequestBodyBytes=2097152"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp" # HTTP/3 (QUIC) runs over UDP
The Hardware Reality: NVMe and CPU Steal
You can have the most optimized Nginx config in the world, but if your I/O Wait is high, your API will crawl. API Gateways log heavily. Access logs, error logs, audit trails. In a high-traffic scenario, writing these logs to a spinning HDD or a network-mounted storage (common in cheap cloud setups) blocks the worker threads.
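Even on fast storage, you can take log writes off the hot path entirely. Nginx supports buffered access logging natively; a minimal sketch, with illustrative path and sizes:
# Batch log writes: flush every 64k of data or every 5 seconds
access_log /var/log/nginx/api_access.log combined buffer=64k flush=5s;
# For gateways where access logs carry no value, disabling is cheaper still
# access_log off;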
This is why we standardized on local NVMe storage for all CoolVDS instances. The IOPS capability of NVMe ensures that logging never becomes a blocking operation. Furthermore, for Norwegian businesses, data residency is critical. Under GDPR and the scrutiny of Datatilsynet, ensuring your logs aren't being replicated to a jurisdiction with weak privacy laws is a compliance necessity.
Comparison: Latency Impact of Virtualization
| Metric | Container-based VPS (LXC/OpenVZ) | CoolVDS (KVM + Dedicated Resources) |
|---|---|---|
| CPU Steal Time | Variable (High Risk) | Near Zero (Dedicated) |
| Kernel Tuning | Restricted | Full Control |
| I/O Latency | Shared/Networked | Local NVMe |
| Avg API Overhead | 15-40ms | < 5ms |
TLS Offloading and AES-NI
Encryption is computationally expensive. In 2025, if your CPU doesn't support the AES-NI instruction set, you are burning money. Ensure your SSL termination happens on hardware that supports this.
To verify your CoolVDS instance supports this (which it does), run:
grep -m1 -o aes /proc/cpuinfo
If you get output, the AES-NI instructions are present and the symmetric encryption that dominates TLS throughput runs in hardware. If not, upgrade immediately.
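For a more direct answer than grepping cpuinfo, benchmark OpenSSL itself. The EVP interface picks up AES-NI automatically while the legacy path does not, so a large throughput gap between these two runs confirms the acceleration is actually in use:
# Software-only path (ignores AES-NI)
openssl speed aes-256-cbc
# EVP path (uses AES-NI when present); expect several times the throughput
openssl speed -evp aes-256-cbc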
Summary
Performance isn't magic. It is the ruthless elimination of bottlenecks. By tuning your Linux kernel for concurrency, configuring Nginx/Traefik to reuse connections, and ensuring your underlying hardware uses NVMe and isolated CPU resources, you can drop your API latency from 150ms to 20ms.
For developers in Norway and Europe, latency to the NIX (Norwegian Internet Exchange) matters. Don't host your Oslo-facing APIs in a generic US-East region. Keep the data close, keep the config tight, and keep the hardware fast.
Ready to test your tuned gateway? Deploy a high-performance CoolVDS NVMe instance in 55 seconds and see the difference raw I/O power makes.