The Art of Sub-Millisecond Routing: Advanced API Gateway Tuning for Nordic Workloads
Let’s be honest: default configurations are designed for compatibility, not performance. If you apt install nginx or deploy a standard Traefik container and walk away, you aren't doing DevOps; you're doing digital tourism. I recently audited a payment processing cluster in Oslo where the API gateway was introducing 40ms of overhead per request. For a platform authorizing transactions in real time, that's not lag; that's a death sentence.
In the Nordic hosting market, where we pride ourselves on robust infrastructure and connectivity to NIX (the Norwegian Internet eXchange), a sluggish gateway is inexcusable. The bottleneck usually isn't your code. It's the thousands of TCP handshakes your gateway renegotiates because upstream keepalive was never enabled, and the file descriptor limits it slams into under load. Here is how we fix it, assuming you are running a Linux 5.15+ kernel (standard in 2024) and have root access.
1. The OS Layer: Breaking the Limits
Before you even touch your gateway software, you must address the operating system. Linux is conservative by default. It protects itself from resource exhaustion by limiting open files and connections. For a gateway handling 10,000 requests per second (RPS), these limits are laughable.
First, check your current file descriptor limit:
ulimit -n

If it says 1024, you are throttling your own traffic. We need to raise the ceiling for the nginx or haproxy user and, more importantly, tune the kernel's networking stack so it can handle high concurrency without choking on `TIME_WAIT` states.
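For the per-user ceiling, a systemd drop-in is usually the cleanest route on a modern distro. A minimal sketch, assuming nginx runs as a standard systemd unit and writes its pid to the default /run/nginx.pid:

# /etc/systemd/system/nginx.service.d/limits.conf
[Service]
LimitNOFILE=131072

# Reload systemd and restart the gateway to pick up the new limit
sudo systemctl daemon-reload && sudo systemctl restart nginx
# Confirm the running master process sees the new ceiling
grep 'open files' /proc/$(cat /run/nginx.pid)/limits

NGINX also exposes worker_rlimit_nofile, which lets the workers raise the limit themselves; either way, the number should comfortably exceed worker_connections.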
Here is the /etc/sysctl.conf configuration I deploy on every production CoolVDS instance intended for gateway duties. This optimizes the TCP stack for low latency and high throughput.
# /etc/sysctl.conf configuration for High Performance API Gateway
# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
# Allow sockets in TIME_WAIT to be reused for new outbound connections (safe on modern kernels)
net.ipv4.tcp_tw_reuse = 1
# Increase port range for outgoing connections to upstream services
net.ipv4.ip_local_port_range = 1024 65535
# Increase TCP buffer sizes for 10Gbps+ uplinks (Standard on CoolVDS)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Protection against SYN flood attacks while maintaining performance
net.ipv4.tcp_syncookies = 1
# Increase max open files at kernel level
fs.file-max = 2097152

Apply these with sysctl -p. If you are on a shared VPS provider that blocks `sysctl` modification, move immediately. You cannot tune performance if you are in a jail. This is why we use KVM virtualization at CoolVDS; you get your own kernel space to modify parameters as needed.
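A quick sanity check after the reload is cheap insurance. Two read-only commands, assuming the settings above were applied:

# Confirm the kernel accepted the new values
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse net.ipv4.ip_local_port_range
# Count sockets currently parked in TIME_WAIT (-H drops the header line)
ss -Htan state time-wait | wc -l

Watching that second number under load gives you a feel for how much connection churn the gateway is absorbing.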
2. NGINX: The Keepalive Trap
Most engineers configure NGINX as a reverse proxy and forget the most critical directive: keepalive. By default, NGINX uses HTTP/1.0 to connect to upstream backends (like your Node.js or Go services) and closes the connection after every request. This forces a new TCP handshake for every single API call between the gateway and the microservice.
This adds unnecessary CPU load and latency. You need to enable HTTP/1.1 and connection pooling to your upstreams.
Pro Tip: When using TLS termination, the handshake overhead is even higher. Offload SSL at the gateway, and use persistent plain TCP connections to your backend services inside the private VPC.
Here is a battle-tested NGINX upstream configuration block:
http {
# ... basic config ...
upstream backend_api {
server 10.0.0.5:8080;
server 10.0.0.6:8080;
# CRITICAL: Keep 64 idle connections open to the backend
keepalive 64;
}
server {
listen 443 ssl http2;
server_name api.coolvds-client.no;
# SSL Optimization
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
location / {
proxy_pass http://backend_api;
# CRITICAL: Switch to HTTP/1.1 for keepalive support
proxy_http_version 1.1;
# Clear the Connection header to prevent closing
proxy_set_header Connection "";
# Forwarding headers for visibility
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
}
}
}

With proxy_set_header Connection "";, you ensure the connection remains open, allowing NGINX to reuse the socket for the next request. In benchmarks, this reduces internal latency by up to 50% under heavy load.
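To verify the gain, measure the gateway under sustained load and watch how it talks to its upstreams. A rough sketch, assuming wrk is installed on a separate test box and an endpoint such as /health exists (both are placeholders):

# From the test box: sustained load with latency percentiles
wrk -t4 -c256 -d30s --latency https://api.coolvds-client.no/health

# On the gateway: established sockets to a single upstream.
# With keepalive working, this stays small and stable instead of
# churning through thousands of short-lived connections.
ss -Htn state established dst 10.0.0.5 | wc -l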
3. Traefik: For the Container Purists
If you are running Kubernetes or Docker Swarm, you might be using Traefik. It’s fantastic, but its defaults favour caution over latency: connection dial and response timeouts are generous, so requests can hang on a dying backend far longer than they should. In a microservices architecture, you want to fail over and stream data as fast as possible.
For Traefik v2.10+ (current as of Jan 2024), you should tune the transport settings in your static configuration (traefik.yml) to align with the underlying infrastructure.
serversTransport:
  maxIdleConnsPerHost: 100
  forwardingTimeouts:
    dialTimeout: "2s"
    responseHeaderTimeout: "5s"
    idleConnTimeout: "90s"

Small tweaks make a difference. Set dialTimeout low. If a backend service doesn't accept a TCP connection in 2 seconds, it’s likely dead or overloaded. Fail fast and let the load balancer try the next healthy instance.
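The static settings above become the default for every backend. If a single service needs different behaviour, Traefik v2 also supports named transports in the dynamic configuration; a minimal sketch, with the service name and backend URL as placeholders:

# Dynamic configuration (e.g. the file provider)
http:
  serversTransports:
    fast-fail:
      maxIdleConnsPerHost: 200
      forwardingTimeouts:
        dialTimeout: "1s"
        responseHeaderTimeout: "3s"
  services:
    payments-api:
      loadBalancer:
        serversTransport: fast-fail
        servers:
          - url: "http://10.0.0.5:8080"

This keeps the aggressive timeouts scoped to the one backend that can tolerate them.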
4. The Hardware Reality: Why "Cloud" Often Fails
You can have the most optimized NGINX config in the world, but if your "vCPU" is waiting for a physical core to become available because your neighbor is mining crypto, your latency will spike. This is the "Steal Time" metric, and on cheap VPS providers, it kills API performance.
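You can see steal time from inside the guest without any special tooling. A quick check, assuming the standard procps and sysstat packages are installed:

# The 'st' column is CPU time the hypervisor withheld from this VM
vmstat 1 5
# mpstat (sysstat) breaks %steal out per core
mpstat -P ALL 1 5

Anything consistently above a few percent in the st / %steal column means you are fighting your neighbours for cycles, and no NGINX directive will win that fight.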
Then there is logging. API gateways generate massive amounts of access logs. If you are writing these to a standard SATA SSD or, god forbid, network-attached block storage with low IOPS limits, the resulting I/O wait will stall your NGINX worker processes.
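Whatever storage sits underneath, it also pays to take the log write out of the request hot path. NGINX can buffer access-log lines in memory and flush them in batches; a small sketch, with the path, sizes, and the /healthz location as placeholders:

# Buffer up to 64 KB of log lines per worker, flush at least every 5 seconds
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;

# Skip logging entirely for noisy health checks
location /healthz {
    access_log off;
    return 204;
}

Buffered logging trades a few seconds of log freshness for far fewer write() calls per request.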
This is where the choice of hosting becomes architectural, not just financial. In Norway, strict data protection rules (GDPR) and customer expectations around data residency mean you often need local hosting. But local shouldn't mean slow.
Storage Latency Comparison
| Storage Type | Avg Latency (4K Write) | Impact on API Logs |
|---|---|---|
| Standard SATA SSD | 0.5 - 1.0 ms | Moderate blocking at high concurrency |
| Network Block Storage (Cloud) | 2.0 - 5.0 ms | Significant jitter, random spikes |
| CoolVDS Local NVMe | 0.05 - 0.1 ms | Near-zero blocking, instant writes |
When we built the infrastructure for CoolVDS, we mandated local NVMe storage for this exact reason. Writing an access log line shouldn't take longer than processing the request.
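The table numbers are easy to reproduce on your own instance. If fio is available, a 4K random-write job at queue depth 1 approximates the access-log pattern; the file path and runtime here are just examples:

fio --name=api-log-sim --filename=/root/fio-test --size=1G \
    --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 --direct=1 \
    --ioengine=libaio --runtime=60 --time_based --group_reporting
# Remove the scratch file afterwards
rm /root/fio-test

Look at the clat (completion latency) percentiles in the output; p99 is the figure your worker processes actually feel.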
5. Buffer Sizes and Payload Handling
Another common mistake is misconfiguring buffer sizes. If a client sends a payload slightly larger than your buffer, NGINX writes it to a temporary file on the disk. Even with NVMe, disk I/O is slower than RAM.
Check your client_body_buffer_size. If your API accepts JSON payloads up to 16KB, ensure your buffer covers it:
client_body_buffer_size 16k;

Also, verify proxy_buffers. If your backend returns large lists, increase the buffer count so NGINX can read the whole response from the backend and release that connection immediately, serving the content to the slow client from its own memory.
proxy_buffers 8 16k;
proxy_buffer_size 16k;

Conclusion
Performance tuning is an exercise in removing bottlenecks. You start at the kernel (sysctl), move to the application (NGINX/Traefik config), and finally ensure the physical substrate (Hardware) isn't undermining your efforts.
For developers and sysadmins in Norway, the goal is simple: keep the data local for compliance, but keep the infrastructure world-class for speed. Don't let a default config file be the reason your users churn.
If you want to test these configurations on hardware that doesn't fight against you, spin up a Performance Instance on CoolVDS. We provide the raw NVMe power; you provide the code.