Squeezing Milliseconds: API Gateway Tuning in the Post-Meltdown Era
If you have patched your servers against Spectre and Meltdown anytime since January, you have likely noticed the hit. Syscalls got expensive. Context switches are hurting more than usual. For a standard web server, you might ignore it. For a high-throughput API gateway handling thousands of requests per second (RPS), that 15-30% CPU overhead is a disaster.
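Before blaming the kernel for everything, confirm which mitigations you are actually running. On kernels that expose the vulnerabilities interface in sysfs (upstream 4.15+ and the RHEL/CentOS backports), a quick check looks like this:
# Show mitigation status per vulnerability (only on kernels that expose this sysfs interface)
grep . /sys/devices/system/cpu/vulnerabilities/*
# Check whether mitigations were switched off at boot (nopti, nospectre_v2)
cat /proc/cmdline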
I just spent the last week debugging a bottleneck for a fintech client in Oslo. Their API gateway latency jumped from 25ms to 60ms after the kernel updates. Unacceptable. We fixed it, but not by throwing more hardware at it. We fixed it by stripping away the fat in the Linux TCP stack and Nginx configuration.
Most "cloud" providers oversell their CPU cycles. When your neighbor spins up a crypto miner, your API gateway stalls. This is why we rely on CoolVDS for these workloads—kernel-based isolation (KVM) and NVMe storage mean we aren't fighting for IOPS or fighting the hypervisor's scheduler.
The Architecture: Nginx as the Shield
While tools like Kong (built on OpenResty) are gaining traction, for raw performance in early 2018, I still prefer a lean, hand-tuned Nginx reverse proxy. It’s predictable. It’s battle-tested.
Here is the reality of the Norwegian network landscape: latency to NIX (Norwegian Internet Exchange) is low, but if your gateway spends 50ms establishing a TLS handshake, that advantage is gone. Let's tune the stack from the bottom up.
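Measure before you tune. curl's timing variables split a single request into DNS, TCP connect, TLS handshake, and time-to-first-byte; api.example.no below is just the placeholder host used in the config later in this post.
# Break a single HTTPS request into phases (values are in seconds)
curl -o /dev/null -s -w 'dns: %{time_namelookup}  tcp: %{time_connect}  tls: %{time_appconnect}  ttfb: %{time_starttransfer}  total: %{time_total}\n' https://api.example.no/
Run it a few times from the client's network, not from the server itself; you are measuring their path, not your loopback.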
1. Kernel Tuning: The Foundation
Your operating system defaults are designed for a desktop, not a gateway processing 10k RPS. We need to widen the TCP ephemeral port range and allow sockets stuck in TIME_WAIT to be reused for new outbound connections. Open your /etc/sysctl.conf.
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# Allow reuse of sockets in TIME_WAIT state for new connections
# Essential for high-traffic API gateways communicating with upstream backends
net.ipv4.tcp_tw_reuse = 1
# Max open files (system wide)
fs.file-max = 2097152
# Increase the maximum number of backlog connections
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
# Minimize swapping (latency killer)
vm.swappiness = 1
Apply these with sysctl -p. If you are on a shared VPS where you can't modify kernel parameters, move. Immediately. You cannot run a serious gateway on a container that doesn't let you touch sysctl.
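Two follow-ups people forget. First, confirm the values are live. Second, fs.file-max is only the system-wide ceiling; on CentOS 7 Nginx is started by systemd, which ignores /etc/security/limits.conf, so raise the per-process cap in a unit override (a sketch, assuming the stock nginx.service unit):
# Confirm the new values took effect
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse fs.file-max
# Raise the open-file limit for the service itself
mkdir -p /etc/systemd/system/nginx.service.d
cat > /etc/systemd/system/nginx.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=100000
EOF
systemctl daemon-reload && systemctl restart nginx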
2. Nginx Configuration: Beyond Defaults
The default nginx.conf is safe, not fast. For an API gateway, we need to keep connections open to the upstream backend to avoid the overhead of opening a new TCP connection for every API call. This is the single biggest performance gain you will see.
The Upstream Keepalive Block
Don't just point `proxy_pass` to an IP. Define an upstream block.
upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;

    # Keep 64 idle connections per worker open to the backend
    keepalive 64;
}
The Server Block
Now configure the proxy to actually use those keepalive connections. By default, Nginx closes the connection to the backend after the request.
server {
    listen 443 ssl http2;
    server_name api.example.no;

    # SSL Optimization (Critical for 2018 security)
    ssl_protocols TLSv1.2;
    ssl_ciphers EECDH+AESGCM:EDH+AESGCM;
    ssl_prefer_server_ciphers on;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;

    location / {
        proxy_pass http://backend_api;

        # Correct HTTP version for keepalive
        proxy_http_version 1.1;
        # Remove the Connection header to keep the link open
        proxy_set_header Connection "";

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
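After a reload, verify the pool is actually being reused instead of churned. A quick check, assuming the backends from the upstream block above:
# Validate and reload the configuration
nginx -t && systemctl reload nginx
# Established connections to the backends: under steady load you want a stable
# pool (roughly keepalive x worker count), not a constantly rotating set of ports
ss -tn state established '( dport = :8080 )'
# Heavy TIME_WAIT churn towards :8080 means keepalive is not being honoured
ss -tn state time-wait '( dport = :8080 )' | wc -l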
Pro Tip: If you are logging every single access request to disk on a standard HDD, you are blocking your Nginx workers. I/O blocking is the enemy. On CoolVDS NVMe instances this is less of an issue, but we still recommend buffering logs in memory with the `buffer=32k flush=5s` parameters on the `access_log` directive.
3. The "Noisy Neighbor" Factor
This is where infrastructure choice dictates your ceiling. In a virtualized environment, CPU steal time (st) is a metric you must watch. Run top.
top - 14:32:05 up 12 days, 4:12, 1 user, load average: 0.85, 0.90, 0.88
Cpu(s): 12.5%us, 4.2%sy, 0.0%ni, 82.1%id, 0.1%wa, 0.0%hi, 0.2%si, 0.9%st
See that 0.9%st at the end? That is "Steal Time". That is the percentage of time your virtual CPU was ready to work, but the hypervisor made it wait because another customer was using the physical core. If that number goes above 5% on an API gateway, your P99 latency is going to spike unpredictably.
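top gives you a single snapshot. To see whether steal is sustained or just a blip, sample it over time (mpstat ships with the sysstat package):
# One sample per second for 30 seconds; the last column (st) is steal time
vmstat 1 30
# Per-core view, useful when only some vCPUs are being starved
mpstat -P ALL 1 5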
We see this constantly on budget VPS providers. They over-provision cores. CoolVDS architecture is strict about resource allocation. When we provision KVM slices, we ensure that the scheduler isn't starving your I/O threads. With the GDPR enforcement date looming in May, having predictable, secure, and locally hosted infrastructure in the Nordics is not just a technical preference; it's a compliance strategy.
4. Data Privacy & Latency
Speaking of GDPR, your gateway is likely the termination point for TLS. This means you are processing PII (IP addresses, user tokens) right here. If you are hosting this on a US-controlled cloud, you are entering a legal grey area between the US CLOUD Act and the fallout from the Schrems ruling.
Keeping traffic within Norway or the EU reduces legal headaches. Furthermore, the round-trip time (RTT) from a mobile phone on Telenor 4G in Oslo to a server in Frankfurt is ~35ms. To a server in Oslo? ~10ms. For an API making multiple sequential calls, that 25ms delta compounds fast.
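Don't take my numbers on faith; measure from the networks your users are actually on. A rough sketch, with api.example.no standing in for your gateway:
# Plain round-trip time to the gateway
ping -c 20 api.example.no
# Per-hop latency and loss; handy for spotting where the path leaves Norway
mtr --report --report-cycles 20 api.example.no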
Summary Checklist
| Parameter | Recommended Value | Impact |
|---|---|---|
| worker_processes | auto | Maps Nginx workers to CPU cores. |
| worker_rlimit_nofile | 100000 | Allows Nginx to open enough sockets. |
| keepalive_requests | 1000 | Reduces SSL handshake CPU overhead. |
| Disk I/O | NVMe | Prevents log writing from blocking requests. |
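Those directives live in nginx.conf rather than in the snippets above. Once set, confirm the running instance actually sees them (nginx -T needs 1.9.2 or newer):
# Dump the effective configuration and pull out the tuning directives
nginx -T 2>/dev/null | grep -E 'worker_processes|worker_rlimit_nofile|keepalive_requests'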
Performance isn't magic. It's the sum of a thousand small optimizations. But no amount of tuning fixes a choked network line or a stolen CPU cycle. Ensure your foundation is solid.
Need a baseline? Spin up a CentOS 7 instance on CoolVDS today. Benchmark it against your current provider using wrk or ab. The I/O results usually speak for themselves.
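For a fair comparison, run the same test against both providers from a third location and look at the latency percentiles, not the average. Something along these lines, assuming a cheap /health endpoint on the gateway:
# 4 threads, 200 concurrent connections, 60 seconds, with a latency breakdown
wrk -t4 -c200 -d60s --latency https://api.example.no/health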