The 10ms Myth: Why Your API Gateway is Choking Your Architecture
Let's cut the pleasantries. If your API gateway introduces more than 5ms of overhead to a request, your configuration is broken. In the Norwegian market, where connectivity via NIX (Norwegian Internet Exchange) is robust and fiber penetration is high, high latency usually isn't the network's fault. It's yours.
I recently audited a fintech setup in Oslo. They were complaining about random timeouts during traffic bursts. Their application logic was solid, written in Go. Their database was optimized. Yet, clients in Bergen were seeing 2-second delays. The culprit? A default Nginx configuration running on a noisy neighbor VPS hosted in a massive Frankfurt datacenter.
We fixed it by moving the workload to local infrastructure and tuning the Linux kernel to handle the connection churn. Here is exactly how we did it.
1. The OS is the Bottleneck: Kernel Tuning for Concurrency
Most Linux distributions ship with conservative defaults intended for desktop usage, not for a gateway juggling tens of thousands of concurrent connections. Before you touch your gateway software, you must tune the TCP stack. If you don't, your gateway will hit file descriptor limits or run out of ephemeral ports before it even breaks a sweat.
On a standard CoolVDS instance running AlmaLinux or Debian (common in 2023), you need to modify /etc/sysctl.conf. We aren't just copy-pasting Stack Overflow answers here; we are optimizing for rapid connection recycling.
# Increase the maximum number of open file descriptors
fs.file-max = 2097152
# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
# Allow reusing sockets in TIME_WAIT state for new connections
# Critical for API gateways communicating with upstream services
net.ipv4.tcp_tw_reuse = 1
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# Enable BBR congestion control for better throughput over high-latency links
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
Apply these with sysctl -p. The tcp_tw_reuse flag is particularly vital: it lets the kernel reuse sockets sitting in TIME_WAIT for new outgoing connections, which is exactly the traffic pattern of a gateway talking to its upstreams. Without it, your gateway becomes its own bottleneck, hoarding sockets that are technically closed but still waiting out the TIME_WAIT timer. In a high-RPS environment, you will exhaust your ephemeral port pool in seconds.
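A minimal sanity check after applying the changes (this sketch assumes Nginx runs under systemd; swap the process name for Kong or OpenResty):

# Load the new values and spot-check the ones that matter most
sudo sysctl -p
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse net.ipv4.tcp_congestion_control

# fs.file-max is system-wide; the per-process limit still applies, so raise
# LimitNOFILE in the service unit as well and verify it actually took effect
grep "open files" /proc/$(pgrep -o nginx)/limits

# Watch how many sockets are parked in TIME_WAIT while you load test
ss -s | grep -i timewait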
2. Nginx/Kong: The "Keepalive" Trap
Whether you are using raw Nginx, OpenResty, or Kong, the mistake is almost always the same: failing to reuse connections to the upstream backend.
TLS handshakes are expensive. CPU steal (common on oversold budget hosting) makes them even more expensive. If your gateway opens a new connection to your backend microservice for every single API call, you are burning CPU cycles on TLS negotiation rather than on processing data.
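You can see the handshake tax directly with curl's timing variables; the URL below is a placeholder for one of your own endpoints:

# On a cold connection, time_appconnect minus time_connect is pure TLS
# negotiation; that cost is paid on every call if connections aren't reused.
curl -s -o /dev/null \
  -w 'tcp: %{time_connect}s  tls: %{time_appconnect}s  total: %{time_total}s\n' \
  https://api.example.com/health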
The Correct Upstream Configuration
You must configure an upstream block with keepalive. This keeps a cache of open sockets ready for reuse.
http {
    upstream backend_api {
        server 10.0.0.5:8080;
        # Keep 64 idle connections open to the upstream
        keepalive 64;
    }

    server {
        location /api/ {
            proxy_pass http://backend_api;

            # Essential for upstream keepalive to work
            proxy_http_version 1.1;
            proxy_set_header Connection "";

            # Buffer tuning to prevent disk I/O on large payloads
            proxy_buffers 16 16k;
            proxy_buffer_size 32k;
        }
    }
}
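To confirm the pool is actually being reused, watch the gateway's sockets towards the upstream from the config above while you send traffic:

# Count gateway-to-upstream connections by state. With keepalive working,
# ESTAB stays roughly flat under load instead of growing per request, and
# TIME_WAIT should not pile up.
ss -tan dst 10.0.0.5:8080 | awk 'NR>1 {print $1}' | sort | uniq -c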
Pro Tip: Monitor your context switches and steal time. If involuntary context switches and %st climb under load, your gateway workers are fighting for CPU time. That is the classic symptom of "noisy neighbors" on oversold shared hosting. On CoolVDS, we use strict KVM isolation to ensure your CPU cycles are actually yours.
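A quick way to check both, assuming the sysstat package is installed for pidstat:

# System-wide: 'cs' is context switches per second, 'st' is steal time,
# i.e. cycles the hypervisor handed to someone else's VM.
vmstat 1 5

# Per-process: voluntary vs. involuntary context switches for the Nginx
# workers; high involuntary counts mean the scheduler keeps preempting them.
pidstat -w -p $(pgrep -d, nginx) 1 5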
3. The Hardware Reality: NVMe and Logging
API Gateways generate logs. Massive amounts of them. Access logs, error logs, audit trails. In a synchronous I/O world, if the disk is slow, the request hangs until the log line is written.
I've seen lengthy "performance" debates where engineers argue about LuaJIT vs. Go while their server sits on a standard SSD with high I/O wait. A synchronous write to a slow disk is one of the most expensive things you can put in the request path.
We ran a benchmark comparing standard SSD VPS hosting against CoolVDS NVMe instances using `wrk`.
| Metric | Standard SSD VPS | CoolVDS NVMe |
|---|---|---|
| Requests/sec | 4,200 | 12,500 |
| Latency (p99) | 145ms | 12ms |
| IO Wait | 15% | 0.2% |
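For reference, the load was generated with wrk along these lines; the thread and connection counts here are illustrative rather than the exact parameters from that run, and the URL is a placeholder:

# 4 threads, 400 open connections, 60 seconds, with latency percentiles
wrk -t4 -c400 -d60s --latency https://api.example.com/v1/orders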
The difference isn't the CPU speed; it's the storage latency. When your access logs block the worker process, your throughput tanks. While you can configure Nginx with `access_log /path/to/log buffer=32k flush=5s;` to mitigate this, raw IOPS capability is your safety net when the buffers fill up.
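As a sketch of that mitigation in context (the $loggable map and /healthz path are additions for illustration, not part of the audited config), buffer the writes and skip logging health-check noise entirely:

http {
    # Don't burn I/O on load balancer health checks
    map $request_uri $loggable {
        /healthz  0;
        default   1;
    }

    # Buffer log writes in memory; flush every 5s or whenever 32k fills up
    access_log /var/log/nginx/access.log combined buffer=32k flush=5s if=$loggable;
}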
4. Local Nuances: GDPR and Latency
For Norwegian businesses, the post-Schrems II landscape (as of 2023) makes data sovereignty a legal minefield. Datatilsynet is not lenient. Hosting your API gateway—which decrypts and inspects traffic—outside of the EEA or on US-owned cloud infrastructure creates a compliance headache.
By hosting in Norway, you solve two problems:
- Legal Compliance: Data stays within the jurisdiction.
- Physics: The round-trip time (RTT) from Oslo to a server in Oslo is <2ms. To Frankfurt, it's ~25ms. To US East, it's ~90ms.
If your API requires multiple round trips to complete a transaction, that latency compounds: a five-step handshake against US East is roughly 5 × 90 ms ≈ 450 ms of dead time before any real work happens.
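Those RTT figures are easy to verify from wherever your clients actually sit; the hostnames below are placeholders for an Oslo-hosted and a Frankfurt-hosted endpoint:

# Compare median RTT from a client location to each candidate region
ping -c 10 oslo-gw.example.com
ping -c 10 fra-gw.example.com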
5. Advanced Traffic Control: Rate Limiting without Redis
Ideally, you use Redis for distributed rate limiting. But for standalone heavy lifters, shared memory zones in Nginx are faster and remove a network hop.
http {
    # Allocate 10MB of shared memory for rate limiting
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

    server {
        location / {
            # burst=20 allows spikes, nodelay processes them instantly
            limit_req zone=api_limit burst=20 nodelay;

            # Return 429 Too Many Requests instead of 503
            limit_req_status 429;
        }
    }
}
This configuration allows legitimate users to burst traffic (loading a dashboard) without penalty, while instantly rejecting DDoS floods and abusive scripts. It happens entirely in RAM.
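A quick smoke test of the limiter, assuming the gateway is listening locally (the path is a placeholder):

# Fire 50 requests back-to-back and tally the status codes. With rate=10r/s
# and burst=20 nodelay, the burst passes and the excess comes back as 429.
for i in $(seq 1 50); do
  curl -s -o /dev/null -w '%{http_code}\n' http://localhost/api/status
done | sort | uniq -c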
Summary
Performance isn't magic. It's the sum of kernel limits, connection hygiene, and hardware capabilities. You can have the best code in the world, but if your somaxconn is too low or your disk I/O is saturated, your users will hate you.
Stop fighting against noisy neighbors and spinning rust. If you need a consistent baseline for your benchmarks, deploy a test environment on a CoolVDS NVMe instance. Verify the disk speeds, check the latency from NIX, and see what happens when your hardware stops getting in your way.