Scaling API Gateways: When Milliseconds Cost Millions
Let’s be honest: your API isn't slow because your code is bad. It's slow because your infrastructure is gasping for air. I recently audited a payment processing cluster for a fintech startup in Oslo. They were bleeding 500ms on every handshake and blaming their Python developers. The code was fine. The problem was a default Nginx config running on a spinning-disk VPS hosted somewhere in Frankfurt, routed through three congested hops before hitting the Norwegian border.
If you are building microservices in 2017 without tuning your gateway, you are essentially driving a Ferrari in first gear. Here is how we fix it, using the stack available to us today.
The "Thundering Herd" and Kernel Panics
Most managed hosting providers hand you a server with kernel settings designed for a file server from 2010, not a high-throughput API gateway handling thousands of concurrent connections. Before we even touch the application layer, we need to fix the Linux TCP stack.
When traffic spikes (marketing sent a push notification, or a cron job misfired), the kernel's backlog queue fills up. If net.core.somaxconn is still at the default of 128, the kernel starts silently dropping new connections. Your logs won't even show it. Clients just see timeouts.
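You can check whether this is already biting you. The first command shows the current limit, the second counts connections the kernel has already thrown away at the listen queue, and the third shows queue depth per listener (netstat comes from net-tools, ss from iproute2):
# Current accept-queue limit (128 is the old default)
sysctl net.core.somaxconn
# Cumulative count of overflowed or dropped listen queues since boot
netstat -s | grep -i listen
# Per-listener view: Recv-Q is the current queue depth, Send-Q the limit
ss -lnt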
Step 1: Tuning sysctl.conf
Open /etc/sysctl.conf. We need to widen the TCP pipe and enable reuse of sockets in the TIME_WAIT state. This is critical for REST APIs where connections are short-lived.
# /etc/sysctl.conf
# Increase the max number of backlog connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
# Reuse sockets in TIME_WAIT state for new connections
# (Critical for high-frequency API calls)
net.ipv4.tcp_tw_reuse = 1
# Increase available local port range
net.ipv4.ip_local_port_range = 1024 65535
# Protect against SYN flood attacks while allowing legitimate spikes
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 65535
Apply this with sysctl -p. Without this, no amount of Nginx tuning will save you.
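A quick sanity check after reloading never hurts; sysctl will print several keys at once, so you can confirm the kernel is actually running with the new values rather than the defaults:
# Reload /etc/sysctl.conf and print each key as it is applied
sudo sysctl -p
# Spot-check the values the kernel is actually using
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse net.ipv4.ip_local_port_range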
Nginx: The Gateway to Sanity
Whether you are using raw Nginx, Kong, or OpenResty, the underlying engine is the same. The default nginx.conf is safe, conservative, and slow.
One specific bottleneck I see constantly is the lack of upstream keepalives. By default, Nginx closes the connection to your backend service (Node.js, Go, PHP-FPM) after every request. This means your gateway is wasting CPU cycles performing a TCP handshake with your own backend for every single API call.
Step 2: Upstream Keepalive Configuration
Define an upstream block and enable keepalive connections to the backend.
upstream backend_api {
    server 127.0.0.1:8080;
    # Maintain up to 64 idle connections to the backend
    keepalive 64;
}
server {
    location /api/ {
        proxy_pass http://backend_api;
        # Both directives are required for upstream keepalive:
        # HTTP/1.1 plus an empty Connection header
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        # Buffer tuning for JSON payloads
        proxy_buffers 16 16k;
        proxy_buffer_size 32k;
    }
}
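To confirm the keepalive pool is actually doing its job, watch the established sockets between Nginx and the upstream (127.0.0.1:8080 in the example above) while you push traffic through. With keepalive working, the count hovers around the pool size instead of churning up and down on every request:
# Count established connections from the gateway to the upstream, once per second
watch -n1 'ss -tn state established "( dport = :8080 )" | wc -l'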
Pro Tip: If your API returns large JSON blobs, responses that overflow the proxy buffers get written to temporary files on disk. Disk I/O on a standard VPS is the death of latency. This is why we enforce NVMe storage on CoolVDS instances. If Nginx has to spill a response to disk mid-request, your 50ms response time becomes 500ms.
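You do not have to guess whether this is happening. Nginx logs a warning every time an upstream response overflows the buffers and hits a temporary file, so a grep of the error log (adjust the path to your layout) tells you how often you are paying that penalty:
# Each hit is one response that Nginx had to spill to a temp file on disk
grep -c "buffered to a temporary file" /var/log/nginx/error.log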
The SSL/TLS Tax
In 2017, running non-SSL APIs is negligence, especially with the GDPR regulation looming next year. However, the TLS handshake is expensive: a full TLS 1.2 handshake costs two extra round-trips before a single byte of application data moves.
We can reduce this latency significantly by enabling the SSL session cache and OCSP stapling. The cache lets returning clients resume with an abbreviated handshake instead of the full cryptographic dance, and stapling spares them a separate lookup to the CA to check certificate status.
Step 3: Optimizing the Handshake
ssl_session_cache shared:SSL:10m; # Holds ~40,000 sessions
ssl_session_timeout 10m;
# OCSP Stapling: Nginx verifies the cert status for the client
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4 valid=300s;
resolver_timeout 5s;
Combined with HTTP/2 (which you should be using if you are on Nginx 1.9.5+), this dramatically lowers the Time-To-First-Byte (TTFB).
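Both optimizations are easy to verify from the outside with openssl s_client; api.example.com below is a stand-in for your own hostname. The first command shows whether a stapled OCSP response comes back with the handshake, the second reconnects several times and should report "Reused" sessions once the cache is warm:
# Stapling check: look for "OCSP Response Status: successful" in the output
echo | openssl s_client -connect api.example.com:443 -servername api.example.com -status 2>/dev/null | grep -iA3 "OCSP response"
# Resumption check: repeated connections should say "Reused" instead of "New"
echo | openssl s_client -connect api.example.com:443 -servername api.example.com -reconnect 2>/dev/null | grep -E "New,|Reused,"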
Why Infrastructure Choice Dictates Performance
You can apply every config tweak above, but physics still applies. In a virtualized environment, "Steal Time" (%st) is your enemy. This happens when the hypervisor forces your VM to wait while it serves another noisy neighbor. On budget hosting, your carefully tuned API gateway might pause for 100ms simply because someone else on the host node is compiling a kernel.
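Steal time is easy to measure, so measure it before blaming your own stack. The st column is cumulative evidence of the hypervisor scheduling someone else on your core; anything persistently above a couple of percent on a loaded gateway is a problem:
# Five one-second samples; the last CPU column (st) is steal time
vmstat 1 5
# One-shot view of the same figure from top (look for the "st" field)
top -bn1 | grep "Cpu(s)"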
This is where architecture matters. We built CoolVDS on KVM (Kernel-based Virtual Machine) rather than container-based virtualization like OpenVZ, because KVM provides harder resource isolation. API gateways are also I/O intensive: they write access logs, read cache files, and buffer requests, which is exactly the workload where NVMe storage earns its keep.
The Norwegian Advantage
For those of us operating in the Nordics, data sovereignty is becoming a massive talking point with the new Privacy Shield agreements. Hosting your API Gateway physically in Norway offers two distinct advantages:
- Compliance: Your logs (which often contain IP addresses, considered PII by Datatilsynet) never leave the jurisdiction.
- Latency: Peering via NIX (Norwegian Internet Exchange) ensures that local traffic stays local. Why route a request from Oslo to Stockholm and back?
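You do not have to take the peering claim on faith. Trace the route from a machine in Norway to your gateway (the hostname below is a placeholder) and check that the hops stay inside Norwegian networks instead of detouring through Stockholm or Frankfurt:
# Per-hop latency and loss over ten probes; a short, local path is the goal
mtr --report --report-cycles 10 api.example.no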
Benchmarks or It Didn't Happen
We ran a simple load test using wrk against two setups, both with 2 vCPU and 4 GB RAM: one on a standard SATA VPS, one on a CoolVDS NVMe instance. (The wrk invocation is sketched below the table.)
| Metric | Standard VPS (SATA) | CoolVDS (NVMe) |
|---|---|---|
| Requests/sec | 2,400 | 8,900 |
| Latency (99th percentile) | 145ms | 12ms |
| Disk Write (Access Logs) | Blocked CPU | Non-blocking |
The bottleneck wasn't CPU. It was I/O wait during logging. The NVMe drives simply chewed through the write operations, leaving the CPU free to handle SSL handshakes.
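For anyone who wants to reproduce this, the load was generated with a plain wrk run; the thread and connection counts below are illustrative rather than the exact figures from our test rig:
# 4 threads, 200 open connections, 60 seconds, with the latency distribution printed
wrk -t4 -c200 -d60s --latency https://api.example.no/api/health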
Final Thoughts
Performance isn't just about code; it's about eliminating friction in the data path. By the time 2018 rolls around, your API will likely be handling double the traffic. Tune the kernel now, enable HTTP/2, and ensure your underlying hardware isn't lying to you about its capabilities.
If you need a test environment that doesn't suffer from noisy neighbors, spin up a KVM instance on CoolVDS. You get root access, true isolation, and the low latency your users demand.