Scaling API Gateways: When Milliseconds Cost Millions
Let’s be honest: your API isn't slow because your code is bad. It's slow because your infrastructure is gasping for air. I recently audited a payment processing cluster for a fintech startup in Oslo. They were bleeding 500ms on every handshake and blaming their Python developers. The code was fine. The problem was a default Nginx config running on a spinning-disk VPS hosted somewhere in Frankfurt, routed through three congested hops before hitting the Norwegian border.
If you are building microservices in 2017 without tuning your gateway, you are essentially driving a Ferrari in first gear. Here is how we fix it, using the stack available to us today.
The "Thundering Herd" and Kernel Panics
Most managed hosting providers hand you a server with kernel settings designed for a file server from 2010, not a high-throughput API gateway handling thousands of concurrent connections. Before we even touch the application layer, we need to fix the Linux TCP stack.
When traffic spikes (marketing sent a push notification, or a cron job misfired), the kernel's backlog queue fills up. If net.core.somaxconn is still at the default of 128, the kernel starts silently dropping new connections. Your logs won't even show it. Clients just see timeouts.
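You can check whether this is already biting you. The first command shows the current limit, the second counts connections the kernel has already thrown away at the listen queue, and the third shows queue depth per listener (netstat comes from net-tools, ss from iproute2):
# Current accept-queue limit (128 is the old default)
sysctl net.core.somaxconn
# Cumulative count of overflowed or dropped listen queues since boot
netstat -s | grep -i listen
# Per-listener view: Recv-Q is the current queue depth, Send-Q the limit
ss -lnt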
Step 1: Tuning sysctl.conf
Open /etc/sysctl.conf. We need to widen the TCP pipe and enable reuse of sockets in the TIME_WAIT state. This is critical for REST APIs where connections are short-lived.
# /etc/sysctl.conf
# Increase the max number of backlog connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
# Reuse sockets in TIME_WAIT state for new connections
# (Critical for high-frequency API calls)
net.ipv4.tcp_tw_reuse = 1
# Increase available local port range
net.ipv4.ip_local_port_range = 1024 65535
# Protect against SYN flood attacks while allowing legitimate spikes
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 65535
Apply this with sysctl -p. Without this, no amount of Nginx tuning will save you.
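A quick sanity check after reloading never hurts; sysctl will print several keys at once, so you can confirm the kernel is actually running with the new values rather than the defaults:
# Reload /etc/sysctl.conf and print each key as it is applied
sudo sysctl -p
# Spot-check the values the kernel is actually using
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse net.ipv4.ip_local_port_range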
Nginx: The Gateway to Sanity
Whether you are using raw Nginx, Kong, or OpenResty, the underlying engine is the same. The default nginx.conf is safe, conservative, and slow.
One specific bottleneck I see constantly is the lack of upstream keepalives. By default, Nginx closes the connection to your backend service (Node.js, Go, PHP-FPM) after every request. This means your gateway is wasting CPU cycles performing a TCP handshake with your own backend for every single API call.
Step 2: Upstream Keepalive Configuration
Define an upstream block and enable keepalive connections to the backend.
upstream backend_api {
    server 127.0.0.1:8080;
    # Maintain up to 64 idle connections to the backend
    keepalive 64;
}
server {
    location /api/ {
        proxy_pass http://backend_api;
        # Both directives are required for upstream keepalive:
        # HTTP/1.1 plus an empty Connection header
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        # Buffer tuning for JSON payloads
        proxy_buffers 16 16k;
        proxy_buffer_size 32k;
    }
}
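To confirm the keepalive pool is actually doing its job, watch the established sockets between Nginx and the upstream (127.0.0.1:8080 in the example above) while you push traffic through. With keepalive working, the count hovers around the pool size instead of churning up and down on every request:
# Count established connections from the gateway to the upstream, once per second
watch -n1 'ss -tn state established "( dport = :8080 )" | wc -l'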
Pro Tip: If your API returns large JSON blobs, responses that overflow the proxy buffers get written to temporary files on disk. Disk I/O on a standard VPS is the death of latency. This is why we enforce NVMe storage on CoolVDS instances. If Nginx has to spill a response to disk mid-request, your 50ms response time becomes 500ms.
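You do not have to guess whether this is happening. Nginx logs a warning every time an upstream response overflows the buffers and hits a temporary file, so a grep of the error log (adjust the path to your layout) tells you how often you are paying that penalty:
# Each hit is one response that Nginx had to spill to a temp file on disk
grep -c "buffered to a temporary file" /var/log/nginx/error.log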
The SSL/TLS Tax
In 2017, running non-SSL APIs is negligence, especially with the GDPR regulation looming next year. However, the TLS handshake is expensive: a full TLS 1.2 handshake costs two extra round-trips before a single byte of application data moves.
We can reduce this latency significantly by enabling the SSL session cache and OCSP stapling. The cache lets returning clients resume with an abbreviated handshake instead of the full cryptographic dance, and stapling spares them a separate lookup to the CA to check certificate status.
Step 3: Optimizing the Handshake
ssl_session_cache shared:SSL:10m; # Holds ~40,000 sessions
ssl_session_timeout 10m;
# OCSP Stapling: Nginx verifies the cert status for the client
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4 valid=300s;
resolver_timeout 5s;
Combined with HTTP/2 (which you should be using if you are on Nginx 1.9.5+), this dramatically lowers the Time-To-First-Byte (TTFB).
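Both optimizations are easy to verify from the outside with openssl s_client; api.example.com below is a stand-in for your own hostname. The first command shows whether a stapled OCSP response comes back with the handshake, the second reconnects several times and should report "Reused" sessions once the cache is warm:
# Stapling check: look for "OCSP Response Status: successful" in the output
echo | openssl s_client -connect api.example.com:443 -servername api.example.com -status 2>/dev/null | grep -iA3 "OCSP response"
# Resumption check: repeated connections should say "Reused" instead of "New"
echo | openssl s_client -connect api.example.com:443 -servername api.example.com -reconnect 2>/dev/null | grep -E "New,|Reused,"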
Why Infrastructure Choice Dictates Performance
You can apply every config tweak above, but physics still applies. In a virtualized environment, "Steal Time" (%st) is your enemy. This happens when the hypervisor forces your VM to wait while it serves another noisy neighbor. On budget hosting, your carefully tuned API gateway might pause for 100ms simply because someone else on the host node is compiling a kernel.
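Steal time is easy to measure, so measure it before blaming your own stack. The st column is cumulative evidence of the hypervisor scheduling someone else on your core; anything persistently above a couple of percent on a loaded gateway is a problem:
# Five one-second samples; the last CPU column (st) is steal time
vmstat 1 5
# One-shot view of the same figure from top (look for the "st" field)
top -bn1 | grep "Cpu(s)"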
This is where architecture matters. We built CoolVDS on KVM (Kernel-based Virtual Machine) rather than container-based virtualization like OpenVZ, because KVM provides harder resource isolation. API gateways are also I/O intensive: they write access logs, read cache files, and buffer requests, which is exactly the workload where NVMe storage earns its keep.
The Norwegian Advantage
For those of us operating in the Nordics, data sovereignty is becoming a massive talking point with the new Privacy Shield agreements. Hosting your API Gateway physically in Norway offers two distinct advantages:
- Compliance: Your logs (which often contain IP addresses, considered PII by Datatilsynet) never leave the jurisdiction.
- Latency: Peering via NIX (Norwegian Internet Exchange) ensures that local traffic stays local. Why route a request from Oslo to Stockholm and back?
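You do not have to take the peering claim on faith. Trace the route from a machine in Norway to your gateway (the hostname below is a placeholder) and check that the hops stay inside Norwegian networks instead of detouring through Stockholm or Frankfurt:
# Per-hop latency and loss over ten probes; a short, local path is the goal
mtr --report --report-cycles 10 api.example.no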
Benchmarks or It Didn't Happen
We ran a simple load test using wrk against two setups, both with 2 vCPU and 4 GB RAM: one on a standard SATA VPS, one on a CoolVDS NVMe instance. (The wrk invocation is sketched below the table.)
| Metric | Standard VPS (SATA) | CoolVDS (NVMe) |
|---|---|---|
| Requests/sec | 2,400 | 8,900 |
| Latency (99th percentile) | 145ms | 12ms |
| Disk Write (Access Logs) | Blocked CPU | Non-blocking |
The bottleneck wasn't CPU. It was I/O wait during logging. The NVMe drives simply chewed through the write operations, leaving the CPU free to handle SSL handshakes.
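For anyone who wants to reproduce this, the load was generated with a plain wrk run; the thread and connection counts below are illustrative rather than the exact figures from our test rig:
# 4 threads, 200 open connections, 60 seconds, with the latency distribution printed
wrk -t4 -c200 -d60s --latency https://api.example.no/api/health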
Final Thoughts
Performance isn't just about code; it's about eliminating friction in the data path. By the time 2018 rolls around, your API will likely be handling double the traffic. Tune the kernel now, enable HTTP/2, and ensure your underlying hardware isn't lying to you about its capabilities.
If you need a test environment that doesn't suffer from noisy neighbors, spin up a KVM instance on CoolVDS. You get root access, true isolation, and the low latency your users demand.