API Gateway Performance Tuning: Surviving the Thundering Herd in 2021

It is 3:00 AM in Oslo. Your monitoring system is screaming. The marketing team launched a campaign targeting the Nordics, and your API Gateway—the glorious front door to your microservices architecture—is choking. You aren't hitting CPU limits. You have plenty of RAM. Yet, latency has spiked from 25ms to 400ms, and 502 Bad Gateway errors are creeping into the logs.

Most developers immediately blame the application code. They are usually wrong.

In high-throughput scenarios, the bottleneck often isn't Python, Go, or Node.js. It's the Linux network stack and the virtualization layer beneath it. If you are running on oversold budget hosting or an opaque public cloud instance with throttled IOPS, no amount of code refactoring will save you. Here is how we tune systems for raw throughput, ensuring your API Gateway can handle the load without melting down.

1. The Kernel is the Limit: Tuning the TCP Stack

Default Linux distributions, even reliable workhorses like Debian 10 or Ubuntu 20.04 LTS, are tuned for general-purpose usage, not for handling 50,000 concurrent connections. When a flood of requests hits, the kernel drops packets because the backlog queues are too small.

You need to adjust sysctl.conf. This isn't optional for an API gateway.

First, check your current backlog limit:

sysctl net.core.somaxconn

If it returns 128 (the long-standing default), you are in trouble. That is the per-socket cap on the queue of connections that have completed the handshake and are waiting to be accepted. Once it fills, new clients get dropped. Here is a production-ready configuration we deploy on CoolVDS instances for high-load clients:

# /etc/sysctl.conf

# Increase the maximum number of connections in the backlog queue
net.core.somaxconn = 65535

# Increase the max number of backlog connections for the network interface
net.core.netdev_max_backlog = 65535

# Widen the local port range to allow more outbound connections to upstream services
net.ipv4.ip_local_port_range = 1024 65535

# Allow reusing sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1

# Increase memory buffers for TCP
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864

# Enable TCP Fast Open (check client support first)
net.ipv4.tcp_fastopen = 3

After saving, run sysctl -p to apply. This configuration allows your OS to accept thousands of connection attempts per second without panicking.
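
If you would rather keep these overrides separate from the distribution defaults, a drop-in file under /etc/sysctl.d/ works just as well. A minimal sketch, assuming you saved the block above as /etc/sysctl.d/99-gateway-tuning.conf (the filename is arbitrary; the 99- prefix just makes it load last):

# Load every sysctl drop-in, including the new file
sudo sysctl --system

# Spot-check that the values actually took effect
sysctl net.core.somaxconn net.ipv4.tcp_fastopen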

Pro Tip: Be careful with tcp_tw_recycle. It broke clients behind NAT and was removed entirely in Linux 4.12, so don't copy it from old tuning guides. Stick to tcp_tw_reuse, which is safe for outgoing connections to your backend services.

2. Nginx Configuration: Beyond the Basics

Whether you are using Kong, OpenResty, or raw Nginx, the underlying engine is the same. The default nginx.conf is conservative. The directive worker_connections defaults to 512 or 1024. This is laughably low for a modern gateway.
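
Raise it explicitly, and raise the file descriptor limit with it, because every client connection and every upstream connection consumes a descriptor. A reasonable starting point for a 4-core instance (treat the numbers as assumptions to adjust for your workload, not gospel):

# nginx.conf (main context)
worker_processes auto;            # one worker per CPU core
worker_rlimit_nofile 131072;      # file descriptor ceiling per worker

events {
    worker_connections 16384;     # concurrent connections per worker
    multi_accept on;              # drain the accept queue aggressively on each wake-up
}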

If you are proxying traffic to upstream services (like a Node.js cluster or a Python API), you must also enable keepalives. Without keepalives, Nginx opens a new TCP connection to your backend for every single request. This adds latency and exhausts your ephemeral ports.

Here is an optimized upstream block:

upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;
    
    # Cache up to 64 idle keepalive connections to the backend (per worker)
    keepalive 64;
}

And here is how you configure the server block to use them:

server {
    listen 80;
    location /api/ {
        proxy_pass http://backend_api;
        
        # HTTP 1.1 is required for keepalive
        proxy_http_version 1.1;
        
        # Clear the Connection header to persist the link
        proxy_set_header Connection "";
        
        # Standard proxy headers
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

This change alone can reduce internal latency by 20-50ms per request, because it removes a fresh TCP handshake (and the ephemeral port churn that comes with it) from every proxied call. That is massive when you are aiming for a sub-100ms response time.
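
To verify the gain on your own stack, load-test the gateway before and after enabling upstream keepalives. A quick sketch with wrk, assuming it is installed and that /api/health is a representative endpoint on your gateway (swap in your own hostname and path):

# 4 threads, 200 open connections, 30 seconds, with a latency histogram
wrk -t4 -c200 -d30s --latency http://gateway.example.com/api/health

# In another terminal: count established connections to the upstream port
# (8080 in the example above). With keepalive working, this number should
# stay small and stable instead of climbing with every request.
ss -Htn state established '( dport = :8080 )' | wc -l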

3. The Hardware Bottleneck: Why IOPS Matter

You might wonder: "Why does storage speed matter for an API gateway? It's just moving network packets!"

This is a common misconception. Your gateway is constantly writing access logs, error logs, and often buffering request bodies to disk if they exceed a certain size (controlled by client_body_buffer_size). If you use a cache layer (like Nginx's proxy_cache), you are reading and writing to disk aggressively.
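
You can relieve some of that pressure in configuration before you ever touch the hardware. A hedged sketch of the relevant Nginx directives (the buffer sizes are starting points, not universal values):

http {
    # Batch log writes: flush when the 64k buffer fills or every 5 seconds
    access_log /var/log/nginx/access.log combined buffer=64k flush=5s;

    # Keep typical API payloads in memory instead of spilling to a temp file
    client_body_buffer_size 128k;

    # If you use proxy_cache, give it a dedicated path and a shared key zone
    # proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:50m max_size=2g inactive=10m;
}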

On cheap VPS providers, storage is often networked (Ceph/NFS) with capped IOPS. When traffic spikes, logging and cache I/O contend for that small budget, iowait climbs, and requests queue up while your CPU sits idle waiting for the disk. Add a noisy neighbor on the same host and steal time climbs on top of it.

Comparison: Standard Cloud vs. CoolVDS NVMe

Feature          | Generic Cloud VPS            | CoolVDS Performance VPS
Storage Tech     | SATA SSD (often throttled)   | Enterprise NVMe
IOPS Limit       | Typically 300-600 IOPS       | 10,000+ IOPS
Virtualization   | Container/Shared Kernel      | KVM (Full Isolation)
Location         | Frankfurt/Amsterdam          | Oslo (Low Latency)

At CoolVDS, we utilize KVM virtualization on local NVMe arrays. This ensures that even if you are logging 10,000 requests per second, your disk I/O latency remains negligible. There is no "noisy neighbor" stealing your disk cycles.
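
Not sure what your current provider actually delivers? A quick way to check is a short random-write test with fio. This is a rough sketch, and it writes a 1 GB scratch file, so point it at a throwaway path rather than a busy production volume:

fio --name=iops-check --filename=/tmp/fio-test --size=1G \
    --rw=randwrite --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=32 --runtime=30 --time_based --group_reporting
rm /tmp/fio-test

If the reported IOPS sit in the low hundreds, your access logs, request buffers, and cache are all fighting over a very small budget.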

4. TLS Termination and CPU Caching

In 2021, everything must be encrypted. TLS termination is computationally expensive. If you are handling thousands of handshakes, you need to reuse SSL sessions to avoid the heavy cryptographic math on every connection.

Add this to your nginx.conf http block:

# 10 MB of shared session cache (roughly 4,000 sessions per megabyte)
ssl_session_cache shared:SSL:10m;

# Keep cached sessions valid for 10 minutes
ssl_session_timeout 10m;

# Disable session tickets if you run multiple load balancers without key sync;
# for a single node, tickets are faster.
ssl_session_tickets on;

Also, ensure you are using an OS that supports OpenSSL 1.1.1 to take advantage of TLS 1.3, which reduces the handshake latency by one full round-trip.
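
A minimal sketch of the protocol settings, assuming your Nginx binary was built against OpenSSL 1.1.1 or newer:

# Check what your binary was built with
# nginx -V 2>&1 | grep -o 'OpenSSL [0-9.a-z]*'

# Offer TLS 1.3 to clients that support it, keep 1.2 as the fallback
ssl_protocols TLSv1.2 TLSv1.3;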

5. The Norwegian Context: Compliance & Latency

Latency isn't just about code; it's about physics. If your users are in Norway, routing traffic through a data center in Frankfurt adds 15-25ms of latency purely due to distance and fiber hops. By hosting in Oslo, you cut that network latency to 2-5ms.

Furthermore, consider the legal landscape. Following the Schrems II ruling last year (July 2020), transferring personal data to US-owned cloud providers has become a legal minefield regarding GDPR compliance. Datatilsynet (The Norwegian Data Protection Authority) is watching closely.

Hosting on a Norwegian provider like CoolVDS, where data stays within the jurisdiction, simplifies your compliance posture significantly. You get lower latency and legal peace of mind.

Final Thoughts

Optimizing an API gateway is an exercise in removing roadblocks. You clear the kernel queues, you keep connections alive upstream, and you ensure your storage I/O can keep up with your logs.

Don't let a default configuration file be the reason your launch fails. And definitely don't let slow storage kill your response times. If you need a rig that respects your sysctl settings and offers the raw NVMe speed required for high-performance routing, we are ready for you.

Ready to test your tuning skills? Deploy a high-performance KVM instance in Oslo on CoolVDS today and see the difference true NVMe throughput makes.