
API Gateway Performance Tuning: Surviving the Thundering Herd in 2020

It is 3:00 AM. Your monitoring system—probably Prometheus or Zabbix—is screaming. Your API Gateway, the literal front door to your infrastructure, is throwing 502 Bad Gateway errors. The backend microservices are fine. The database load is low. So, what is breaking?

Nine times out of ten, it is your TCP stack. It is choking on connection churn.

I have spent the last six months migrating a high-traffic e-commerce platform in Oslo from a legacy monolith to a microservices architecture. We learned the hard way that a standard Linux install (even the shiny new Ubuntu 20.04 LTS) is tuned for a desktop user browsing the web, not for a gateway pushing 20,000 requests per second (RPS).

Here is how we fixed it, moving from fragility to stability, using Nginx as our reference gateway (though the logic applies to Kong or HAProxy).

1. The OS Layer: Linux is Stingy by Default

Before touching Nginx, look at your kernel. By default, Linux restricts the number of file descriptors and is aggressive about closing TCP connections. In a high-throughput environment, this leads to port exhaustion.

Open Your File Descriptors

Every socket is a file. If you are capped at 1,024 files (the default on many distros), you are capped at roughly 1,024 concurrent connections per process. You need to raise this limit significantly.

Edit /etc/security/limits.conf:

root soft nofile 65535
root hard nofile 65535
nginx soft nofile 65535
nginx hard nofile 65535
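
To confirm the new limits actually reach the worker processes, check what the running master inherited. Note that systemd-managed services ignore limits.conf entirely, so you may also need a unit drop-in plus Nginx's own worker_rlimit_nofile directive; the paths below assume a default install and may differ on your distro:

# What the running Nginx master actually got
grep "Max open files" /proc/$(cat /run/nginx.pid)/limits

# For systemd-managed Nginx, add a drop-in such as
# /etc/systemd/system/nginx.service.d/limits.conf containing:
#   [Service]
#   LimitNOFILE=65535

# And raise the per-worker limit in nginx.conf:
#   worker_rlimit_nofile 65535;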

Tune the TCP Stack

Next, we tackle sysctl.conf. The most critical setting here is tcp_tw_reuse. In a microservices environment, your gateway opens thousands of short-lived connections to backends. Without reuse, these sockets sit in a TIME_WAIT state for 60 seconds, uselessly consuming a port.

Add the following to /etc/sysctl.conf and run sysctl -p:

# Allow reusing sockets in TIME_WAIT state for new outbound connections
net.ipv4.tcp_tw_reuse = 1

# Increase the maximum listen backlog (completed connections waiting to be accepted)
net.core.somaxconn = 65535

# Widen the local port range to allow more outbound connections
net.ipv4.ip_local_port_range = 1024 65535

# Increase the queue of incoming packets held when the NIC receives faster than the kernel can process
net.core.netdev_max_backlog = 5000

Pro Tip: Do not enable net.ipv4.tcp_tw_recycle. It was removed entirely in Linux 4.12, and on older kernels it silently drops connections from clients behind NAT (like mobile users on 4G networks). Stick to reuse.
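
To verify that TIME_WAIT buildup is actually what is hurting you, count the sockets before and after the change. Bear in mind that tcp_tw_reuse only helps connections your gateway initiates (gateway-to-backend); the backend address below is just an example:

# Overall socket state summary (watch the timewait counter)
ss -s

# TIME_WAIT sockets towards a specific backend
ss -tnH state time-wait dst 10.0.0.5 | wc -l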

2. Nginx Configuration: The Silent Killer

Most DevOps engineers configure Nginx as a reverse proxy and forget the most important directive: Keepalives.

By default, Nginx uses HTTP/1.0 for upstream connections and closes the connection after every request. This means for every single API call coming in, your gateway performs a full TCP handshake (SYN, SYN-ACK, ACK) with your backend service. This adds latency and burns CPU cycles.

Enable Upstream Keepalives

You must explicitly tell Nginx to keep the connection open using the HTTP/1.1 protocol. Here is the correct configuration block:

upstream backend_service {
    server 10.0.0.5:8080;
    
    # Maximum number of idle keepalive connections kept open to this upstream (per worker)
    keepalive 64;
}

server {
    location /api/v1/ {
        proxy_pass http://backend_service;
        
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

In our benchmarks, simply adding keepalive 64; reduced internal latency between the gateway and the microservice from 4ms to 0.8ms. In the world of high-frequency trading or real-time bidding, that difference is an eternity.
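
If you want to reproduce that kind of measurement, the sketch below shows the style of load test we used with wrk; the endpoint and connection counts are illustrative, not our exact benchmark:

# 4 threads, 200 concurrent connections, 30 seconds, with latency percentiles
wrk -t4 -c200 -d30s --latency http://gateway.internal/api/v1/health

# Run it once with the keepalive directives commented out and once with them
# enabled, then compare the p99 latency and requests/sec lines.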

3. The Hardware Reality: Why Virtualization Matters

You can tune your kernel until it is perfect, but you cannot tune your way out of "Steal Time" (%st).

In a shared hosting environment (typical budget VPS), your neighbors are fighting you for CPU cycles. If you run top and see %st hovering above 2-3%, your gateway is stuttering because the hypervisor has paused your VM to let someone else run a PHP script.
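
Checking this takes thirty seconds (mpstat ships with the sysstat package):

# Sample CPU usage once per second for 30 seconds; watch the %steal column
mpstat 1 30

# Or a one-shot view from top in batch mode (the "st" field on the Cpu(s) line)
top -bn1 | grep "Cpu(s)"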

This is unacceptable for an API Gateway.

This is why we architect CoolVDS differently. We use KVM (Kernel-based Virtual Machine) with strict resource isolation. When you buy 4 vCPUs on CoolVDS, those cycles are reserved for you. We don't overcommit to the point of degradation.

Storage is the other half of the story: API logging requires fast write speeds, and if your access logs block on I/O, your requests hang.

We use NVMe storage exclusively for this reason. The gap between a standard SATA SSD and the NVMe drives we deploy in our Oslo datacenter is roughly a 6x difference in IOPS. For a high-logging gateway, NVMe is not a luxury; it is a requirement.
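
If you want to benchmark the gap yourself, a minimal fio run that approximates the small synchronous writes an access log generates (parameters are illustrative, not our exact test profile):

fio --name=logwrite --rw=randwrite --bs=4k --size=1G \
    --ioengine=libaio --iodepth=32 --direct=1 \
    --runtime=60 --time_based --group_reporting

# Compare the write IOPS figure between a SATA-backed and an NVMe-backed instance.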

Feature                 Budget VPS                   CoolVDS NVMe Instance
Virtualization          Container (OpenVZ/LXC)       Hardware Virtualization (KVM)
Storage Latency         1-5 ms                       0.05-0.1 ms
Noisy Neighbor Risk     High                         Minimal/None

4. The Nordic Context: Latency and Law

If your user base is in Scandinavia, hosting in Frankfurt or London adds unnecessary milliseconds. Physics is stubborn. Light takes time to travel.

Hosting within Norway means your traffic hits NIX (the Norwegian Internet Exchange) directly. We consistently see pings under 15ms from almost anywhere in Norway to our CoolVDS instances.
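
You can check the route and latency yourself; the hostname below is a placeholder for your own instance:

# 10-probe report; look at the final-hop average and whether the path detours abroad
mtr --report --report-cycles 10 your-instance.example.no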

Additionally, with the uncertainty surrounding international data transfers (especially with the evolving interpretations of GDPR by Datatilsynet), keeping your data processing strictly within Norwegian borders simplifies your compliance posture. You avoid the headache of justifying data exports to third-country jurisdictions.

5. SSL Termination: Don't Be Lazy

Finally, ensure you are using modern TLS. Since you are likely on OpenSSL 1.1.1 (standard in 2020 distributions), enable TLS 1.3. It cuts the full handshake from two round trips to one, which matters for short-lived API calls.

ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;

Also, enable OCSP Stapling. This lets your server present a signed certificate status to the client during the handshake, saving the client a separate DNS lookup and connection to the Certificate Authority's OCSP responder. It noticeably speeds up the first connection.

ssl_stapling on;
ssl_stapling_verify on;
resolver 1.1.1.1 8.8.8.8 valid=300s;
resolver_timeout 5s;
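
After reloading Nginx, confirm that TLS 1.3 is negotiated and that the OCSP staple is actually sent; replace api.example.com with your own hostname, and note that Nginx fetches the OCSP response lazily, so the very first handshake after a reload may still report "no response sent":

# The negotiated protocol should read TLSv1.3
openssl s_client -connect api.example.com:443 -servername api.example.com </dev/null 2>/dev/null | grep "Protocol"

# The stapled OCSP response should report "successful"
openssl s_client -connect api.example.com:443 -servername api.example.com -status </dev/null 2>/dev/null | grep -A 3 "OCSP response"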

Conclusion

Performance is a stack. It starts with the hardware (NVMe, dedicated CPU), moves to the kernel (TCP tuning), and finishes at the application config (Nginx keepalives).

If you are tired of debugging intermittent latency spikes and fighting for CPU cycles on oversold platforms, it is time to upgrade your foundation. Your code deserves an environment that runs as fast as you write it.

Ready to test your tuned config? Deploy a CoolVDS NVMe instance in Oslo today and see the difference raw performance makes.