Scaling NGINX as an API Gateway: Tuning Linux for 100k Req/Sec
Everyone is rushing into microservices this year. It’s the trend of 2016. But here is the dirty secret nobody puts in the conference slides: splitting your monolith into twenty services creates a latency nightmare. You have just replaced fast in-memory function calls with slow network hops. If your API Gateway—the traffic cop managing this chaos—isn't tuned to perfection, your architecture will collapse under load.
I recently audited a setup for a Norwegian e-commerce client expecting heavy traffic. They were running default NGINX configs on standard cloud instances in Frankfurt. The result? 502 Bad Gateway errors at just 2,000 requests per second. The CPU was idle. The RAM was free. The problem was the network stack and the "noisy neighbors" on their shared hosting.
We fixed it by moving to CoolVDS KVM instances in Oslo and tuning the Linux kernel properly. Here is exactly how we did it.
1. The Silent Killer: Ephemeral Ports and Timeouts
When NGINX proxies a request to an upstream microservice, it opens a TCP connection. By default, when that connection closes, the socket enters a TIME_WAIT state for 60 seconds. If you push 1,000 requests per second, you will burn through the default ephemeral port range (32768–61000, roughly 28,000 ports) in under 30 seconds. New connections will be dropped. Your logs will scream "Cannot assign requested address".
You need to tell the Linux kernel to recycle these connections faster. Edit your /etc/sysctl.conf:
# Allow reuse of sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1
# Shorten how long sockets linger in FIN-WAIT-2 before being released
net.ipv4.tcp_fin_timeout = 15
# Increase the ephemeral port range
net.ipv4.ip_local_port_range = 1024 65535
# Increase the maximum number of connections in the backlog
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
Apply the changes with sysctl -p. These settings alone can double your throughput on a high-traffic API gateway.
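A quick sanity check, assuming ss from iproute2 is installed: confirm the kernel picked up the new values, then watch TIME_WAIT pressure on the gateway while you run a load test.
# Confirm the new values are active
sysctl net.ipv4.tcp_tw_reuse net.ipv4.tcp_fin_timeout net.ipv4.ip_local_port_range
# While load-testing, count sockets stuck in TIME_WAIT
ss -s | grep -i timewait
If the TIME_WAIT count keeps climbing toward the size of the port range, the settings have not actually been applied.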
2. NGINX Upstream Keepalives
Most people configure NGINX as a reverse proxy but forget that NGINX uses HTTP/1.0 for upstream connections by default. This means it closes the connection to your backend API after every single request. The overhead of the TCP Handshake (SYN, SYN-ACK, ACK) kills latency.
You must enable keepalives to the backend. This keeps the TCP connection open and reuses it.
upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;
    # Keep 64 idle connections to the upstream open
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
Without clearing the Connection header, NGINX sends "Connection: close" to the backend on every request, defeating the purpose.
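To confirm keepalive is actually working, watch the connections from the gateway to one of the upstreams while you send traffic. A rough sketch, using the 10.0.0.5 node from the config above and ss from iproute2:
# List established connections from the gateway to one upstream node.
# With keepalive working, the same connections are reused and the list
# stays short and stable under load instead of churning.
watch -n1 'ss -tn state established dst 10.0.0.5'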
3. SSL Termination and the CPU Cost
With Let's Encrypt entering public beta recently, SSL is becoming standard everywhere. But terminating SSL/TLS is CPU-intensive. If you are doing this on an overloaded host, your handshake times will spike.
Ensure your server supports AES-NI (hardware acceleration for encryption). On CoolVDS KVM instances, we pass the host CPU flags through to the guest. You can verify support:
grep -o aes /proc/cpuinfo
If you don't see output, your hosting provider is emulating an old CPU. Move to a provider that gives you modern hardware access.
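To quantify what the instruction set buys you, OpenSSL's built-in benchmark is enough. The environment mask on the second line is the commonly documented way to switch AES-NI off for a comparison run, so treat this as a rough sketch rather than a rigorous benchmark:
# AES-128-GCM throughput with AES-NI (used automatically when available)
openssl speed -evp aes-128-gcm
# Same test with AES-NI masked out, to see the software fallback
OPENSSL_ia32cap="~0x200000200000000" openssl speed -evp aes-128-gcm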
4. The Danger of Shared Resources (Steal Time)
In a virtualized environment, "Steal Time" is the percentage of time your virtual CPU waits for the physical CPU to serve it. On budget OpenVZ or oversold Xen VPS providers, I often see steal time hit 20-30% during peak hours.
Pro Tip: Run top and look at the %st value. If it is consistently above 0.0, your latency is fluctuating because your neighbors are noisy.
For an API Gateway, consistency is more important than raw speed. A request taking 200ms is bad, but a request taking 20ms usually and 5000ms occasionally is worse—it causes timeouts and cascading failures. This is why we deploy gateways on CoolVDS. The KVM virtualization ensures strict resource isolation. We don't suffer from the noisy neighbor effect common in cheaper alternatives.
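Beyond a glance at top, a minimal way to keep an eye on steal time over a peak-hour window (top and vmstat are part of the standard procps tooling):
# One-shot snapshot: the "st" field on the Cpu(s) line is steal time
top -bn1 | grep "Cpu(s)"
# Sample every 5 seconds for an hour; "st" is the last column
vmstat 5 720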
5. Moving Logic to the Edge with Lua
Sometimes you need to route traffic based on a header or validate a token before it hits your backend. Doing this in Java or Ruby is slow. NGINX combined with Lua (OpenResty) allows you to execute logic directly in the gateway, microseconds after the packet arrives.
Here is a basic example of checking a custom header before proxying, without hitting the backend:
location /secure/ {
    access_by_lua_block {
        local key = ngx.req.get_headers()["X-Api-Key"]
        if not key or key ~= "secret_123" then
            ngx.exit(ngx.HTTP_FORBIDDEN)
        end
    }
    proxy_pass http://backend_api;
}
This blocks invalid traffic at the edge, saving your backend resources for legitimate users.
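You can exercise the check from the outside with curl; the hostname and path here are placeholders for your own gateway:
# No key: rejected at the edge with 403, the backend is never touched
curl -i http://gateway.example.no/secure/ping
# Valid key: proxied through to the upstream as normal
curl -i -H "X-Api-Key: secret_123" http://gateway.example.no/secure/ping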
6. Data Sovereignty and Latency in Norway
With the Safe Harbor agreement invalidated last October, storing data outside the EU/EEA has become legally risky. Datatilsynet (The Norwegian Data Protection Authority) is watching closely. If you are serving Norwegian users, routing traffic through US-owned servers or even hubs in London adds unnecessary legal complexity—and latency.
Hosting locally in Oslo isn't just about compliance; it's physics. The round-trip time (RTT) from a fiber connection in Oslo to a server in Oslo is roughly 2-5ms. To Frankfurt, it's 25-35ms. If your API gateway makes three sequential calls to different services, that latency compounds.
| Request Flow | Hosted in Frankfurt | Hosted in Oslo (CoolVDS) |
|---|---|---|
| Single TCP Connect | 30ms | 3ms |
| SSL Handshake (2x RTT) | 60ms | 6ms |
| Total Overhead | 90ms+ | ~10ms |
For a high-performance gateway, saving 80ms of dead air time per request is massive.
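You can verify the numbers in the table from your own vantage point with curl's timing variables; the URL is a placeholder:
# time_connect    = seconds until the TCP handshake completed
# time_appconnect = seconds until the TLS handshake completed
curl -o /dev/null -s -w "TCP: %{time_connect}s  TLS: %{time_appconnect}s\n" https://api.example.no/health
Run it from a Norwegian fiber line against an Oslo host and a Frankfurt host, and the gap in the table shows up immediately.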
Conclusion
You cannot simply install NGINX via apt-get and expect it to handle 10,000 requests per second. You must tune the Linux TCP stack, enable upstream keepalives, and most importantly, ensure your underlying infrastructure isn't stealing your CPU cycles.
If you are serious about API performance and need a platform that respects strict resource isolation and Norwegian data standards, stop fighting with oversold shared hosting. Spin up a KVM instance on CoolVDS today and see what 0% Steal Time looks like.