API Gateway Performance Tuning: Squeezing Every Millisecond Out of Nginx
It is 3:00 AM on a Tuesday. Your monitoring dashboard (probably Prometheus or Zabbix if you know what you're doing) starts screaming. Your microservices are fine. Your database load is nominal. Yet, your average response time just spiked from 45ms to 800ms. Clients are seeing 502 Bad Gateway errors.
Welcome to the bottleneck nobody thinks about until it breaks: The API Gateway.
In the last few months, I've audited infrastructure for three major Nordic fintech startups. The pattern is always the same. They build beautiful, containerized applications on Kubernetes v1.16, but they run their ingress or API gateway on default configurations. That is like putting a limiter on a Ferrari engine. In Norway, where we pride ourselves on infrastructure stability and connectivity via NIX (Norwegian Internet Exchange), accepting default latency is professional negligence.
Here is how we fix it. We are going to look at the Linux kernel, Nginx configurations, and why your underlying hardware (specifically storage) is the invisible killer.
1. The "File Descriptor" Trap
The most common error I see in /var/log/nginx/error.log isn't a syntax error. It is:
worker_connections are not enough
or
24: Too many open files
By default, many Linux distros limit the number of file descriptors a user can open to 1024. For a high-traffic API gateway proxying thousands of concurrent connections, this is laughable. Every incoming connection is a file. Every upstream connection to your backend is a file. You hit that limit instantly.
The Fix
First, verify your current limits:
ulimit -n
If it says 1024, you need to edit /etc/security/limits.conf immediately:
nginx soft nofile 65535
nginx hard nofile 65535
root soft nofile 65535
root hard nofile 65535
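One caveat worth flagging: if Nginx is started by systemd rather than from a login shell, the PAM limits in /etc/security/limits.conf generally do not apply to the service. In that case, a drop-in for the unit is the reliable way to raise the limit. A minimal sketch, assuming the stock nginx.service unit name:
# /etc/systemd/system/nginx.service.d/override.conf
[Service]
LimitNOFILE=65535
Run systemctl daemon-reload and restart Nginx for the drop-in to take effect.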
Then, update your nginx.conf to actually utilize these descriptors. The worker_rlimit_nofile directive allows Nginx to override the shell limit.
user www-data;
worker_processes auto;
worker_rlimit_nofile 65535;
events {
worker_connections 10240;
multi_accept on;
use epoll;
}
Pro Tip: Do not just set worker_processes to a random number. Setting it to auto maps it to your CPU cores. On a CoolVDS High-Performance instance, we map these directly to physical cores via KVM, avoiding the "CPU steal" often seen in budget OpenVZ containers.
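To confirm the new limit actually reached the running master process after a reload, check its limits directly in /proc. A quick sketch; the PID file path may differ between distros:
grep "Max open files" /proc/$(cat /run/nginx.pid)/limits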
2. Optimizing the TCP Stack (sysctl tuning)
The default Linux network stack is tuned for general-purpose workloads, not for handling 50,000 ephemeral TCP connections per second. When serving as an API gateway, your server creates a new TCP connection for every upstream request unless you configure keepalives (more on that later). This leads to port exhaustion.
You need to tune the kernel to recycle TIME_WAIT sockets faster. Add this to /etc/sysctl.conf:
# Allow reuse of sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 10240 65535
# Max number of packets in the receive queue
net.core.netdev_max_backlog = 5000
# Increase the maximum number of open file descriptors system-wide
fs.file-max = 2097152
# TCP Hardening and Optimization
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_syncookies = 1
Apply these changes with sysctl -p. This configuration is essential if you are pushing heavy traffic through a Norway-based VPS node, ensuring your packets traverse the network stack efficiently.
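To verify the values stuck, read them back and keep an eye on the socket summary while traffic flows; a healthy gateway should not show tens of thousands of sockets stuck in TIME_WAIT. A quick sketch:
# Read back the tuned values
sysctl net.ipv4.tcp_tw_reuse net.ipv4.ip_local_port_range net.core.netdev_max_backlog
# Socket state summary, including the timewait count
ss -s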
3. Keepalive Connections: The Latency Killer
SSL handshakes are expensive. Establishing a TCP connection is expensive. If your API gateway opens a new connection to your backend microservice for every single request, you are adding 50ms+ of unnecessary latency per call. For a composite API call that hits five internal services, you just added 250ms of wait time.
Configure your upstream blocks to keep connections open:
upstream backend_service {
server 10.0.0.5:8080;
# Keep 64 idle connections open to this upstream
keepalive 64;
}
server {
location /api/v1/ {
proxy_pass http://backend_service;
# Required for keepalive to work
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
By clearing the Connection header, you stop Nginx from sending its default Connection: close to the backend on every proxied request, allowing it to reuse the upstream socket instead of tearing it down each time.
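You can sanity-check that reuse is actually happening by counting established sockets to the upstream under load; with keepalive working, the count should plateau around your keepalive value instead of churning. A sketch assuming the 10.0.0.5:8080 upstream from the example above (the -H flag, which suppresses the header line, needs a reasonably recent iproute2):
# Established connections from the gateway to the upstream
ss -Htn state established dst 10.0.0.5 | wc -l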
4. The Hardware Reality: NVMe vs. Spinning Rust
You can tune software all day, but if your I/O is blocked, your gateway will stall. API Gateways log heavily. Access logs, error logs, audit trails. If you are writing 5,000 log lines per second to a standard SATA SSD (or worse, a spinning HDD), your iowait will skyrocket.
When the disk blocks, the worker process blocks. When the worker blocks, requests queue up. Latency spikes.
| Storage Type | Avg Read/Write Speed | IOPS (Approx) | Impact on API Gateway |
|---|---|---|---|
| HDD (7200 RPM) | 80-160 MB/s | ~100 | Critical Failure under load. |
| SATA SSD | 500-550 MB/s | ~80,000 | Acceptable for medium loads. |
| NVMe (CoolVDS Standard) | 3,500+ MB/s | ~500,000+ | Zero blocking. Instant logging. |
We built CoolVDS on pure NVMe storage arrays precisely for this reason. In a managed hosting environment where stability is paramount, removing the I/O bottleneck allows the CPU to focus entirely on routing traffic and terminating SSL.
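On the software side, you can also take pressure off the disk by letting Nginx buffer access-log writes in memory and flush them periodically instead of issuing a write per request. A minimal sketch; the buffer and flush values are illustrative, so tune them to your traffic:
# Batch access-log writes: flush when the 64k buffer fills or every 5 seconds
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;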
5. Buffer Sizes: Don't Touch Disk
If a request body is larger than your buffer, Nginx writes it to a temporary file on disk. Even with NVMe, writing to disk is slower than RAM. You want to keep payloads in memory.
client_body_buffer_size 128k;
client_max_body_size 10m;
client_header_buffer_size 1k;
large_client_header_buffers 4 4k;
Ensure client_body_buffer_size covers the majority of your POST payloads. If you are handling large image uploads, this strategy changes, but for standard JSON REST APIs, keep it in RAM.
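Nginx tells you when the buffer is too small: it logs a warning each time a request body spills to a temporary file. Counting those warnings is a quick way to validate your sizing (a sketch; the log path assumes the defaults above):
grep -c "a client request body is buffered to a temporary file" /var/log/nginx/error.log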
6. Local Context: Data Sovereignty and Latency
In 2020, with GDPR firmly enforced and data privacy concerns growing (especially given the scrutiny of US-based cloud providers), where your gateway sits matters. Deploying your API Gateway in Frankfurt when your users are in Oslo introduces unnecessary round-trip time (RTT).
Latency from Oslo to Frankfurt is roughly 15-20ms. Latency from Oslo to a CoolVDS datacenter in Norway is often <2ms. For a high-frequency trading app or a real-time bidding system, that difference is the entire game. Furthermore, storing and processing logs within Norwegian borders satisfies Datatilsynet's strict interpretations of data sovereignty.
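Measuring this is trivial, so do not take anyone's word for it, including mine. From a client location, check the round trip to your gateway's endpoint (api.example.no is a placeholder for your own hostname):
# Average RTT over 20 probes
ping -c 20 api.example.no | tail -1
# Per-hop latency along the path
mtr --report --report-cycles 20 api.example.no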
Conclusion
Performance isn't about one "magic switch." It is the sum of a tuned kernel, an optimized Nginx configuration, and hardware that doesn't choke on writes. Don't let default settings cripple your application.
If you are ready to stop fighting with iowait and want to see what your API is actually capable of, you need a foundation built for speed. DDoS protection, low latency, and raw NVMe power aren't optional features for us.
Spin up a high-performance KVM instance on CoolVDS today. Experience the difference raw compute makes.