API Gateway Performance Tuning: Squeezing Milliseconds Out of Nginx on Linux
Your API isn't slow. Your infrastructure is lying to you. I recently audited a fintech setup in Oslo where the development team spent three weeks refactoring Go microservices to shave off 10ms of processing time. It didn't matter. Their ingress controller was adding 200ms of latency during peak loads because they were hitting connection tracking limits and suffering from massive CPU steal time on a generic public cloud provider.
If you are running high-throughput workloads, whether on Kong, Nginx, or HAProxy, default Linux distributions are configured for compatibility, not performance. They are tuned for a file server from 2010, not an API gateway handling 50k requests per second in 2022. Here is how we fix the stack, from the kernel up, while keeping Datatilsynet happy.
1. The Kernel: Fixing the TCP Stack
Before touching your gateway configuration, you must address the OS. The default backlog settings will silently drop connections when traffic spikes, and the default ephemeral port range will starve the proxy of outbound sockets to its upstreams. I see this constantly: `dmesg` full of "possible SYN flooding on port 443". That is usually not a DDoS; it is your legitimate users hitting a wall.
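You can confirm whether you are really overflowing the accept queue (rather than being attacked) from the kernel's own counters. A quick check, assuming the standard `net-tools` and `iproute2` utilities are installed:
# Accept-queue overflows and dropped SYNs since boot (climbing numbers = backlog problem)
netstat -s | grep -iE "overflowed|SYNs to LISTEN"
# Per-socket view: Send-Q shows the configured backlog, Recv-Q the current queue depth
ss -lnt '( sport = :443 )'
If those counters rise during peak traffic rather than in random bursts, it is your backlog, not an attacker.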
Edit your `/etc/sysctl.conf`. These settings assume you have at least 4GB RAM and a modern kernel (4.19+).
# Maximize the backlog to prevent packet drops during bursts
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
# Widen the ephemeral port range to allow more outgoing connections (critical for proxying)
net.ipv4.ip_local_port_range = 1024 65535
# Reduce TIME_WAIT state to recycle sockets faster
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
# Increase buffer sizes for high-bandwidth links (10Gbps+)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Enable BBR Congestion Control (check kernel support first)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
Apply this with `sysctl -p`. The `tcp_tw_reuse` flag is safer than `tcp_tw_recycle` (which broke NAT and was removed in kernel 4.12) and is essential for gateways proxying requests to backend services.
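BBR only works if the `tcp_bbr` module is available in your kernel, so verify before you rely on it. A quick sanity check around the apply step:
# Load the module and list the congestion control algorithms the kernel offers
modprobe tcp_bbr
sysctl net.ipv4.tcp_available_congestion_control
# Apply /etc/sysctl.conf, then confirm the active algorithm and qdisc
sysctl -p
sysctl net.ipv4.tcp_congestion_control net.core.default_qdisc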
2. The Gateway: Nginx Configuration for Concurrency
Most default Nginx configs cap worker_connections at 768 or 1024. If you are handling 5,000 concurrent users, Nginx will simply stop accepting new connections once that limit is hit. Furthermore, SSL termination is CPU-expensive; if you aren't caching SSL sessions, you are burning cycles on full handshakes unnecessarily.
Here is a production-hardened snippet for nginx.conf used in high-traffic deployments:
worker_processes auto;
worker_rlimit_nofile 65535;
events {
    worker_connections 16384;
    use epoll;
    multi_accept on;
}
http {
    # ... logs and mime types ...

    # OPTIMIZATION: Keepalive connections to upstream reduce handshake overhead
    upstream backend_api {
        server 10.0.0.5:8080;
        keepalive 64;
    }

    # OPTIMIZATION: Buffer tuning to prevent disk I/O for small payloads
    client_body_buffer_size 128k;
    client_max_body_size 10m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 8k;
    output_buffers 1 32k;
    postpone_output 1460;

    # OPTIMIZATION: SSL Session Caching
    ssl_session_cache shared:SSL:10m;   # Holds approx 40k sessions
    ssl_session_timeout 10m;
    ssl_buffer_size 4k;                 # Lower buffer size reduces Time To First Byte (TTFB)
}
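One caveat the snippet above does not show: the upstream `keepalive` directive only takes effect if proxied requests use HTTP/1.1 and clear the Connection header. A minimal sketch of the matching server block to place inside `http { }` (the `/api/` path and the TLS certificate lines are placeholders for your own setup):
server {
    listen 443 ssl http2;
    # ssl_certificate and ssl_certificate_key go here

    location /api/ {
        proxy_pass http://backend_api;
        # Both lines are required for upstream keepalive to actually be used
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}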
Pro Tip: Monitor your open file descriptors. Even if Nginx is tuned, if the system-wide `ulimit -n` is 1024, Nginx will start throwing "Too many open files" errors under load. Raise it to 65535 via `LimitNOFILE=` in the systemd service file or `ulimit -n 65535` in `/etc/security/limits.conf`, as shown below.
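Note that for a systemd-managed Nginx, `/etc/security/limits.conf` is ignored (it only applies to PAM login sessions), so the unit file is what actually counts. A sketch using the standard drop-in override mechanism; the pid file path may differ on your distro:
# /etc/systemd/system/nginx.service.d/override.conf (create it with: systemctl edit nginx)
[Service]
LimitNOFILE=65535

# Reload units, restart, and confirm the limit the master process actually received
systemctl daemon-reload && systemctl restart nginx
grep "open files" /proc/$(cat /run/nginx.pid)/limits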
3. The Hardware Reality: Why Virtualization Matters
You can have the most optimized Nginx config in the world, but if your underlying host is overcommitting resources, you are fighting a losing battle. This is the "Steal Time" metric in `top` (marked as `%st`). If it sits consistently above a few percent, your neighbor is stealing your CPU cycles.
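You don't need a monitoring agent to see this. `vmstat` ships with virtually every distro and `mpstat` comes with the sysstat package; watch them while you run a load test:
# "st" is the right-most CPU column: time stolen by the hypervisor
vmstat 1 10
# Per-core view; watch the %steal column
mpstat -P ALL 1 5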
In a containerized world, we often forget that I/O Wait is a killer. API Gateways log heavily. If you are on a standard spinning disk or a network-throttled SSD (common in budget VPS), writing logs blocks the worker process. The request hangs until the disk acknowledges the write.
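Whatever disk you are on, you can at least take log writes off the hot path by letting Nginx buffer them in memory and flush periodically. A sketch for the `http` or `server` block, assuming a log format named `main`:
# Buffer access log writes; flush every 5s or whenever 64k accumulates
access_log /var/log/nginx/access.log main buffer=64k flush=5s;
# Keep the error log unbuffered so you don't lose context on a crash
error_log /var/log/nginx/error.log warn;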
| Resource | Budget Cloud VPS | CoolVDS Architecture | Impact on API |
|---|---|---|---|
| CPU | Shared/Burstable (High Steal Time) | Dedicated/KVM (No Steal Time) | Consistent latency vs. random spikes |
| Storage | Network Storage (SATA/SSD mix) | Local NVMe | Log writing doesn't block requests |
| Network | Shared 1Gbps Uplink | Dedicated Uplink per Node | No packet loss during neighbor's DDoS |
This is why we built CoolVDS on KVM with local NVMe storage. For an API Gateway, disk latency correlates directly to response latency. When we migrated a customer from a generic cloud provider to our NVMe-backed instances in Oslo, their p99 latency dropped from 340ms to 45ms without changing a single line of code.
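If you would rather verify this than take our word for it, measure the latency of the volume your gateway logs to. `ioping` is the simplest tool for the job; the path below is just an example:
# Latency of small I/O on the log volume (add -W to test writes instead of reads)
ioping -c 20 /var/log/nginx/
Single-digit microsecond-to-millisecond results are what local NVMe looks like; tens of milliseconds means your logs are fighting the network storage backend.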
4. Compliance: The Norwegian Context
Since the Schrems II ruling, sending personal data (PII) across the Atlantic has become a legal minefield. If your API gateway logs IP addresses or user IDs and pushes them to a US-owned cloud region, you are increasing your risk profile. Hosting locally in Norway isn't just about physics (latency); it's about sovereignty.
Using a VPS in Oslo keeps the round-trip time (RTT) for Norwegian users typically under 10ms, while a request from Oslo to Frankfurt usually takes 25-30ms. That adds up when your API does multiple round trips. CoolVDS infrastructure is physically located here, ensuring both GDPR compliance and the lowest possible RTT for the Nordic market.
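RTT claims are easy to check from your own office or from a probe in your target market; `mtr` gives you round-trip time and per-hop loss in one report (the hostname below is a placeholder):
# 100 probes in report mode; run it against both your Oslo and Frankfurt candidates
mtr --report --report-cycles 100 api.example.no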
Final Verification
After tuning, run a benchmark. Do not use `ab` (Apache Bench); it is single-threaded and will bottleneck before your gateway does. Use `wrk` to simulate realistic load:
# Simulating 12 threads, 400 connections, for 30 seconds
wrk -t12 -c400 -d30s http://your-coolvds-ip/api/health
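Add the `--latency` flag to print the full latency distribution instead of just averages; the 99% line is the number that reflects what your slowest users actually experience:
# Same load shape, but report latency percentiles (50/75/90/99%)
wrk -t12 -c400 -d30s --latency http://your-coolvds-ip/api/health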
If you aren't seeing the throughput you expect, check the hardware. Don't let slow I/O kill your performance. Deploy a test instance on CoolVDS in 55 seconds and see the difference raw NVMe power makes.