API Gateway Performance Tuning: Squeezing Microseconds out of Nginx on Linux (2019 Edition)

If your API Gateway adds more than 3 milliseconds of overhead to a request, you are doing it wrong. In the microservices era, where a single user action might spawn ten internal calls, that overhead compounds fast. Suddenly, a "fast" application feels sluggish, customers bounce, and your metric dashboards turn red.

I recently audited a setup for a fintech client in Oslo. They were running a standard Nginx reverse proxy on a generic cloud instance. Their complaint? "Random" latency spikes during trading hours. They blamed the database. They blamed the Python backend. They were wrong.

The culprit was a combination of default Linux kernel settings and a noisy neighbor on their virtual machine stealing CPU cycles during SSL handshakes. Here is how we fixed it, and how you can tune your stack to handle thousands of requests per second without sweating.

1. The Hardware Reality Check: Why KVM Matters

Before touching a single config file, look at your infrastructure. In 2019, running high-throughput API gateways on container-based virtualization (like OpenVZ) or oversold public cloud instances is a gamble. You need deterministic performance.

API Gateways are CPU and I/O intensive. They handle TLS termination, request routing, and logging simultaneously. If another tenant on your physical host decides to mine crypto or compile a kernel, your Steal Time goes up. Your latency jitters.

Pro Tip: Always check your steal time with top or vmstat. If %st sits consistently above 0.5%, migrate immediately. This is why at CoolVDS we strictly use KVM virtualization: it guarantees that the CPU resources you pay for are actually yours, which is non-negotiable for gateway nodes.
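
For example, to spot it quickly:

# Sample CPU stats every 2 seconds, five times; the last column (st) is steal time
vmstat 2 5

# Or grab the aggregate steal percentage from top in batch mode
top -bn1 | grep "Cpu(s)"

Anything consistently non-zero in st means another tenant is eating into your cycles.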

2. Linux Kernel Tuning for High Concurrency

Default Linux distros (CentOS 7, Ubuntu 18.04) are tuned for general-purpose use, not for handling 50,000 concurrent TCP connections. When you hit those defaults, the kernel silently drops new connections, which is infuriating to debug.

You need to modify /etc/sysctl.conf to widen the TCP highway. These settings are safe for production on modern kernels (4.x/5.x):

# Max open files (requires ulimit change as well)
fs.file-max = 1000000

# Increase the TCP backlog queue
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535

# Reuse sockets in TIME_WAIT state for new outgoing connections
net.ipv4.tcp_tw_reuse = 1

# Expand port range for outgoing connections (crucial for proxying)
net.ipv4.ip_local_port_range = 1024 65535

# Increase TCP buffer sizes for high-speed networks (1Gbps+)
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864

# Enable BBR congestion control (Available since kernel 4.9)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

Apply these with sysctl -p. The tcp_tw_reuse setting is particularly important for gateways connecting to upstream services, as it prevents port exhaustion.
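
A quick sanity check after applying them; the expected output for the two most important values is shown alongside:

# Confirm the kernel actually accepted the new values
sysctl net.ipv4.tcp_congestion_control   # expect: net.ipv4.tcp_congestion_control = bbr
sysctl net.core.somaxconn                # expect: net.core.somaxconn = 65535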

3. Nginx: The Gateway Config

Whether you are using vanilla Nginx, OpenResty, or Kong, the underlying Nginx configuration dictates performance. The most common mistake I see is the lack of upstream keepalives.

By default, Nginx opens a new connection to your backend service (Node.js, Go, PHP-FPM) for every single request. This adds the full TCP handshake overhead to every call. It is inefficient and slow.

Here is the correct way to configure an upstream block to reuse connections:

upstream backend_microservice {
    server 10.0.0.5:8080;
    
    # The secret sauce: keep up to 64 idle connections open per worker
    keepalive 64;
}

server {
    listen 443 ssl http2;
    server_name api.coolvds-client.no;

    # SSL Optimization
    ssl_certificate /etc/letsencrypt/live/api...;
    ssl_certificate_key /etc/letsencrypt/live/api...;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers EECDH+AESGCM:EDH+AESGCM;
    
    location / {
        proxy_pass http://backend_microservice;
        
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        
        # Buffering settings
        proxy_buffers 16 16k;
        proxy_buffer_size 32k;
    }
}

Notice the proxy_set_header Connection ""; directive. Without it, Nginx sends a "Connection: close" header to the backend on every proxied request (that is its default behavior for proxied connections), killing the keepalive pool you just configured.
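
To confirm connections are actually being reused, watch the sockets Nginx keeps open toward the backend under load. A minimal check, assuming the upstream address 10.0.0.5:8080 from the block above:

# Count established connections from the gateway to the upstream
ss -tn state established '( dport = :8080 )' | tail -n +2 | wc -l

Under sustained traffic this number should stay roughly flat, bounded by the keepalive pool per worker, rather than climbing with every request.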

4. Storage I/O: The Hidden Latency

"But Nginx is in memory, why does disk matter?" Because you log. You write access logs, error logs, and potentially buffer large request bodies to disk.

On a standard HDD or a cheap SATA SSD, heavy logging can block the worker process. In 2019, NVMe storage is no longer a luxury; it's a requirement for high-load systems. NVMe drives offer massively deeper command queues compared to AHCI/SATA.

Storage Type     Avg Read Latency    IOPS (approx.)
Standard HDD     ~10-15 ms           80 - 120
SATA SSD         ~0.2 ms             5,000 - 80,000
CoolVDS NVMe     ~0.02 ms            200,000+

If you cannot disable logging (and for compliance reasons like GDPR or banking regulations, you often can't), ensure your VPS is backed by NVMe. It prevents the "logging blocker" scenario where disk writes stall network reads.
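
One way to see whether logging is actually hurting you is to watch write latency on the log volume while the gateway is under load. A minimal sketch using iostat from the sysstat package (column names vary slightly between versions):

# Extended device stats every 2 seconds; w_await is the average write latency in ms
iostat -xm 2

If w_await regularly climbs into the millisecond range on the device holding /var/log/nginx, the disk is throttling your workers.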

5. Local Context: Norway and Data Residency

Latency is governed by the speed of light. If your users are in Oslo, Bergen, or Trondheim, hosting your gateway in Frankfurt adds roughly 20 ms of round-trip time (RTT). That is unavoidable physics.

Furthermore, with the tightening grip of GDPR and the uncertainty surrounding the US CLOUD Act, keeping data within Norwegian borders is becoming a significant competitive advantage. Hosting locally usually means peering directly via NIX (Norwegian Internet Exchange).

When you deploy on a local provider like CoolVDS, your packets often don't even leave the country. This results in snappier TLS handshakes and a better mobile experience for your end users.
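
You can measure this yourself. A hedged example that breaks one HTTPS request into DNS, TCP connect, TLS handshake, and first-byte time (the /health path is just a placeholder on the hostname from the config above):

# Where do the milliseconds go? DNS, TCP connect, TLS, first byte, total
curl -so /dev/null -w "dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total}\n" \
    https://api.coolvds-client.no/health

Run it once from a machine inside Norway and once from Central Europe and compare time_connect; the gap is the RTT you cannot tune away.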

Quick Diagnostic: Check Your Limits

Before you deploy, run this check. If the numbers are low (e.g., 1024), your tuning isn't applied:

ulimit -n
# Should report at least 65535 for the user running Nginx

To make this permanent for the Nginx user, edit /etc/security/limits.conf:

nginx       soft    nofile  65535
nginx       hard    nofile  65535
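
One caveat worth flagging: on systemd-based distros (CentOS 7, Ubuntu 18.04), limits.conf generally applies to interactive sessions rather than services, so verify what the running process actually received. A quick check, assuming the default PID file location:

# Inspect the limit of the running master process
cat /proc/$(cat /var/run/nginx.pid)/limits | grep "Max open files"

# If it still says 1024, raise it at the service level and in Nginx itself:
#   systemd: add LimitNOFILE=65535 under [Service] (systemctl edit nginx)
#   nginx:   worker_rlimit_nofile 65535; in the main context of nginx.conf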

Conclusion

Performance tuning is an iterative process, but the foundation must be solid. You need a tuned kernel, correct Nginx configurations, and hardware that doesn't steal your cycles. Don't let your infrastructure be the reason your developers lose sleep.

If you need a testing ground that respects these principles—pure KVM isolation, local Norwegian peering, and raw NVMe power—we have the environment ready for you.

Ready to drop your latency? Deploy a high-performance NVMe instance on CoolVDS today and see the difference in your time_connect metrics.