API Gateway Performance Tuning: Squeezing Milliseconds out of Nginx and Kong
Your API Gateway is the bouncer of your infrastructure. It checks IDs, manages crowds, and throws out the troublemakers. But what happens when the bouncer gets tired? Your entire club stops moving.
I recently audited a microservices setup for a fintech client in Oslo. They were complaining about "database slowness." They were wrong. The database was fine. Their API Gateway (running a stock Nginx configuration inside a crowded container) was choking on SSL handshakes. The latency wasn't in the query; it was in the handshake.
In this guide, we aren't looking at basic setup. We are looking at the specific kernel and application tunings required to handle thousands of requests per second (RPS) while keeping latency flat. This applies whether you are using vanilla Nginx, OpenResty, or Kong.
1. The Hardware Reality Check
Before touching a config file, acknowledge the physical constraints. An API gateway is I/O and CPU bound. It does heavy string manipulation, encryption (TLS), and network socket shuffling.
If you are running this on shared hosting with "burstable" CPU, stop. You need consistent clock cycles. At CoolVDS, we enforce strict KVM isolation. When we say you get 4 vCPUs, they aren't stolen by a neighbor mining crypto next door. Furthermore, logging access requests to disk on a standard SATA drive will block your I/O thread. NVMe storage is not a luxury for gateways; it is a requirement.
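If the gateway must write access logs to slow storage at all, at least keep Nginx from blocking on every request. A minimal sketch using the standard buffered access_log form (the buffer and flush values are illustrative, not tuned recommendations):
# /etc/nginx/nginx.conf (inside the http block)
# Buffer log lines in memory and flush at most once per second,
# so a slow disk never stalls a worker on the hot path
access_log /var/log/nginx/access.log combined buffer=64k flush=1s;
# For noisy health-check endpoints, consider 'access_log off;'
# inside the relevant location block.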
2. Linux Kernel Tuning (`sysctl.conf`)
Linux defaults are designed for general-purpose computing, not high-throughput packet switching. We need to open the floodgates. The specific bottleneck here is usually the size of the connection tracking table and the number of open file descriptors.
Edit your /etc/sysctl.conf. These settings assume a server with at least 4GB RAM and 2 cores (typical for a CoolVDS production node).
# /etc/sysctl.conf
# Maximize the number of open file descriptors
fs.file-max = 2097152
# Increase the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
# Widen the port range for outgoing connections (crucial for proxying)
net.ipv4.ip_local_port_range = 1024 65535
# Reuse sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1
# Fast Open allows data exchange during the initial TCP handshake
net.ipv4.tcp_fastopen = 3
# BBR Congestion Control (Available in Kernel 4.9+)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
Apply these with sysctl -p. The BBR congestion control algorithm is particularly important if your users are on mobile networks roaming across Europe. It handles packet loss significantly better than CUBIC.
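Before relying on BBR, confirm your kernel actually ships it; on some older distribution kernels the tcp_bbr module is not available. A quick check from the shell:
# List the congestion control algorithms the kernel currently offers
sysctl net.ipv4.tcp_available_congestion_control
# After `sysctl -p`, confirm the active algorithm and queueing discipline
sysctl net.ipv4.tcp_congestion_control net.core.default_qdisc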
3. Nginx / OpenResty Configuration
Most defaults in Nginx (version 1.17+) are conservative. For an API Gateway, we need to keep connections alive upstream to avoid the overhead of opening a new TCP connection for every microservice call.
`worker_processes` and Worker Limits
Set worker_processes auto;. This maps one worker to one CPU core. Context switching is the enemy. On a CoolVDS NVMe instance, this mapping is 1:1 with physical threads, reducing CPU cache misses.
Crucially, increase the file descriptor limit for Nginx specifically:
# /etc/nginx/nginx.conf
worker_rlimit_nofile 65535;
events {
    worker_connections 16384;
    multi_accept on;
    use epoll;
}
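worker_rlimit_nofile only takes effect on newly spawned workers, so verify it after a reload. A quick sanity check, assuming a systemd-managed Nginx and a standard shell:
# Validate the config and reload to spawn fresh workers
nginx -t && systemctl reload nginx
# Inspect the effective open-file limit of a running worker process
grep 'open files' /proc/$(pgrep -f 'nginx: worker' | head -n1)/limits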
Upstream Keepalive
This is the most common mistake. Nginx defaults to HTTP/1.0 for upstream connections and closes them immediately. You must force HTTP/1.1 and enable keepalive.
http {
    upstream backend_api {
        server 10.0.0.5:8080;
        server 10.0.0.6:8080;

        # Keep 64 idle connections open to this upstream
        keepalive 64;
    }

    server {
        location /api/ {
            proxy_pass http://backend_api;

            # Required for keepalive to work
            proxy_http_version 1.1;
            proxy_set_header Connection "";

            # Pass real IP to backend (essential for audit logs)
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
Pro Tip: If you are using SSL termination at the gateway (which you should), enable TLS 1.3. It reduces the handshake from two round-trips to one. Ensure your OpenSSL version is 1.1.1 or higher.
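As a rough sketch of what that looks like in the TLS-terminating server block (certificate paths omitted; cache and timeout values are illustrative):
# Inside the server block handling HTTPS
listen 443 ssl http2;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers off;
# Session resumption spares returning clients the full handshake entirely
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 1h;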
4. Managing Buffers and Payload Sizes
APIs often return JSON blobs. If these blobs exceed the Nginx buffer size, Nginx writes them to a temporary file on the disk. Even with NVMe, disk I/O is slower than RAM.
Check your average payload size. If your JSON responses are typically 12KB, but your buffer is 4KB, you are hitting the disk constantly.
http {
    # Handling large headers/cookies
    client_header_buffer_size 1k;
    large_client_header_buffers 4 8k;

    # Body buffer size - keep in RAM if possible
    client_body_buffer_size 16k;
    client_max_body_size 2m;  # Strict limit prevents DDoS via large uploads
}
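Note that the client_* directives above cover request bodies. For the JSON responses coming back from your upstreams, the proxy buffers are the relevant knobs, and Nginx warns in the error log whenever it spills a response to disk. A sketch, with sizes you should match to your own payloads:
# Size upstream response buffers so a typical ~12KB JSON body stays in RAM
proxy_buffer_size 16k;
proxy_buffers 8 16k;
Then check whether you are still hitting the disk:
grep 'buffered to a temporary file' /var/log/nginx/error.log | tail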
5. The Norway Latency Factor
Physics is stubborn. If your target market is Norwegian businesses or consumers, hosting your gateway in a massive datacenter in Frankfurt or London adds 15-30ms of round-trip time (RTT). For a conversational API requiring multiple round trips, that lag becomes noticeable.
By placing your gateway on a VPS in Norway (close to NIX, the Norwegian Internet Exchange), you drop that latency to sub-5ms for local users. It also simplifies GDPR compliance: keeping data resident within the EEA is mandatory, and keeping it within Norway adds an extra layer of assurance against shifting legal frameworks around data transfers.
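You can measure the gap yourself before migrating anything. A rough check from a client on a Norwegian connection, assuming curl is available and your-api.example.com stands in for your gateway:
# Connect, TLS and time-to-first-byte timings, in seconds
curl -o /dev/null -s -w 'connect: %{time_connect}  tls: %{time_appconnect}  ttfb: %{time_starttransfer}\n' https://your-api.example.com/api/health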
6. Monitoring and Benchmarking
Don't guess. Measure. Use wrk to load test your configuration. Here is how I test a gateway endpoint with 12 threads and 400 connections:
wrk -t12 -c400 -d30s --latency http://your-coolvds-ip/api/health
Look at the Latency Distribution in the output. High average latency is bad, but a high "Max" latency means you have jitter (noisy neighbors or CPU stealing). If you see 99% of requests under 10ms, but 1% at 500ms, your hosting environment is unstable.
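If you suspect the jitter comes from the host rather than your configuration, watch CPU steal time while the load test runs; on a KVM guest it exposes cycles the hypervisor withheld from you:
# Steal time appears in the 'st' column (vmstat) and '%steal' (mpstat, from the sysstat package)
vmstat 1 5
mpstat -P ALL 1 5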
Summary of Optimizations
| Parameter | Default | Optimized | Impact |
|---|---|---|---|
| worker_connections | 512 | 16384+ | Allows high concurrency |
| keepalive | 0 (off) | 64+ | Reduces CPU usage on connection setup |
| TCP Congestion | Cubic | BBR | Better throughput on varied networks |
| Disk I/O | HDD/SSD | NVMe (CoolVDS) | Prevents logging bottlenecks |
Final Thoughts
Tuning an API Gateway is an exercise in removing roadblocks. You clear the kernel limits, you optimize the Nginx buffers, and you ensure the connection to the backend is persistent.
But software tuning can't fix hardware contention. For critical infrastructure, you need predictable performance. That is why we built CoolVDS on pure NVMe storage with KVM virtualization. We don't oversell, and we don't hide behind complex cloud pricing.
Is your API ready for the traffic spike? Spin up a high-performance CoolVDS instance in Oslo today and test your TTFB against the giants.