API Gateway Performance Tuning: Squeezing Milliseconds from NGINX and Kong (2020 Edition)

The "Microservices Tax" is Real: Here is How We Fix It

If you are running a distributed architecture in 2020, you are paying a tax. It is not paid in Kroner to Skatteetaten, but in milliseconds to your users. Every hop between your ingress controller, your auth service, and your backend database adds latency. When your API gateway—the front door to your entire infrastructure—stutters, the entire platform looks incompetent.

I recently audited a setup for a FinTech client in Oslo. They were running a standard Kubernetes cluster on a generic European cloud provider. Their Kong ingress was choking at 2,000 requests per second (RPS). CPU usage was nominal, memory was fine, but 502 errors were spiking. The culprit? Default Linux kernel settings and aggressive neighbor noise.

You cannot solve hardware contention with software configuration. But provided your foundation is solid, you can tune Linux and NGINX to handle 10x the traffic on the same footprint. Let's look at the specific configurations that separate a production-grade gateway from a dev-environment toy.

1. The Hardware Reality: Why Steal Time Kills APIs

Before touching a config file, look at your infrastructure. In the wake of the Schrems II ruling this July, moving workloads to US-owned clouds has become a legal minefield for Norwegian companies processing personal data. But moving to local hosting often raises fears of performance degradation. It shouldn't.

The metric to watch is %st (steal time) in top. If it sits consistently above zero, your "dedicated" vCPU is waiting for the hypervisor's attention because another tenant is hogging the physical core.
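
You do not need fancy tooling to watch this. A minimal sketch using standard utilities (vmstat ships with procps on stock Ubuntu/CentOS; mpstat needs the sysstat package):

# Sample CPU counters once per second for 10 seconds.
# The last two columns, "wa" (I/O wait) and "st" (steal time),
# should hover at 0 under load; anything consistently above zero points
# at a noisy neighbour (st) or slow storage (wa).
vmstat 1 10

# Per-core breakdown, useful for spotting a single contended core
# that happens to host one of your NGINX workers (5 samples, 1s apart).
mpstat -P ALL 1 5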

Pro Tip: For API Gateways, raw single-core clock speed often beats core count. NGINX event loops favor high frequency. At CoolVDS, we pin instances to high-frequency cores and use NVMe storage exclusively to eliminate I/O wait (%wa) during logging bursts. Latency consistency is impossible if your disk write speed fluctuates.

2. Kernel Tuning: The `sysctl.conf` Essentials

Stock Linux distributions (Ubuntu 20.04, CentOS 8) ship with conservative, general-purpose network defaults, not settings sized for high-throughput packet switching. You need to open the floodgates.

Edit /etc/sysctl.conf. These settings increase the size of the connection queues and allow the kernel to recycle TCP connections faster. This is critical for REST APIs where clients open and close connections rapidly.

# Increase system file descriptor limit
fs.file-max = 2097152

# Max receive buffer size (8MB)
net.core.rmem_max = 8388608
# Max send buffer size (8MB)
net.core.wmem_max = 8388608

# Increase the maximum number of connections in the listen queue
# Default is often 128, which causes drops during traffic spikes
net.core.somaxconn = 65535

# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535

# Allow reuse of sockets in TIME_WAIT state for new outbound connections
net.ipv4.tcp_tw_reuse = 1

# Protect against SYN flood attacks
net.ipv4.tcp_syncookies = 1

Apply these with sysctl -p. Without net.core.somaxconn raised, NGINX cannot accept new connections fast enough during a "thundering herd" event, regardless of how many worker processes you configure.
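
One caveat: raising net.core.somaxconn only lifts the kernel ceiling. NGINX itself requests a listen backlog of 511 by default, so you also have to ask for the deeper queue on the listen directive. A minimal sketch (the port and backlog value are illustrative):

# Verify the kernel picked up the new values
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse

# In the relevant server block: request the larger accept queue explicitly.
# Without backlog=..., NGINX asks the kernel for 511 regardless of somaxconn.
server {
    listen 443 ssl http2 backlog=65535;
}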

3. NGINX & OpenResty Configuration

Whether you are using raw NGINX, OpenResty, or Kong (which is built on OpenResty), the underlying worker mechanics are identical. The goal is to keep the event loop non-blocking and memory-resident.

Worker Processes and Open Files

In the main (top-level) context of `nginx.conf`:

worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 16384;
    multi_accept on;
    use epoll;
}

worker_rlimit_nofile MUST exceed worker_connections. If NGINX runs out of file descriptors, you will see "Too many open files" in your error logs, and your API will simply drop requests.
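
To confirm the limit actually took effect on a live worker, inspect the process limits directly. A quick sketch, assuming a standard nginx master/worker layout:

# Grab one worker PID and check its live descriptor limit;
# "Max open files" should report 65535, not a stock 1024.
WORKER_PID=$(pgrep -f "nginx: worker" | head -n1)
grep "Max open files" /proc/${WORKER_PID}/limits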

Upstream Keepalives

This is the most common mistake I see. By default, NGINX opens a new connection to the backend (upstream) for every single proxied request. That adds a full TCP handshake (and potentially a TLS handshake) to every API call. It is incredibly wasteful.

Configure your upstream block to keep connections open:

upstream backend_service {
    server 10.0.0.5:8080;
    
    # Keep 100 idle connections open to the backend
    keepalive 100;
}

server {
    location /api/ {
        proxy_pass http://backend_service;
        
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

Setting proxy_set_header Connection ""; is mandatory here; otherwise NGINX sends its default Connection: close header to the backend, killing the very connection you tried to keep open.
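
To verify the pool is actually being reused under load, watch the socket states toward the upstream (10.0.0.5:8080 from the example above). This is a rough check rather than a precise measurement:

# Established connections to the backend -- should stabilise near "keepalive 100"
ss -tan state established '( dport = :8080 )' | wc -l

# TIME_WAIT sockets to the backend -- thousands here means keepalive is not kicking in
ss -tan state time-wait '( dport = :8080 )' | wc -l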

4. SSL/TLS Offloading Optimization

In 2020, TLS 1.3 is the standard. If you are not enforcing it, you are wasting CPU cycles. TLS 1.3 reduces the handshake from two round-trips to one (1-RTT). For a user in Tromsø connecting to a server in Oslo, that RTT reduction is perceptible.

Ensure your OpenSSL library is up to date (1.1.1+) and configure caching:

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
ssl_prefer_server_ciphers on;

# Cache SSL sessions to avoid re-handshaking
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;
ssl_session_tickets off; # Rotate tickets if you turn this on for PFS
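
After reloading, confirm that clients actually negotiate TLS 1.3 and that session resumption works for older clients. A quick sketch with the openssl CLI (api.example.no is a placeholder for your gateway's hostname):

# Should print TLSv1.3 and a 1.3 suite such as TLS_AES_128_GCM_SHA256
openssl s_client -connect api.example.no:443 -tls1_3 </dev/null 2>/dev/null | grep -E "Protocol|Cipher"

# TLS 1.2 resumption check: "Reused" handshakes mean the session cache is being hit
openssl s_client -connect api.example.no:443 -tls1_2 -reconnect </dev/null 2>/dev/null | grep -c Reused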

5. The Local Advantage: NIX and Latency

You can tune your kernel until you dream in C code, but you cannot beat the speed of light. If your primary market is Norway, hosting in Frankfurt or Amsterdam adds a baseline latency of 15-30ms. Hosting in Oslo drops this to <2ms via NIX (Norwegian Internet Exchange).

For high-frequency trading or real-time bidding APIs, that 20ms difference is an eternity.
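
Measuring the difference takes nothing more exotic than curl's timing variables, run from a machine in your target market (the URL below is a placeholder):

# TCP connect, TLS handshake and time-to-first-byte as seen by the client
curl -o /dev/null -s -w "connect: %{time_connect}s  tls: %{time_appconnect}s  ttfb: %{time_starttransfer}s\n" \
     https://api.example.no/health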

Benchmark Comparison (August 2020)

| Optimization Level | RPS (Requests/Sec) | p99 Latency | Gateway CPU Load |
|---|---|---|---|
| Default (Vanilla Ubuntu) | 1,250 | 145ms | 65% |
| Kernel Tuned | 2,800 | 85ms | 50% |
| Kernel + NGINX + NVMe (CoolVDS) | 8,500+ | 22ms | 45% |
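
Numbers in this class are straightforward to sanity-check with an HTTP load generator such as wrk; the thread, connection, and duration values below are illustrative, not the exact harness behind the table above:

# 8 threads, 256 open connections, 60 seconds, with latency percentiles (p99) reported
wrk -t8 -c256 -d60s --latency https://api.example.no/api/health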

Conclusion: Stability is a Feature

Performance tuning is not just about raw speed; it is about predictability under load. By raising file descriptor limits, enabling upstream keepalives, and ensuring your underlying storage relies on NVMe rather than spinning rust or choked SATA SSDs, you build resilience.

With the current focus on data sovereignty following Schrems II, moving your API infrastructure to Norwegian soil is a logical step. But do not accept "local" as an excuse for "slow."

If you are ready to test these configs on infrastructure that doesn't fight against you, spin up a high-performance instance. We don't overprovision, and we don't hide our specs.

Deploy your optimized gateway on CoolVDS today.