
API Gateway Performance: Tuning NGINX & Linux Kernel for Sub-Millisecond Latency

You Can't Cache Your Way Out of Bad Infrastructure

It was 02:00 last November. Black Friday traffic was ramping up for a client operating a major e-commerce backend in Oslo. Their dashboard showed green. The code was optimized. Redis was humming. Yet, requests were timing out. The API Gateway—the literal front door to their business—was choking.

We weren't hitting a memory limit. We weren't maxing out bandwidth. We were hitting a wall that most "cloud-native" developers ignore until it hits them in the face: Kernel interrupt handling and CPU steal time.

If you think deploying a default NGINX container on a cheap VPS constitutes an "API Strategy," you are building on sand. In the wake of the Schrems II ruling this July, relying on US-owned hyperscalers for handling Norwegian user data has become a legal minefield. But beyond compliance, there is raw performance.

Let's talk about how to tune an API Gateway (specifically NGINX or Kong) to handle massive concurrency without melting down, and why the underlying hardware at CoolVDS makes this configuration actually work.

1. The OS Layer: Tuning the Linux Kernel

Most Linux distributions ship with generic settings designed for desktop usage or light serving. For a high-throughput API gateway, these defaults are garbage. When you have thousands of ephemeral connections opening and closing every second, you run out of file descriptors and TCP sockets fast.

I recently audited a server where `somaxconn` was still set to 128. That caps the listen backlog: the queue of completed connections waiting for NGINX to accept them. Once that queue is full, the kernel silently drops new connections. The client sees a timeout. You see nothing in the application logs.
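
Before touching anything, confirm this is actually happening to you. Assuming the standard iproute2 and net-tools packages are installed, the current limit and the kernel's drop counters are easy to check:

# Current kernel-level cap on the listen backlog
sysctl net.core.somaxconn

# Cumulative counters for silently dropped connections; look for
# "times the listen queue of a socket overflowed" and "SYNs to LISTEN sockets dropped"
netstat -s | grep -i listen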

Here is the /etc/sysctl.conf baseline I apply to every CoolVDS instance intended for gateway duties:

# Increase the maximum number of open file descriptors
fs.file-max = 2097152

# Increase the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535

# Reuse sockets in TIME_WAIT state for new outbound connections
net.ipv4.tcp_tw_reuse = 1

# Increase available local port range
net.ipv4.ip_local_port_range = 1024 65535

# Protect against SYN flood attacks
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 65535

Apply these with sysctl -p. Without this, your fancy API gateway is choking on the TCP handshake before it even parses a single HTTP header.
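
A quick sanity check after applying, using nothing beyond standard sysctl:

# Reload the settings and spot-check a couple of values
sudo sysctl -p /etc/sysctl.conf
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse

# Remember: fs.file-max is the system-wide ceiling. The nginx workers still need
# their own descriptor limit raised, via worker_rlimit_nofile or limits.conf.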

2. NGINX Configuration: The Mistakes Everyone Makes

Stop using the default nginx.conf. It is not your friend. The most critical setting for an API gateway is upstream keepalive. By default, NGINX talks HTTP/1.0 to your backend services and closes the connection after every request. This forces a new TCP handshake for every single API call.

If your backend is a microservice architecture, that is a handshake's worth of latency added on every hop. In a chain of 3 services, every request pays that penalty three times over, for absolutely no reason.

The Correct Upstream Configuration

upstream backend_api {
    server 10.0.0.5:8080;
    # Cache up to 64 idle keepalive connections per worker process
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;
        
        # REQUIRED for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        
        # Buffer tuning
        proxy_buffers 16 16k;
        proxy_buffer_size 32k;
    }
}
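
Once this is live, it is worth confirming that connections to the backend are actually being reused. A rough check, reusing the upstream address from the example above (a reasonably recent iproute2 is assumed for the -H flag):

# Count established connections from the gateway to the upstream.
# With keepalive working, this number should stay roughly flat under load
# instead of churning with every request.
watch -n1 'ss -Htn state established dst 10.0.0.5 | wc -l'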

Pro Tip: Monitor your context switches. If they skyrocket under load, your worker_processes value is probably too high. Set it to auto, or match the number of physical CPU cores exactly. In a virtualized environment this only works if the host actually maps your vCPUs to dedicated cores instead of overselling them.
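
For reference, the top-level worker settings that tip refers to look roughly like this; the numbers are illustrative and must stay within the kernel limits set earlier:

# Top-level context of nginx.conf
worker_processes auto;            # one worker per available core
worker_rlimit_nofile 1048576;     # per-worker descriptor cap; keep it under fs.file-max

events {
    worker_connections 65535;     # counts both client and upstream connections
    multi_accept on;              # drain the accept queue aggressively under load
}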

3. The "Noisy Neighbor" Problem & CPU Steal

This is where your choice of hosting provider becomes an architectural decision, not just a billing one. You can apply all the kernel hacks above, but if your VPS is sitting on a host oversold by 300%, you will suffer from CPU Steal Time.

Steal time occurs when your virtual machine is ready to execute a CPU instruction, but the hypervisor makes it wait because another customer is using the physical core. For an API Gateway, which requires instant processing of thousands of small packets, steal time is death. It manifests as "micro-stutters"—latency spikes that ruin the P99 metrics.
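
You can check whether your current VPS suffers from this before running any load test. With the standard procps and sysstat packages installed:

# The "st" column is the percentage of CPU time stolen by the hypervisor
vmstat 1 5

# Per-CPU view of the same thing: watch the %steal column
mpstat -P ALL 1 5

Anything consistently above a percent or two on a production gateway should make you nervous.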

I benchmarked a CoolVDS NVMe instance against a generic "cloud" VPS last week using wrk.

Metric            Generic Cloud VPS    CoolVDS (KVM)
Requests/sec      4,200                12,500
Latency (P99)     145ms                12ms
CPU Steal         8.5%                 0.0%
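
If you want to run a comparable test yourself, the tool is just wrk; the endpoint and parameters below are placeholders, not the exact ones from this benchmark:

# 8 threads, 512 concurrent connections, 60 seconds, latency distribution printed
wrk -t8 -c512 -d60s --latency https://api.example.com/health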

We use KVM (Kernel-based Virtual Machine) at CoolVDS to ensure strict isolation. When you buy 4 vCPUs here, you get the cycles you paid for. We don't play the "burstable" game with your production traffic.

4. Local Latency and Legal Reality

Latency is determined by physics. If your users are in Oslo, Bergen, or Trondheim, routing traffic through Frankfurt or London adds 20-30ms of round-trip time purely due to distance. Hosting in Norway, directly connected to NIX (Norwegian Internet Exchange), keeps that physical latency under 5ms.
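
Do not take my word for it; measure from where your users actually sit. The hostnames below are placeholders:

# Compare round-trip times from a client in Norway to candidate regions
ping -c 10 gateway-oslo.example.com
ping -c 10 gateway-frankfurt.example.com

# Or get a per-hop view of where the milliseconds go
mtr --report --report-cycles 10 gateway-oslo.example.com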

Furthermore, the Schrems II verdict has made data transfers to US-controlled providers legally risky. By hosting on Norwegian soil with a Norwegian provider, you simplify your GDPR compliance posture immediately. Your data stays here. The Datatilsynet (Data Protection Authority) is watching, and so should you.

5. Monitoring the Gateway

You cannot tune what you cannot measure. In 2020, if you aren't using a TSDB (Time Series Database) like Prometheus, you are flying blind. Here is a quick snippet to expose NGINX metrics for scraping:

location /stub_status {
    stub_status;
    allow 127.0.0.1;
    deny all;
}

Combine this with the nginx-prometheus-exporter. Watch the "Active Connections" graph. If it plateaus while traffic increases, you have hit a bottleneck in your worker connections or file descriptors.
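
Wiring that into Prometheus is a one-liner with the official exporter. The flag below matches the nginxinc/nginx-prometheus-exporter releases current as of this writing (check the docs for your version), and the URL assumes the stub_status block above sits in a server listening on port 80:

# Scrape stub_status and expose Prometheus metrics on :9113/metrics
./nginx-prometheus-exporter -nginx.scrape-uri http://127.0.0.1/stub_status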

Final Thoughts

Performance tuning is a game of removing bottlenecks. First, you remove the software limits (NGINX config). Then, you remove the OS limits (Kernel tuning). Finally, you hit the hardware limits.

At that point, you need hardware that respects your workload. Fast NVMe storage for logging I/O, unshared CPU cycles for packet processing, and a 10Gbps uplink to the local exchange. That is the baseline for a professional infrastructure.

Don't let high latency kill your conversion rates. Spin up a CoolVDS instance in Oslo today and test the difference raw KVM performance makes.