
Optimizing API Gateway Latency: Kernel Tuning and Nginx Strategies for High-Throughput Systems

Stop Letting Default Configs Throttle Your API Traffic

I recently audited the infrastructure of a fintech company based in Oslo. They were bleeding money, not because of code inefficiencies in their Go microservices, but because their API Gateway (running on a default Nginx install) was choking at 10k concurrent connections. They were hosted on a massive hyperscaler in Frankfurt, so every round trip, starting with the TCP handshake itself, carried an extra 25ms of pure geography.

Latency isn't just a metric; it's a ceiling on your business growth. In the Nordic market, where fiber penetration is among the highest in the world, users notice the difference between 50ms and 15ms. If you are serving Norwegian customers, your handshake needs to happen in Norway.

Here is how we fixed it. We moved the gateway to a KVM-based CoolVDS instance in Oslo and tuned the absolute hell out of the Linux kernel. This is your guide to doing the same.

1. The Foundation: Linux Kernel Tuning

Most Linux distributions ship with generic settings designed for desktop usage or light web serving. When your server acts as an API Gateway, it needs to handle thousands of ephemeral connections. If you don't tune the TCP stack, your kernel will drop packets while your CPU sits idle.
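
Before you change anything, confirm the kernel is actually dropping connections. If net-tools is installed, the protocol statistics will show queue overflows directly:

# Look for "SYNs to LISTEN sockets dropped" and "listen queue of a socket overflowed"
netstat -s | grep -iE "overflow|listen"

If those counters climb while you watch, the backlog settings below are not optional.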

Edit your /etc/sysctl.conf. These settings were validated on Linux Kernel 5.4 (standard for Ubuntu 20.04 LTS) and are safe for production environments handling high throughput.

# Increase system-wide file descriptors
fs.file-max = 2097152

# Widen the port range for outgoing connections
net.ipv4.ip_local_port_range = 1024 65535

# Allow reuse of sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1

# Increase the maximum number of connections in the backlog queue
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 8192

# Increase TCP buffer sizes for 10Gbps+ links (typical in Nordic datacenters)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

Apply these changes immediately:

sysctl -p

Pro Tip: Be careful with tcp_tw_recycle. It was removed in newer kernels and caused issues with NAT in older ones. Stick to tcp_tw_reuse. It is safer and provides the socket recycling you need for an API Gateway.
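
Once applied, spot-check that the new values are actually live; sysctl accepts multiple keys in one call:

# Verify the settings that matter most for an API Gateway
sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog net.ipv4.tcp_tw_reuse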

2. File Descriptors: The Silent Killer

Nginx is event-driven. Everything is a file. If you hit the open file limit, Nginx starts refusing connections and your gateway returns 502 Bad Gateway errors (with "Too many open files" in the error log), regardless of how powerful your backend is. I've seen this happen during Black Friday sales more times than I care to admit.

Check your current limits:

ulimit -n

If it says 1024, you are throttled. Update /etc/security/limits.conf to raise the ceiling for your web user (usually www-data or nginx):

root soft nofile 65535
root hard nofile 65535
* soft nofile 65535
* hard nofile 65535
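
One caveat: limits.conf is read by PAM at login, so on systemd-based distros (including Ubuntu 20.04) the nginx service can ignore it entirely. A systemd drop-in is one way to make the limit stick for the daemon itself; the sketch below assumes the stock nginx.service unit and the default pid file path:

# /etc/systemd/system/nginx.service.d/limits.conf
[Service]
LimitNOFILE=65535

systemctl daemon-reload
systemctl restart nginx

# Confirm the running master process picked it up
grep "Max open files" /proc/$(cat /run/nginx.pid)/limits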

3. Nginx Configuration for Raw Speed

Whether you are using raw Nginx, Kong, or OpenResty, the underlying engine is the same (Kong is built on OpenResty, which is built on Nginx). You need to make sure the worker processes match your CPU count and that you aren't wasting cycles on repeated SSL handshakes.

Here is a snippet from a production nginx.conf optimized for a CoolVDS NVMe instance with 4 vCPUs:

worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 16384;
    use epoll;
    multi_accept on;
}

http {
    # IO Optimization
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    
    # Keepalive to reduce CPU overhead on TCP handshakes
    keepalive_timeout 65;
    keepalive_requests 100000;

    # Buffer Optimization (Crucial for JSON payloads)
    client_body_buffer_size 128k;
    client_max_body_size 10m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 4k;
    output_buffers 1 32k;
    postpone_output 1460;

    # SSL Optimization (TLS 1.3 is mandatory in 2021)
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers on;
    ssl_session_cache shared:SSL:50m;
    ssl_session_timeout 1d;
    ssl_session_tickets off;
}
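
The snippet above covers the client side of the connection. On an API Gateway, the upstream leg matters just as much: without connection reuse, Nginx opens a fresh TCP connection to your backend for every proxied request. Here is a minimal sketch of upstream keepalive, assuming a hypothetical backend on 127.0.0.1:8080 and a placeholder hostname:

upstream api_backend {
    server 127.0.0.1:8080;   # hypothetical backend address
    keepalive 64;            # idle upstream connections cached per worker
}

server {
    listen 443 ssl http2;
    server_name api.example.com;       # placeholder
    # ssl_certificate / ssl_certificate_key omitted for brevity

    location / {
        proxy_pass http://api_backend;
        proxy_http_version 1.1;          # upstream keepalive requires HTTP/1.1
        proxy_set_header Connection "";  # clear the header so the socket stays open
    }
}

The Connection header reset is the part people forget; without it, Nginx closes the upstream socket after every request and you pay the handshake tax twice.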

Why Hosting Location Matters (Schrems II & Latency)

Technical tuning only goes so far. Physics dictates the rest. If your customers are in Oslo or Bergen, and your gateway is in Amsterdam, you are adding roughly 18-25ms of latency. That sounds negligible until you realize your API requires 4 round-trips to render a frontend component.

Furthermore, following the Schrems II ruling last year, relying on US-owned hyperscalers for handling European user data has become a legal minefield. The Norwegian Data Protection Authority (Datatilsynet) is scrutinizing data transfers more than ever.

Hosting locally isn't just about speed; it's about compliance and data sovereignty. We built CoolVDS infrastructure in Oslo specifically to address this. We peer directly at NIX (Norwegian Internet Exchange), meaning requests from Telenor or Telia fiber hit your server in single-digit milliseconds.

4. The Hardware Factor: NVMe vs. SATA

API Gateways log heavily. Access logs, error logs, and audit trails. On a high-traffic node, writing these logs to a standard SATA SSD can block I/O, causing the CPU to wait (iowait). This manifests as random latency spikes that are impossible to debug via application code.
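
Regardless of the disk underneath, you can also cut the number of write syscalls by buffering access logs. Nginx supports this natively; the sizes below are a reasonable starting point, not gospel (the directive goes in your http or server block):

# Batch log writes: flush when the 64k buffer fills or every 5 seconds
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;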

Storage Type  | Read Speed (Approx) | IOPS (Random 4k) | Suitability
Standard HDD  | 150 MB/s            | ~100             | Backups only
SATA SSD      | 550 MB/s            | ~80,000          | Web Serving
CoolVDS NVMe  | 3,500 MB/s          | ~500,000+        | High-Load API

We mandate NVMe for our high-performance tiers because disk I/O should never be the bottleneck. When you combine epoll with non-blocking I/O and NVMe hardware, you essentially remove the storage layer from the latency equation.
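
If you suspect storage is still in the picture, watch iowait and per-device utilization under load. iostat from the sysstat package makes the spikes obvious:

# Extended device stats every second, five samples, idle devices hidden
iostat -xz 1 5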

Final Thoughts

You can throw money at a Kubernetes cluster in a generic cloud and hope autoscaling saves you, or you can understand the stack. By tuning your kernel parameters and Nginx buffers, you can often serve 5x the traffic on the same hardware.

But software tuning cannot fix bad routing. If you are serious about the Nordic market, stop routing your traffic through Central Europe.

Need to test your latency? Deploy a CoolVDS instance in Oslo. It takes 55 seconds to spin up, and you'll see the ping difference immediately.
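
A quick way to quantify the difference is curl's built-in timing variables, pointed at any endpoint on your gateway (the /health path below is just a placeholder):

curl -o /dev/null -s -w "DNS: %{time_namelookup}s  TCP: %{time_connect}s  TLS: %{time_appconnect}s  TTFB: %{time_starttransfer}s\n" \
  https://api.example.com/health

Run it from a client on Norwegian fiber against both locations, and the TCP and TLS columns tell the whole story.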