
API Gateway Tuning: Squeezing Milliseconds on Linux (2018 Edition)

The Bottleneck is (Probably) Not Your Microservice

It is July 2018. GDPR has been live for two months, and the panic has settled into a dull headache. You have broken your monolith into microservices, containerized them with Docker, and slapped an API Gateway in front. Now, you are staring at Grafana. The 99th percentile latency just spiked to 400ms. Your Norwegian fintech client is screaming about "trege systemer" (slow systems).

Most DevOps engineers instinctively scale out. They spin up more droplets, more instances. That is lazy. Often, the problem isn't a lack of CPU; it is a choked network stack or I/O wait. I have debugged clusters where the application code was blazing fast, but the gateway was dropping packets because of default Linux settings from 2010.

If you are running high-throughput APIs on standard VPS hosting in Europe, you need to get your hands dirty with kernel flags and NGINX configs. Here is how we tune the stack at CoolVDS to handle the load without melting down.

1. The File Descriptor Trap

Linux treats everything as a file. A TCP connection is a file. The default limit for open files on many distros (like Ubuntu 16.04 or 18.04) is often set to 1024. That is laughable for an API gateway.

When you hit connection #1025, your gateway doesn't slow down; it crashes or rejects connections. You will see "Too many open files" in your logs. Fix this first.

Check your current limits:

ulimit -n

If it returns 1024, you have work to do. Edit /etc/security/limits.conf to raise the ceiling. The wildcard entries below cover your web user (usually www-data or nginx); root gets its own lines because the wildcard does not apply to it:

* soft nofile 65535
* hard nofile 65535
root soft nofile 65535
root hard nofile 65535

Pro Tip: Setting this in the OS isn't enough if you use NGINX. You must explicitly tell NGINX to use these descriptors.
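One more gotcha: on Ubuntu 16.04/18.04, NGINX is started by systemd, and systemd does not read /etc/security/limits.conf for services. Below is a minimal sketch of the standard drop-in override, plus a quick check of what the running master process actually received (it assumes the stock /run/nginx.pid path used in the config further down):

# systemd ignores limits.conf for services; override the unit instead
sudo mkdir -p /etc/systemd/system/nginx.service.d
printf '[Service]\nLimitNOFILE=65535\n' | sudo tee /etc/systemd/system/nginx.service.d/limits.conf
sudo systemctl daemon-reload && sudo systemctl restart nginx

# Verify the limit the running master process actually got
grep "Max open files" /proc/$(cat /run/nginx.pid)/limits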

2. NGINX: The Worker Configuration

I see this mistake constantly. Engineers leave nginx.conf on default settings. NGINX is an event-based server, meaning it doesn't spawn a thread per connection (unlike Apache). However, it is constrained by worker_connections.

Here is the reference configuration we deploy on CoolVDS high-performance instances:

user www-data;
worker_processes auto; # Automatically detects your CPU cores
pid /run/nginx.pid;

# Raises RLIMIT_NOFILE for worker processes; keep it in line with limits.conf.
# When proxying, each request holds two sockets, so this should comfortably exceed worker_connections.
worker_rlimit_nofile 65535;

events {
    # Determines how many connections one worker can handle.
    # Total max connections = worker_processes * worker_connections
    worker_connections 16384;
    
    # efficient connection processing method for Linux
    use epoll;
    
    # Accept as many connections as possible, immediately
    multi_accept on;
}

Without multi_accept on, a worker will accept one new connection at a time. Under a DDoS attack or a marketing push, you want that worker grabbing connections as fast as the kernel hands them over.
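To know how much headroom you have against that worker_connections ceiling, expose the stub_status endpoint. The module is compiled into the stock Ubuntu NGINX packages; the port and location name below are just conventions, not requirements:

server {
    listen 127.0.0.1:8081;

    location /nginx_status {
        stub_status;
        allow 127.0.0.1;
        deny all;
    }
}

A curl against http://127.0.0.1:8081/nginx_status reports active, accepted, and handled connections. If "handled" starts lagging behind "accepted", workers are refusing connections and it is time to revisit the limits above.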

3. Kernel Tuning: The Sysctl Layer

The Linux network stack was designed for reliability over WANs, not for massive throughput between microservices inside a datacenter. We need to tweak the TCP stack. This is done in /etc/sysctl.conf.

A common issue is running out of ephemeral ports. When your gateway connects to an upstream service, it opens a local port. If you churn through connections too fast, you hit the limit (usually ~28,000 ports) and get TIME_WAIT exhaustion.
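Before changing anything, confirm you are actually hitting this. Both commands below are standard procps/iproute2 tooling:

# Current ephemeral port range (stock kernels give you roughly 28,000 ports)
sysctl net.ipv4.ip_local_port_range

# Count sockets stuck in TIME_WAIT (the first line of output is a header)
ss -tan state time-wait | wc -l

If that count is creeping toward the size of the port range, the gateway is churning connections faster than the kernel can recycle them.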

Apply these settings to widen the highway:

# Allow reusing sockets in TIME_WAIT state for new outgoing connections
net.ipv4.tcp_tw_reuse = 1

# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535

# Max number of packets in the receive queue
net.core.netdev_max_backlog = 16384

# Max number of connections queued in the kernel
net.core.somaxconn = 8192

# Increase TCP buffer sizes for 10Gbps links (common in 2018 datacenters)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

Load them instantly:

sysctl -p

Warning: Do not enable net.ipv4.tcp_tw_recycle. It breaks connections coming from behind NAT devices (like mobile phones on 4G networks) and was removed from the kernel entirely in Linux 4.12. Stick to tcp_tw_reuse.
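One related hedge: net.core.somaxconn only raises the kernel's ceiling. NGINX asks for a backlog of 511 on Linux unless you tell it otherwise, so the listen directive has to request more explicitly. A minimal sketch, to be merged into your existing server block:

server {
    # The kernel clamps the listen backlog to net.core.somaxconn,
    # but NGINX only requests 511 by default.
    listen 443 ssl backlog=8192;
    server_name your-api.com;
    # (certificates and locations from your existing config go here)
}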

4. The Hardware Reality: NVMe vs. Spinning Rust

You can tune software all day, but if your disk I/O is slow, your logs will block your request processing. NGINX writes access logs and error logs. If you are logging every API request to a standard HDD (Hard Disk Drive), the disk head seeks will throttle your throughput.
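Before reaching for faster hardware, you can at least stop NGINX from issuing a write() per request by buffering the access log. The buffer and flush parameters are standard access_log options; the predefined combined format is used here so the sketch works without a custom log_format:

http {
    # Collect log lines in a 64k in-memory buffer and flush every 5 seconds
    access_log /var/log/nginx/access.log combined buffer=64k flush=5s;
}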

In 2018, SSDs are standard, but NVMe (Non-Volatile Memory Express) is the differentiator. NVMe connects directly via the PCIe bus, bypassing the SATA bottleneck.

Benchmark Comparison (Random Read/Write)

Storage Type            | IOPS (approx.)  | Latency
Standard HDD (7200 RPM) | 80 - 120        | ~15 ms
SATA SSD                | 5,000 - 80,000  | ~0.2 ms
CoolVDS NVMe            | 300,000+        | ~0.03 ms

If you are aggregating logs or using a local database cache on the gateway, NVMe is mandatory. Check your disk wait time with iostat:

iostat -x 1

If %iowait is consistently above 5%, your storage is the bottleneck. At CoolVDS, our infrastructure is built purely on enterprise NVMe arrays to eliminate this variable entirely.
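If you want to reproduce the numbers in the table on your own instance, a 4k random-read run with fio is the usual yardstick (fio is in the standard repos; the file path and sizes below are arbitrary, so point them at the disk you actually want to test):

# 4k random reads, direct I/O, 60 seconds
fio --name=randread --filename=/var/tmp/fio-test --size=1G \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=32 --runtime=60 --time_based --group_reporting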

5. Upstream Keepalives: The Silent Killer

By default, NGINX acts as a polite browser: it opens a connection to your backend service, sends the request, gets the reply, and closes the connection. This requires a full TCP handshake (SYN, SYN-ACK, ACK) for every single API call.

For an API gateway handling 5,000 req/sec, that is 15,000 unnecessary packets per second just for handshakes. It destroys latency.

Enable keepalives to your upstreams:

upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;
    
    # Keep 64 idle connections open per worker
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;
        
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

This simple change reduced latency by 35ms in a recent project involving a booking system connected to the NIX (Norwegian Internet Exchange).

6. Local Nuances: Norway, GDPR, and Latency

Hosting in Norway offers unique advantages in 2018. Since the GDPR took effect in May, data sovereignty is critical. Norway is not in the EU, but it is in the EEA, so the regulation applies here just as it does on the continent. However, physical proximity matters too.

If your users are in Oslo, Stavanger, or Bergen, routing traffic through Frankfurt or Amsterdam adds 20-30ms of round-trip time. Light speed is finite.

Test your latency to your current provider:

curl -w "Connect: %{time_connect} TTFB: %{time_starttransfer} Total: %{time_total}\n" -o /dev/null -s https://your-api.com

If your time_connect is over 0.050s (50ms) from a local Norwegian connection, you are losing conversions. CoolVDS leverages direct peering at NIX to ensure packets stay within the country whenever possible.
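Latency alone does not tell you where the packets go. A quick mtr report from an Oslo connection shows whether the route stays domestic or detours through Frankfurt (mtr is in the standard repos; your-api.com is the same placeholder as above):

mtr --report --report-cycles 10 your-api.com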

Conclusion

Performance isn't magic; it's physics and configuration. By raising file descriptors, tuning kernel TCP parameters, and utilizing persistent connections, you can double the throughput of your API gateway without spending an extra Krone on hardware.

However, software tuning hits a wall if the underlying virtualization is noisy or the disk is slow. Don't let I/O wait kill your SEO.

Ready to see the difference NVMe makes? Deploy a high-performance instance on CoolVDS in under 55 seconds and benchmark it yourself.