API Gateway Performance Tuning: Squeezing the Linux Kernel for Millisecond Latency (2020 Edition)

Stop Letting Default Configs Kill Your API Performance

It is October 2020. If you are still running your API Gateway on a standard installation of Nginx or HAProxy without touching `sysctl.conf`, you are essentially driving a Ferrari in first gear. I recently audited a payment processing API hosted in a generic cloud region. The developers were blaming their Python code for 500ms latency spikes. They were wrong. The Python application was fine; their gateway was choking on SSL handshakes and TCP backlog queues.

In the high-stakes environment of Nordic tech—where latency to Oslo is scrutinized and the recent Schrems II ruling (July 2020) has made data sovereignty a legal minefield—efficiency isn't just about speed. It's about survival. You cannot achieve 99th percentile consistency on over-sold hardware or unoptimized kernels.

The Hardware Reality: Why Virtualization Type Matters

Before we touch a single line of code, let's address the infrastructure. You can tune your software until you are blue in the face, but if your underlying host is stealing CPU cycles, it is futile.

Many "budget" VPS providers use container-based virtualization (like OpenVZ or LXC) where the kernel is shared. This prevents you from modifying deep TCP stack parameters. Furthermore, you suffer from "noisy neighbors"—other users on the physical host consuming I/O bandwidth.

Pro Tip: Always check your steal time. Run top and look at the st value. If it sits consistently above a few percent, your provider is overselling CPU. This is why we built CoolVDS strictly on KVM architecture: you get a dedicated kernel and reserved resources, so your tuning actually works.
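A quick way to check, assuming the standard procps and sysstat utilities are installed:

# Show the CPU breakdown twice; the "st" field in the second sample reflects current steal
top -bn2 | grep "Cpu(s)"

# Or watch the "st" column over five one-second intervals
vmstat 1 5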

Layer 1: The Linux Kernel Tuning

A default Linux distribution is tuned for a general-purpose desktop, not a high-throughput API gateway handling thousands of concurrent connections. We need to widen the TCP pipes.

Open your /etc/sysctl.conf. We are going to modify how the kernel handles connections.

1. Increasing the Backlog

When an API request hits your server, the connection sits in a queue waiting to be accepted. If that queue fills up, packets are dropped. Nginx's listen backlog defaults to 511 on Linux and is silently capped by the kernel's net.core.somaxconn, so the two have to be raised together.

# /etc/sysctl.conf

# Maximize the number of open file descriptors
fs.file-max = 2097152

# Increase the size of the receive queue
net.core.netdev_max_backlog = 16384

# Increase the maximum number of connections waiting for acceptance
net.core.somaxconn = 65535

# Increase the maximum number of remembered connection requests
net.ipv4.tcp_max_syn_backlog = 262144
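After applying the changes, it is worth confirming the values took effect and checking whether the kernel is still dropping connections; a rough check, assuming net-tools is installed:

# Confirm the new limits are active
sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog

# Non-zero counters here mean the accept queue is still overflowing
netstat -s | grep -iE "overflow|SYNs to LISTEN"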

2. Ephemeral Ports and Timeouts

High-traffic gateways run out of ephemeral ports quickly because closed connections linger in the TIME_WAIT state. We need to widen the port range and let the kernel reuse those sockets.

# Range of ports to use for outgoing connections
net.ipv4.ip_local_port_range = 1024 65535

# Allow new outgoing connections to reuse sockets in TIME_WAIT
net.ipv4.tcp_tw_reuse = 1

# Decrease the time a socket stays in FIN-WAIT-2
net.ipv4.tcp_fin_timeout = 15

Apply these changes with sysctl -p.
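To see whether TIME_WAIT is actually biting you, count sockets per state before and after the change; a minimal check with iproute2's ss:

# Count sockets currently parked in TIME-WAIT (subtract 1 for the header line)
ss -tan state time-wait | wc -l

# Or get a one-line summary of every TCP state
ss -s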

Layer 2: Nginx / OpenResty Configuration

Whether you are using raw Nginx or an API Gateway wrapper like Kong (which runs on OpenResty), the nginx.conf is your control center. The default worker_connections 1024; is a joke for production workloads.

Worker Processes and File Descriptors

First, ensure Nginx can actually use the file descriptors we enabled in the kernel.

worker_processes auto;

# This directive is crucial: it must be at least worker_connections
# (roughly double when proxying, since each request holds a client and an upstream socket)
worker_rlimit_nofile 65535;

events {
    # Use epoll for Linux kernels
    use epoll;
    
    # Allow a worker to accept all new connections at once
    multi_accept on;
    
    # Max concurrent connections per worker
    worker_connections 16384;
}
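Once Nginx is reloaded, verify the workers actually inherited the higher descriptor limit; a small sketch, assuming the default "nginx: worker" process titles:

# Each worker should report a "Max open files" value matching worker_rlimit_nofile
for pid in $(pgrep -f "nginx: worker"); do
    grep "Max open files" /proc/$pid/limits
done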

Upstream Keepalive

A common mistake is failing to keep connections open to your upstream service (the actual API backend). Without keepalive, Nginx opens a new TCP handshake for every single request it proxies. This adds significant latency, especially if your backend is on a different node.

http {
    upstream backend_api {
        server 10.0.0.5:8080;
        
        # Keep 64 idle connections alive to the upstream
        keepalive 64;
    }

    server {
        location /api/ {
            proxy_pass http://backend_api;
            
            # HTTP/1.1 with a cleared Connection header is required
            # for upstream keepalive to work
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }
}
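You can confirm keepalive is doing its job by watching the sockets Nginx holds toward the upstream. With the example backend above (10.0.0.5:8080), a small, stable set of ESTABLISHED connections is what you want; a churn of short-lived ports means keepalive is not engaging:

# List established connections to the upstream
ss -tan state established '( dst 10.0.0.5 and dport = :8080 )'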

Layer 3: TLS 1.3 and Crypto Acceleration

It is late 2020. TLS 1.2 is the bare minimum, but TLS 1.3 is the standard for performance. TLS 1.3 reduces the handshake from two round-trips to one (1-RTT). For mobile clients on spotty 4G networks in rural Norway, this reduction is noticeable.

Ensure you are running OpenSSL 1.1.1+ and configure your cipher suites correctly. Prioritize ECDHE (Elliptic Curve Diffie-Hellman).

Protocol    Handshake Latency    Security
TLS 1.2     ~2 round trips       Good (if configured right)
TLS 1.3     ~1 round trip        Excellent
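To verify what your gateway actually negotiates, a quick check with the openssl CLI (your-api.com stands in for your own hostname):

# TLS 1.3 requires OpenSSL 1.1.1 or newer
openssl version

# Force a TLS 1.3 handshake and print the negotiated protocol and cipher
openssl s_client -connect your-api.com:443 -tls1_3 </dev/null 2>/dev/null | grep -E "Protocol|Cipher"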

The Buffer Sizing Trade-off

SSL consumes memory. If you are logging extensively or buffering large request bodies, disk I/O becomes your bottleneck. This is where the hardware argument returns.

Spinning rust (HDD) cannot handle the random write patterns of high-volume API logging. On CoolVDS, NVMe storage is standard, and NVMe queues are designed for parallelism, matching modern multi-core CPUs.
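If you want to see what your log volume can actually sustain, a rough random-write test with fio makes the difference obvious (this assumes fio is installed and /var/log/nginx is where your access logs live; it only writes to its own test file):

# 4k random writes, roughly the pattern of high-volume access logging
fio --name=apilog-test --directory=/var/log/nginx --rw=randwrite --bs=4k \
    --size=256m --numjobs=4 --iodepth=32 --ioengine=libaio --direct=1 \
    --runtime=30 --time_based --group_reporting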

# Optimization for SSL Buffer
ssl_buffer_size 4k; # Lower buffer size for lower latency (default is 16k)

# OCSP Stapling - saves the client a round-trip to the CA's OCSP responder
ssl_stapling on;
ssl_stapling_verify on;
resolver 1.1.1.1 8.8.8.8 valid=300s;
resolver_timeout 5s;
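Stapling is easy to misconfigure silently, so check it from a client. Note that Nginx fetches the OCSP response lazily, so the very first handshake after a reload may not carry one yet:

# Look for "OCSP Response Status: successful" in the output
openssl s_client -connect your-api.com:443 -status </dev/null 2>/dev/null | grep -i "OCSP"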

Compliance & Geography: The Hidden Latency

Performance isn't just code; it's physics. Light travels at a finite speed. If your users are in Oslo and your API gateway is in Frankfurt, you are adding ~20-30ms of round-trip time purely due to distance.
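You can put a number on that distance from any client location with nothing more than curl's timing variables:

# Break a request into DNS, TCP connect, TLS, and time-to-first-byte
curl -o /dev/null -s \
     -w 'dns=%{time_namelookup}s tcp=%{time_connect}s tls=%{time_appconnect}s ttfb=%{time_starttransfer}s\n' \
     https://your-api.com/endpoint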

More importantly, with the Schrems II judgment invalidating the Privacy Shield, hosting personal data of Norwegian citizens on US-owned cloud infrastructure is legally risky. Hosting locally in Norway or strict EU jurisdictions reduces legal exposure and latency simultaneously.

Testing Your Tuning

Don't guess. Benchmark. Use wrk to load test your gateway locally.

# Install wrk (requires build tools)
git clone https://github.com/wg/wrk.git
cd wrk && make

# Run a test: 12 threads, 400 connections, for 30 seconds
./wrk -t12 -c400 -d30s https://your-api.com/endpoint

If you see timeouts, check dmesg on the server: a "possible SYN flooding on port" message means your backlog is still too small. If the CPU looks pegged but most of that time is spent in I/O wait, the disk, not the processor, is your bottleneck. The checks below cover both.
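The relevant commands, assuming dmesg access and the sysstat package for iostat:

# Kernel complaints about SYN backlog pressure
dmesg | grep -i "syn flooding"

# Separate real CPU saturation from I/O wait (the %iowait column)
iostat -x 1 5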

Final Thoughts

Optimizing an API gateway is an exercise in removing bottlenecks one by one until the only limit is the speed of light. You start with the kernel, move to the Nginx config, and optimize the encryption.

But remember: software cannot fix bad hardware. If you require consistent I/O performance for logging and dedicated CPU cycles for SSL termination, shared hosting will fail you. Deploy a test instance on CoolVDS today—our NVMe-backed KVM slices are built specifically for engineers who read man pages.