
Squeezing the Kernel: Advanced API Gateway Tuning for < 10ms Latency

Stop Letting Default Configs Throttle Your Traffic

If you measure your API response times in seconds, stop reading. You have bigger architectural problems. This guide is for the engineers fighting for milliseconds. The ones who know that a 200ms delay in a microservices chain compounds into a user-facing disaster. I recently audited a payment gateway setup for a client in Oslo. Their application code was well-optimized Go, but their throughput was abysmal. The culprit? A stock NGINX configuration sitting on a bloated "cloud" instance with noisy neighbors.

Latency isn't just a metric; it's a ceiling on your revenue. In the Nordic market, where mobile penetration is near-total, users expect instant interactions. Here is how we rip out the bottlenecks, starting from the OS kernel up to the application layer.

1. The Hardware Lie: Why "vCPU" is meaningless

Most VPS providers in Europe oversell their CPU cores. You might see "4 vCPUs" in your dashboard, but what you actually have is a sliver of time on a congested hypervisor. When your API Gateway (be it Kong, Tyk, or raw NGINX) tries to handle a burst of SSL handshakes, it has to wait for the physical CPU to pay attention. This is CPU Steal.

Run top. Look at the %st (steal) value. If it is consistently above 0.0, your provider is the bottleneck. No amount of code optimization will fix that.
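
A quick, scriptable way to check (mpstat needs the sysstat package; plain top works everywhere):

# One-shot top in batch mode; the "st" value at the end of the CPU line is steal
top -bn1 | grep "Cpu(s)"

# With sysstat installed, sample the %steal column over five one-second intervals
mpstat 1 5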

This is why we architect CoolVDS on KVM with strict resource isolation. We don't oversubscribe. When you hit an API endpoint hosted on our NVMe instances, the CPU cycles are yours. The difference in TLS termination speed alone is palpable.

2. Tuning the Linux Kernel for High Concurrency

By default, Linux is tuned for general desktop use, not for handling 50,000 concurrent connections. We need to open up the TCP stack: specifically, mitigate ephemeral port exhaustion and increase the backlog for incoming connections.

Edit your /etc/sysctl.conf. These settings were battle-tested on Ubuntu 18.04 LTS (kernel 4.15+).

# Allow reusing sockets in TIME_WAIT state for new outbound connections (e.g. to upstreams)
net.ipv4.tcp_tw_reuse = 1

# Increase the maximum number of open files (essential for high concurrency)
fs.file-max = 2097152

# Increase the read/write buffer sizes for TCP
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Increase the backlog of SYN requests (essential for preventing dropped connections during bursts)
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

Apply these with sysctl -p. Without this, your API Gateway will drop packets silently during traffic spikes, leading to those mysterious "timeout" errors your frontend team keeps complaining about.
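
To address the ephemeral port exhaustion mentioned above, it is also worth widening the local port range. The values below are a reasonable starting point, not a universal recommendation, and the second command simply confirms the settings took effect:

# Widen the range of ports available for connections to upstreams
net.ipv4.ip_local_port_range = 1024 65535

# Verify after running sysctl -p
sysctl net.core.somaxconn net.ipv4.ip_local_port_range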

3. NGINX: Beyond the Defaults

Whether you use bare metal NGINX or an abstraction like Kong, the underlying engine needs tuning. The standard nginx.conf is too conservative.

Worker Processes & Connections

Auto-detection usually works, but explicit control is better for dedicated gateways. We also want to bind workers to cores to prevent context switching (see the affinity snippet after the block below).

worker_processes auto;
worker_rlimit_nofile 100000;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}
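
To actually pin workers to cores as mentioned above, NGINX 1.9.10+ can derive the CPU affinity mask automatically. A minimal sketch:

# Bind each worker process to its own core to reduce context switching
worker_processes auto;
worker_cpu_affinity auto;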

Keepalive Connections

Opening a new TCP connection for every request is expensive. Opening a new SSL/TLS session is even worse. You must enforce keepalives to the upstream services.

upstream backend_api {
    server 10.0.0.5:8080;
    # Keep 64 idle connections open to the backend
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
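
If your NGINX build is 1.15.3 or newer, the upstream keepalive pool can be tuned further. The values below are illustrative, not prescriptive:

upstream backend_api {
    server 10.0.0.5:8080;
    keepalive 64;
    # Recycle idle upstream connections after 60s or 1000 requests
    keepalive_timeout 60s;
    keepalive_requests 1000;
}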

4. The I/O Bottleneck: Access Logging

Here is a harsh truth: writing logs to disk is one of the most expensive operations your gateway performs. On a standard HDD (spinning rust) VPS, synchronous logging can block your worker processes.

We saw a client's API latency drop from 150ms to 40ms simply by introducing buffering to their access logs. Even better, use NVMe storage (standard on CoolVDS) so disk writes don't become a queue.

# Buffer logs in memory before writing to disk
access_log /var/log/nginx/access.log main buffer=32k flush=1m;
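
You can go further and skip logging for pure noise such as load-balancer health checks. A sketch using a map in the http context (the /healthz path is a placeholder for whatever your probes actually hit):

# Mark health-check requests so they are never written to the log
map $request_uri $loggable {
    /healthz  0;
    default   1;
}

access_log /var/log/nginx/access.log main buffer=32k flush=1m if=$loggable;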

5. SSL/TLS: The Encryption Tax

In 2020, there is no excuse for not using TLS 1.3. It reduces the handshake overhead significantly (1-RTT vs 2-RTT). However, generating session keys requires entropy and CPU.

If you are serving traffic to Norway and Northern Europe, ensure your certificates are using ECDSA (Elliptic Curve) rather than RSA. ECDSA keys are smaller and faster to compute, meaning your gateway can handle more handshakes per second per watt of power.

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers EECDH+AESGCM:EDH+AESGCM;
ssl_ecdh_curve secp384r1;
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;
ssl_session_tickets off;
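
For the ECDSA recommendation above, a rough sketch of generating an EC key and confirming that TLS 1.3 is actually negotiated (api.example.com is a placeholder; your CA workflow may differ):

# Generate a P-256 private key and a CSR to submit to your CA
openssl ecparam -genkey -name prime256v1 -out api.example.com.key
openssl req -new -key api.example.com.key -out api.example.com.csr -subj "/CN=api.example.com"

# After deploying the certificate, confirm TLS 1.3 is negotiated (needs OpenSSL 1.1.1+)
openssl s_client -connect api.example.com:443 -tls1_3 < /dev/null | grep Protocol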

Geography Matters: The NIX Advantage

Physics is stubborn. The speed of light is finite. If your users are in Oslo and your servers are in Frankfurt, you are adding 15-20ms of round-trip latency (RTT) before your server even processes the first packet. Hosting your API Gateway in Norway, close to the NIX (Norwegian Internet Exchange), keeps that physical latency under 2ms.
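
A quick way to see what geography is costing you (api.example.com is a placeholder for your gateway's hostname):

# Average round-trip time from your location to the gateway
ping -c 20 api.example.com

# mtr shows where along the path the latency accumulates
mtr --report --report-cycles 20 api.example.com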

Furthermore, Datatilsynet (The Norwegian Data Protection Authority) is becoming increasingly strict regarding GDPR. Keeping your data—and your encryption keys—on Norwegian soil is the safest play for compliance in 2020.

Summary

Optimization Layer | Action | Impact
Hardware | Move to NVMe + KVM | Eliminates I/O wait & CPU steal
Kernel | Increase somaxconn & file-max | Prevents dropped packets under load
Application | Enable Keepalives & TLS 1.3 | Reduces handshake overhead by 50%

Performance isn't an accident. It's engineering. You can apply all these configs today, but if your underlying infrastructure is fighting you, you will lose.

Don't let slow I/O kill your SEO or your user experience. Deploy a test instance on CoolVDS today and benchmark the difference yourself.