Stop Blaming the Code: Tuning API Gateways for Sub-Millisecond Latency
If I had a krone for every time a developer told me their API was slow because of "database locking" when the real culprit was a choked TCP stack at the gateway, I could buy a house in Aker Brygge. In the Nordic hosting market, where latency to the NIX (Norwegian Internet Exchange) in Oslo is measured in single-digit milliseconds, a poorly tuned API Gateway is a crime against efficiency.
We are seeing a massive shift in 2020. Monoliths are breaking down into microservices, and suddenly, you have Kong, Nginx, or HAProxy sitting in front of fifty internal services. The traffic volume hasn't just increased; the connection count has exploded. If you are running default configurations on a standard VPS, you are already hitting a wall.
Here is the brutal truth: You cannot tune a network stack on a container-based VPS (like OpenVZ or LXC). You share the kernel with neighbors. To implement the fixes below, you need a KVM-based architecture, like CoolVDS, where you have full authority over sysctl.conf.
1. The "Keepalive" Trap in Nginx
Most API gateways today are built on Nginx (this includes Kong). Out of the box, Nginx is conservative. It assumes you are serving static assets, not proxying thousands of JSON requests to upstream application servers.
The most common mistake? Failing to enable keepalive connections to the upstream. Without this, Nginx opens a new TCP handshake for every single request to your backend service. That is expensive. It adds latency and burns CPU cycles on TLS handshakes if you are encrypting internal traffic.
Here is how you fix it in your nginx.conf context:
```nginx
http {
    upstream backend_api {
        server 10.0.0.5:8080;
        server 10.0.0.6:8080;

        # The magic number: keeps 64 idle connections open per worker process
        keepalive 64;
    }

    server {
        location /api/ {
            proxy_pass http://backend_api;

            # Essential for keepalive to work
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }
}
```

By setting proxy_set_header Connection "";, you clear the "Connection: close" header that Nginx sends upstream by default. This single change can drop internal latency from 15 ms to 2 ms in high-load environments.
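If you are on a recent Nginx (1.15.3 or later), the upstream block also accepts keepalive_requests and keepalive_timeout, which control how long idle upstream connections live and how many requests each one serves before being recycled. A sketch, with the same example upstream as above:

```nginx
upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;

    keepalive 64;

    # Available in the upstream context since Nginx 1.15.3:
    keepalive_requests 1000;   # recycle a connection after 1000 requests
    keepalive_timeout 60s;     # close idle upstream connections after 60s
}
```

The defaults (100 requests, 60 seconds) are fine for most workloads; raising keepalive_requests mainly helps when a single client hammers the same upstream at high RPS.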
2. Kernel Tuning: Crushing the TCP Bottlenecks
If you are pushing 10,000 requests per second (RPS), the Linux kernel defaults will choke. You will run out of ephemeral ports, or you will hit file descriptor limits. This is where battle-hardened kernel tuning comes in.
On a CoolVDS KVM instance running Ubuntu 20.04 or CentOS 8, you have direct access to modify kernel parameters. Do not try this on shared hosting; it won't work.
Pro Tip: Always back up your current sysctl settings before applying changes. One wrong move can drop all network traffic.
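A minimal backup sketch (the paths are just examples, adjust to taste):

```shell
# Snapshot the kernel state before touching anything.
backup_dir=$(mktemp -d)

# Copy the config file you are about to edit (may not exist on every distro).
cp /etc/sysctl.conf "$backup_dir/sysctl.conf.bak" 2>/dev/null || true

# Dump every currently running kernel parameter for reference.
sysctl -a 2>/dev/null > "$backup_dir/sysctl-running.bak" || true

echo "Backup written to $backup_dir"
```

If a change goes wrong, you can diff the dump against the live values to see exactly what moved.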
Edit /etc/sysctl.conf to optimize for high concurrency:
```
# Allow reuse of sockets in TIME_WAIT state for new outgoing connections
net.ipv4.tcp_tw_reuse = 1

# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535

# Max number of packets in the receive queue
net.core.netdev_max_backlog = 16384

# Max number of connections in the listen queue
net.core.somaxconn = 8192

# Disable slow start after idle (crucial for sporadic API traffic)
net.ipv4.tcp_slow_start_after_idle = 0

# Increase max open files
fs.file-max = 500000
```

Apply these with sysctl -p. The tcp_slow_start_after_idle = 0 setting is particularly important for APIs. By default, Linux shrinks a connection's congestion window back down after the connection sits idle. For bursty API traffic, you want that window wide open instantly.
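After running sysctl -p, it is worth confirming the kernel actually accepted the values. A quick sketch that reads /proc directly, no extra tools required:

```shell
# The files under /proc/sys mirror the sysctl keys, with dots as slashes.
cat /proc/sys/net/core/somaxconn
cat /proc/sys/net/ipv4/tcp_slow_start_after_idle
```

If a value did not stick, check for a typo in /etc/sysctl.conf or a conflicting drop-in under /etc/sysctl.d/.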
3. The NVMe Factor: Logging is the Silent Killer
API Gateways are chatty. They log access requests, errors, and often audit trails for GDPR compliance. If you are writing 5,000 lines of logs per second to a standard SATA SSD (or worse, a spinning HDD), your I/O wait (iowait) will skyrocket. When the disk blocks, Nginx blocks.
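Before throwing hardware at the problem, let Nginx batch its log writes. The access_log directive accepts buffer and flush parameters, so the worker issues one large write instead of one syscall per request. A sketch (the log path and format are examples):

```nginx
# Buffer up to 64 KB of access-log lines and flush at least every 5 seconds,
# turning thousands of tiny writes into a handful of large ones.
access_log /var/log/nginx/api_access.log combined buffer=64k flush=5s;

# For noisy, low-value endpoints (health checks), skip logging entirely:
# access_log off;
```

Buffering will not save a saturated disk, but it dramatically reduces write amplification on a healthy one.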
We benchmarked this extensively. A standard SSD array often caps out at 500-600 MB/s read/write with significantly higher latency under random write loads (like logs). NVMe drives, which bypass the SATA controller and sit directly on the PCIe bus, can handle queue depths that obliterate standard SSDs.
| Metric | SATA SSD VPS | CoolVDS NVMe |
|---|---|---|
| Random Write IOPS | ~5,000 | ~250,000+ |
| Disk Latency | 1-2 ms | 0.05 ms |
| Effect on Nginx | Blocking under heavy logging | Zero perceptible delay |
If your logs are slowing down your production traffic, you don't need better code. You need faster I/O.
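One rough way to spot logging-induced iowait without installing sysstat: the sixth field of the aggregate "cpu" line in /proc/stat is cumulative iowait jiffies, so sampling it twice shows whether the number is climbing during traffic spikes. A sketch:

```shell
# Print the cumulative iowait counter; run it twice a few seconds apart
# and compare. A fast-growing delta means the disk is holding you back.
awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat
```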
4. Local Nuances: The Norwegian Advantage
Why does location matter? Physics. Light travels fast, but fiber optics are not straight lines. If your users are in Oslo, Bergen, or Trondheim, and your API gateway is hosted in Frankfurt or Amsterdam, you are adding a 20-30ms round-trip penalty before the request even hits your server.
Furthermore, with the Datatilsynet (Norwegian Data Protection Authority) keeping a close watch on GDPR compliance, keeping data on Norwegian soil is becoming less of a "nice to have" and more of a legal safeguard. CoolVDS infrastructure is located locally, ensuring that your latency to the NIX is negligible and your data sovereignty is clear.
5. Worker Rlimit Configuration
Finally, Nginx has its own limits separate from the OS. Even if you tuned fs.file-max in the kernel, Nginx won't use it unless told to.
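Before raising anything, it helps to see what is currently in effect. A quick check (Linux only):

```shell
# Kernel-wide ceiling on open file descriptors:
cat /proc/sys/fs/file-max

# Soft per-process limit inherited by anything launched from this shell:
ulimit -n
```

The per-process number is usually the one biting you: a default of 1024 open files evaporates fast when each proxied request holds two sockets.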
In the main context of nginx.conf, ensure you raise the worker file limits:
```nginx
worker_processes auto;

# Must be higher than worker_connections
worker_rlimit_nofile 65535;

events {
    # Efficient connection-processing method for Linux
    use epoll;

    # Max connections per worker
    worker_connections 16384;

    # Accept as many connections as possible per wake-up
    multi_accept on;
}
```

Summary
Performance isn't magic. It's the sum of a thousand small optimizations. But those optimizations require a foundation that allows them to happen. You cannot tune a kernel you don't own, and you cannot write logs faster than your disk allows.
For serious API hosting in 2020, the requirements are clear: KVM virtualization, NVMe storage, and a local footprint. Anything less is just asking for a timeout.
Ready to drop your latency? Deploy a high-performance NVMe KVM instance on CoolVDS today and get full root access to tune your stack properly.