API Gateway Performance Tuning: Breaking the 10ms Barrier

I still remember the silence of the logs during a Black Friday sale three years ago. The traffic graph didn't dip; it flatlined. Our backend services were idling at 2% CPU, yet the frontend was throwing 502s like confetti. The culprit wasn't code—it was the default Linux TCP stack choking on ephemeral ports.

Most developers treat the API Gateway (usually Nginx, Kong, or Traefik) as a black box. You apt-get install it, point it at an upstream, and hope for the best. That strategy works for your personal blog. It fails catastrophically when you hit 5,000 requests per second (RPS).

In the Nordic hosting market, where we pride ourselves on infrastructure stability, relying on defaults is professional negligence. Let's fix your gateway.

1. The Kernel is Your Bottleneck

Before touching the web server configuration, look at the OS. By default, Linux is tuned for a modest desktop experience, not for handling thousands of concurrent connections. The biggest enemy of high-performance API gateways is the TIME_WAIT state.

When your gateway closes a connection to an upstream service, the socket sits in TIME_WAIT for 60 seconds (by default) to ensure all packets are received. Under high load, you will run out of available local ports, effectively preventing new connections.
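
Before changing anything, it is worth seeing how close to the cliff you already are. A quick check, assuming a distro that ships the ss utility:

# Count sockets currently stuck in TIME_WAIT
ss -tan state time-wait | wc -l

# Show the ephemeral port range available for outgoing connections
cat /proc/sys/net/ipv4/ip_local_port_range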

Add these lines to your /etc/sysctl.conf to widen the pipe:

# Allow reuse of sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1

# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535

# Max number of packets that can be queued on interface input
net.core.netdev_max_backlog = 16384

# Max connection queue length
net.core.somaxconn = 8192

# Max number of file-handles that the Linux kernel will allocate
fs.file-max = 500000

Apply them with sysctl -p. These settings ensure that your CoolVDS instance doesn't artificially throttle traffic just because it's being polite to old TCP packets.
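
If you want to confirm the kernel is actually using the new values, query them back directly:

# Print the live values of the keys we just changed
sysctl net.ipv4.tcp_tw_reuse net.core.somaxconn net.ipv4.ip_local_port_range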

2. Nginx: The "Keepalive" Mistake

This is the single most common misconfiguration I see in audits. By default, Nginx uses HTTP/1.0 for connections to upstream servers. This means it opens a new connection, sends the request, receives the response, and closes the connection. For every single request.

The overhead of the TCP handshake and SSL negotiation (if you encrypt internal traffic) adds significant latency—often 20-50ms per request.
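
You can measure that overhead from the gateway host itself. A rough sketch using curl's timing variables (the upstream URL and the self-signed TLS flag are assumptions, adjust to your setup):

# Connect and TLS handshake cost for a single, non-reused connection
curl -sko /dev/null \
     -w "connect: %{time_connect}s  tls: %{time_appconnect}s  total: %{time_total}s\n" \
     https://10.0.0.5:8080/health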

You must force Nginx to keep connections open to the backend:

upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;
    
    # Keep 64 idle connections open to the upstream
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;
        
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

Pro Tip: The keepalive directive defines the number of idle keepalive connections to upstream servers, not the total number of connections. Set this number high enough to handle your average concurrency, but low enough not to starve the backend service threads. 64 or 128 is a good starting point for mid-sized deployments.
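
To verify that connections are actually being reused, watch the established sockets toward the upstream while you send traffic. A sketch assuming the upstream port used above:

# With keepalive working, this count stays roughly flat under load
# instead of churning through a new source port for every request
watch -n1 "ss -Htn state established '( dport = :8080 )' | wc -l"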

3. File Descriptors and Worker Limits

Nginx needs a file descriptor for every connection it handles. If you have 1,024 connections coming in and 1,024 going out to the backend, you are consuming 2,048 descriptors. The default limit is often 1,024.

You need to raise this at the Nginx process level. In your main nginx.conf:

worker_processes auto;

# This number should be > worker_connections * 2
worker_rlimit_nofile 65535;

events {
    # Determines how many clients will be served per worker
    worker_connections 16384;
    
    # Essential for Linux performance
    use epoll;
    multi_accept on;
}
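
After reloading, confirm the limit actually applies to the worker processes, not just the master:

# Show the open-file limit of every running Nginx worker
for pid in $(pgrep -f "nginx: worker"); do
    grep "Max open files" /proc/$pid/limits
done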

4. Buffering: Disk I/O is the Latency Killer

If the response from your backend is larger than the internal Nginx buffers, Nginx will write the partial response to a temporary file on the disk. Even with fast storage, disk I/O is orders of magnitude slower than RAM.

When hosting on CoolVDS, you get access to high-performance NVMe storage which mitigates this pain significantly compared to standard SATA SSDs found in budget VPS providers. However, avoiding the disk entirely is always superior.

Tune your buffer sizes to fit your average JSON payload:

http {
    # ... other settings ...
    
    # Size of the buffer used for reading the first part of the response
    proxy_buffer_size 16k;
    
    # Number and size of the buffers for a single connection
    proxy_buffers 8 16k;
    
    # Limit on buffers that may be busy sending data to the client
    # while the response is not yet fully read from the upstream
    proxy_busy_buffers_size 24k;
}
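
Nginx tells you when it falls back to disk. If your buffers are too small for real payloads, the error log fills with warnings such as "an upstream response is buffered to a temporary file". A quick check (the log path is an assumption):

# Count how often responses spilled to disk since the last log rotation
grep -c "buffered to a temporary file" /var/log/nginx/error.log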

5. The Infrastructure Factor: Why "Steal Time" Matters

You can apply every optimization listed above, but if your underlying host is oversold, your p99 latency will suffer. In a virtualized environment, "Steal Time" (st) is the percentage of time your virtual CPU is ready to run but has to wait because the hypervisor is busy serving other guests.

If you are running a high-frequency trading bot or a real-time bidding API, even 2% steal time is unacceptable. It causes micro-stutters that don't show up in averaged monitoring graphs but ruin the user experience.
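
You can watch steal time directly on the VM with standard tools (mpstat ships in the sysstat package):

# Per-CPU steal time (%steal column), sampled once per second
mpstat -P ALL 1

# Or the quick version: the last column (st) in vmstat
vmstat 1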

Feature             | Standard Budget VPS                          | CoolVDS Architecture
Virtualization      | Container-based (LXC/OpenVZ) - often noisy   | KVM - hardware isolation
Storage Path        | Networked storage (Ceph/SAN) - added latency | Local NVMe - direct PCIe access
Resource Guarantees | Shared/Burstable                             | Dedicated resources available

Local Context: The Nordic Edge

For those of us deploying in Norway, network topology plays a massive role. If your primary user base is in Oslo or Bergen, hosting your API gateway in Frankfurt adds an unavoidable 20-30ms round-trip time (RTT). Physics is stubborn.

By placing your CoolVDS instance locally, utilizing the Norwegian Internet Exchange (NIX), you drop that network latency to under 5ms. Combine that with the kernel tuning above, and your API feels instantaneous. Furthermore, keeping data within Norwegian borders simplifies GDPR compliance and adheres to Datatilsynet recommendations regarding data sovereignty.
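
The RTT claim is easy to verify from a client in Oslo. A sketch with a placeholder hostname, substitute your own gateway:

# Round-trip time over 20 probes
ping -c 20 api.example.no

# Trace the path hop by hop to see where the latency accumulates
mtr --report --report-cycles 20 api.example.no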

Final Verdict

Performance isn't about one magic switch. It's the sum of a tuned kernel, a properly configured application, and honest infrastructure. Don't let your code wait on a sluggish hypervisor.

Ready to test your tuned config? Spin up a CoolVDS instance with pure NVMe storage in Norway. You have 55 seconds to deploy.