Crushing API Latency: Gateway Tuning from the Oslo Trenches

If your API response time drifts consistently above 200ms, you aren't just annoying developers—you are bleeding revenue. I saw this happen just last week. A fintech client based here in Oslo was routing domestic payment traffic through a cloud provider in Frankfurt. The round-trip time (RTT) alone was eating 30ms, but the real killer was the jitter. One request took 45ms; the next took 400ms.

Why? Noisy neighbors and untuned gateways.

When you are building an API gateway—whether you are rolling raw Nginx, OpenResty, or Kong—the default configurations are designed for compatibility, not high-performance throughput. If you leave them stock, you are effectively driving a Ferrari in first gear. Below is the exact tuning playbook we used to drop their p99 latency to under 50ms, keeping traffic local and legally compliant with Datatilsynet requirements.

1. The OS Layer: Stop Choking on Connections

Before you even touch your gateway software, look at the Linux kernel. By default, most distributions ship conservative limits for file descriptors and backlog queues. For an API gateway handling thousands of concurrent connections, those defaults let the listen backlog overflow during traffic spikes, and clients start seeing "connection reset by peer" errors.

You need to open the floodgates. Edit /etc/sysctl.conf to raise the connection, port, and file-descriptor limits.

# /etc/sysctl.conf - Optimized for High Concurrency (Jan 2020 Standard)

# Increase system-wide file descriptors
fs.file-max = 2097152

# Increase the size of the receive queue
net.core.netdev_max_backlog = 65536

# Increase the maximum listen backlog (pending connections per socket)
net.core.somaxconn = 65536

# Increase available ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535

# Reuse sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1

# Keepalive settings to drop dead connections faster
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 6

After saving, run sysctl -p to apply. Do not blindly copy-paste; understand that tcp_tw_reuse is generally safe for outgoing connections (upstream to your microservices), but be cautious if you are behind a strict NAT.
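
To confirm the new values are actually live, query them back and keep an eye on the listen queue while you load test. A minimal sanity check, assuming the keys above are the ones you changed:

# Verify the applied values
sysctl net.core.somaxconn net.core.netdev_max_backlog fs.file-max

# Look for listen queue overflows / dropped SYNs during a load test
netstat -s | grep -i -E 'listen|overflow'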

Pro Tip: Check your `ulimit -n`. Even if the kernel allows millions of open files, your shell session or the nginx user might still be capped at 1024. Raise both the soft and hard nofile limits in /etc/security/limits.conf, as shown below.
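
A minimal sketch of the limits.conf entries (the 1048576 value simply mirrors the worker_rlimit_nofile used in the nginx config below; adjust to taste):

# /etc/security/limits.conf
*    soft    nofile    1048576
*    hard    nofile    1048576

# Verify from a fresh login shell
ulimit -n

Note that if nginx runs under systemd, the unit's LimitNOFILE directive takes precedence over limits.conf for that service.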

2. Nginx Configuration: The Engine Room

Most gateways today are Nginx under the hood. The single most common mistake I see is neglecting `worker_connections` and upstream keepalives. If Nginx has to perform a fresh TCP (and possibly TLS) handshake for every single request to your backend service, you burn CPU on handshakes and eventually exhaust ephemeral ports.

Here is a snippet from a production nginx.conf optimized for a gateway role:

worker_processes auto;
worker_rlimit_nofile 1048576;

events {
    use epoll;
    worker_connections 65536;
    multi_accept on;
}

http {
    # ... logs and mime types ...

    # OPTIMIZATION: Buffer Sizes
    client_body_buffer_size 128k;
    client_max_body_size 10m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 4k;
    output_buffers 1 32k;
    postpone_output 1460;

    # OPTIMIZATION: Keepalive to Upstream
    upstream backend_service {
        server 10.0.0.5:8080;
        keepalive 64; # Keep 64 idle connections open
    }

    server {
        listen 443 ssl http2;
        server_name api.example.no;

        # SSL optimizations (OpenSSL 1.1.1+)
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
        ssl_prefer_server_ciphers on;
        ssl_session_cache shared:SSL:10m;
        ssl_session_timeout 10m;

        location / {
            proxy_pass http://backend_service;
            proxy_http_version 1.1;
            proxy_set_header Connection ""; # Clear the 'close' header
            proxy_set_header Host $host;
        }
    }
}
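
Before reloading, validate the syntax; a config that fails to reload under live traffic is its own latency incident. Assuming nginx is managed by systemd:

# Test the configuration, then reload gracefully
nginx -t && systemctl reload nginx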

The proxy_http_version 1.1 and clearing the Connection header are critical. Without them, Nginx defaults to HTTP/1.0 for upstreams and closes the connection after every request, negating any performance gains.
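
To confirm that upstream connections are actually being reused, watch the established sockets toward the backend while you generate load; with keepalive working, the count stays small and stable instead of churning through new ports. A rough check against the 10.0.0.5:8080 upstream from the snippet above:

# Count established connections from the gateway to the upstream
ss -Htn state established dst 10.0.0.5:8080 | wc -l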

3. The Hardware Reality: Why "Cloud" Often Fails

You can have the most perfectly tuned kernel in Norway, but if your underlying hypervisor is stealing CPU cycles, it means nothing. In a shared hosting environment (oversold VPS), "Steal Time" occurs when the physical CPU is busy serving another tenant. For an API Gateway, this manifests as micro-stutters. Your logs show the request took 5ms, but the client experienced 200ms.
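
Steal time is visible directly from inside the guest. Watch the st column in vmstat or the %steal column in mpstat (part of the sysstat package); anything consistently above a couple of percent means another tenant is taking your cycles:

# Five one-second samples; check the 'st' / '%steal' columns
vmstat 1 5
mpstat -P ALL 1 5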

This is where the architecture of CoolVDS becomes the reference implementation for serious work. We rely on KVM (Kernel-based Virtual Machine) which provides strict hardware isolation.

Comparison: Shared Containers vs. CoolVDS KVM

Feature         | Standard Container VPS          | CoolVDS KVM Instance
CPU Scheduler   | Shared, prone to "Steal Time"   | Dedicated virtual cores
Disk I/O        | Often standard SSD or HDD       | NVMe storage (high IOPS)
Kernel Access   | Restricted (shared kernel)      | Full control (tune sysctl freely)

If you are running a database alongside your gateway (like PostgreSQL for Kong), disk latency is the next bottleneck. NVMe storage is not a luxury anymore; it is a requirement. Standard SSDs choke under the random write patterns of high-velocity logging. On CoolVDS, the NVMe interfaces talk directly to the PCIe bus, bypassing the legacy SATA bottlenecks.
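
If you want to put numbers on this, a short random-write test with fio (assuming the fio package is installed and roughly 1 GB of free space in the working directory) is a reasonable proxy for log-heavy gateway workloads:

# 4k random writes, direct I/O, 30 seconds
fio --name=randwrite --rw=randwrite --bs=4k --size=1G \
    --ioengine=libaio --direct=1 --runtime=30 --time_based \
    --numjobs=4 --group_reporting

Look at the clat (completion latency) percentiles in the output, not just the average IOPS.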

4. Local Nuances: The Norwegian Advantage

Data sovereignty is no longer just a legal checklist item; it is a technical advantage. By hosting on VPS Norway infrastructure, your packets hit the NIX (Norwegian Internet Exchange) immediately. If your users are in Oslo, Bergen, or Trondheim, routing traffic to a massive data center in Ireland or Germany adds physical distance and unnecessary hops.
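
You can measure exactly what geography is costing you. Compare a per-hop report from your current region against one from a Norwegian instance (api.example.no is the placeholder hostname from the config above; substitute your own endpoint):

# Per-hop latency and loss over 100 probes
mtr --report --report-cycles 100 api.example.no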

Furthermore, with GDPR and the recent focus on data export limits, keeping your API logs and user data physically within Norwegian borders simplifies your compliance stance with Datatilsynet. You do not need complex legal frameworks for data transfer if the data never leaves the country.

5. Monitoring the Win

After applying these changes, how do you verify? Don't just look at averages. Look at the outliers.

# Install simple load testing tool (hey or wrk)
# Using 'hey' for a quick check (Go-based tool available in 2020)

hey -n 10000 -c 100 https://api.your-coolvds-host.no/

You want to see a tight distribution. If your 99th percentile is close to your average, you have successfully eliminated the jitter. If you still see spikes, check dmesg for OOM kills and verify you are not hitting conntrack limits, as shown below.
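
A quick way to check both (the nf_conntrack sysctls only exist if the conntrack module is loaded):

# Look for OOM kills in the kernel log
dmesg -T | grep -i -E 'oom|killed process'

# Compare conntrack usage against its ceiling
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max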

Final Thoughts

Performance tuning is an iterative process. Start with the kernel, move to the application config, and never underestimate the impact of the hardware underneath. If you are tired of fighting for CPU cycles on oversold platforms, it is time to upgrade.

Don't let slow I/O kill your SEO or your user experience. Deploy a test instance on CoolVDS today and see the difference dedicated NVMe resources make to your API latency.