Zero-Latency or Bust: Advanced API Gateway Tuning for Nordic Microservices

If your API Gateway is adding more than 10 milliseconds of overhead to a request, you aren't managing traffic—you're choking it. In the era of microservices, where a single user action might trigger fan-out calls to five different backend services, gateway latency compounds mathematically. A "minor" delay at the edge becomes a catastrophic bottleneck for the user.

I've spent the last decade debugging high-traffic clusters, from e-commerce platforms bracing for Black Friday to fintech APIs handling real-time transactions. The pattern is always the same: developers write performant code, push it to production, and then watch it crawl because the gateway (be it NGINX, Kong, or HAProxy) is running on default settings on a starved VPS.

Furthermore, it is July 2021, and the legal landscape has shifted under our feet. Since the Schrems II ruling last year, relying on US-based hyperscalers for your edge routing is a compliance minefield. If you are handling Norwegian user data, terminating TLS in Frankfurt on a US-owned cloud is no longer just a technical decision; it's a liability.

This guide cuts through the vendor noise. We are going to tune the Linux kernel, optimize the NGINX worker model, and discuss why the underlying hardware architecture (specifically NVMe and KVM) dictates your ceiling.

The Kernel is the Limit: Sysctl Tuning

Most Linux distributions, including Ubuntu 20.04 LTS, ship with generic kernel settings designed for desktop usage or light web serving. They are not tuned for an API Gateway handling 10,000 concurrent connections. When you hit a traffic spike, the OS will start dropping packets long before your CPU maxes out.

The most common killers are ephemeral port exhaustion and an overflowing TCP backlog queue. Here is the sysctl.conf baseline I deploy on every gateway node.

Key Kernel Directives

# /etc/sysctl.conf

# Increase system-wide file descriptor limit
fs.file-max = 2097152

# Widen the local port range to allow more upstream connections
net.ipv4.ip_local_port_range = 1024 65535

# Enable reusing sockets in TIME_WAIT state for new connections
# Critical for high-throughput API gateways talking to backends
net.ipv4.tcp_tw_reuse = 1

# Increase the maximum number of connections in the backlog
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# Tweaking the read/write buffers for TCP (adjust based on RAM)
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864

# Protect against SYN flood attacks without killing legit traffic too early
net.ipv4.tcp_syncookies = 1

Apply these changes with sysctl -p. Without tcp_tw_reuse, outbound connections to your upstreams pile up in TIME_WAIT until the ephemeral port range is exhausted, and you start seeing the dreaded (99: Cannot assign requested address) errors in your NGINX error log.
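
Don't just trust the reload; spot-check that the new values are actually live and that the running NGINX master inherited a sensible file-descriptor limit. A quick sanity check (the second command assumes NGINX is already running):

# Spot-check the critical kernel values
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse net.ipv4.ip_local_port_range

# The per-process limit matters too: see what the running nginx master actually got
cat /proc/$(pgrep -o nginx)/limits | grep "open files"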

Nginx Configuration: Beyond the Basics

Whether you use raw NGINX or a derivative like Kong, the core logic remains the same. Out of the box, NGINX does not maintain keepalive connections to upstream servers. This forces the gateway to open a new TCP handshake (and often a TLS handshake) for every single request to your backend microservices. This is inefficient and slow.

The Upstream Keepalive Fix

You must configure the upstream block to keep connections open. This cuts latency and CPU usage, since workers no longer burn cycles on constant TCP and TLS handshakes.

upstream backend_service_1 {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;

    # Keep 64 idle connections open to the backend
    keepalive 64;
}

server {
    listen 443 ssl http2;
    server_name api.example.no;

    # Certificate paths are illustrative placeholders; point these at your own files
    ssl_certificate     /etc/nginx/ssl/api.example.no.crt;
    ssl_certificate_key /etc/nginx/ssl/api.example.no.key;

    # SSL optimizations for 2021 standards
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers on;

    location /v1/service/ {
        proxy_pass http://backend_service_1;
        
        # Required to enable keepalive
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        
        # Pass real IP to backend (crucial for audit logs)
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
Pro Tip: Monitor your open-file usage. Even with the kernel limits raised, NGINX enforces its own per-worker limit. Set worker_rlimit_nofile in the main context and keep it comfortably above worker_connections: as a proxy, each proxied connection consumes two descriptors (one to the client, one to the upstream), plus log and cache files.
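
For reference, here is a minimal sketch of how those directives sit together in nginx.conf. The numbers are illustrative starting points for a mid-sized KVM instance, not universal truths:

# /etc/nginx/nginx.conf (main context)
worker_processes auto;            # one worker per CPU core
worker_rlimit_nofile 65535;       # per-worker FD ceiling; keep it well above 2x worker_connections

events {
    worker_connections 16384;     # simultaneous connections handled per worker
    multi_accept on;              # drain the accept queue aggressively under load
}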

The Hardware Reality: Why IOPS Matter

You can tune software all day, but if your I/O Wait is high, your latency will spike. API Gateways are surprisingly write-heavy. They generate massive access logs, error logs, and often buffer request bodies to disk if they exceed memory limits.
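
You can also soften the write pressure in NGINX itself before you ever touch the hardware: buffer the access-log writes and keep typical request bodies in memory. A minimal sketch, with illustrative paths and sizes:

# Batch access-log writes instead of hitting the disk once per request
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;

# Keep typical API payloads in memory instead of spilling them to temp files
client_body_buffer_size 128k;
client_max_body_size 2m;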

In a standard spinning disk (HDD) or even a cheap SATA SSD environment, high concurrency leads to I/O blocking. The CPU sits idle, waiting for the disk to write a log line, while the user waits for a response.

This is where CoolVDS differs from the budget providers. We standardized on NVMe storage for all instances. NVMe interfaces directly with the PCIe bus, bypassing the SATA bottleneck. In our benchmarks, an NVMe-backed gateway can handle 4x the request volume of a SATA SSD equivalent before latency degrades.

Benchmarking Your Disk I/O

Don't take my word for it. Run fio on your current host. If you aren't seeing random write speeds above 200MB/s, your storage is the bottleneck.

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=1G --readwrite=randwrite

If you are running a database or a stateful gateway (like Kong with Postgres), this becomes doubly critical. Slow database writes will lock the gateway's worker threads.

The "War Story": The NIX Peering Incident

Last year, I audited a setup for a Norwegian media company. They were hosting their API Gateway in a "Cloud Region" in Amsterdam to save a few kroner. Their users were primarily in Oslo and Bergen.

Every evening at 20:00, latency spiked. The issue wasn't the server capacity; it was the public internet routing between Amsterdam and Oslo during peak streaming hours. The round-trip time (RTT) fluctuated wildly.

We migrated the workload to a CoolVDS instance located in Norway. Because we peer directly at NIX (Norwegian Internet Exchange), the physical distance dropped from 1,200km to practically zero. Network latency dropped from 35ms to 2ms. More importantly, the jitter disappeared. Stability is not just about code; it is about physics.
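
If you want to see that kind of instability for yourself, plain mtr from a machine near your users is enough; the hostname below is a placeholder:

# 100 probes in report mode; watch the Avg, Wrst and StDev columns for jitter
mtr --report --report-cycles 100 api.example.no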

Troubleshooting in Real-Time

When things go wrong, you need to know exactly what the TCP stack is doing. Forget netstat (it's slow on active servers); use ss.

To see a summary of socket statistics:

ss -s

If you see a high number in timewait, verify your sysctl settings. If you see many in syn-recv, you might be under a DDoS attack or simply overwhelmed by legitimate traffic.
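
A couple of ss filters worth keeping at hand for exactly those two cases (standard iproute2 syntax):

# Count sockets parked in TIME_WAIT (a rough proxy for ephemeral port pressure)
ss -tan state time-wait | wc -l

# List half-open connections still stuck in the three-way handshake
ss -tanp state syn-recv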

Conclusion

Performance tuning is an exercise in removing constraints. First, you remove the constraints of the Linux kernel. Next, you remove the inefficiencies of the NGINX configuration. Finally, you must remove the physical constraints of poor hardware and distant network geography.

In the regulatory climate of 2021, hosting locally in Norway is the only logical choice for compliance. Combining that with the raw power of NVMe and KVM virtualization gives you a platform that doesn't just work—it screams.

Don't let slow I/O or bad routing kill your project. Deploy a high-performance test instance on CoolVDS today and see what sub-millisecond local latency looks like.