Optimizing High-Throughput API Gateways: Kernel Tuning & NGINX Secrets

Latency is the Silent Killer of Microservices

Let’s be honest: moving from a monolith to microservices solves scalability issues but introduces a nightmare of network latency. You split your application into twelve different services, and suddenly a single user request generates twenty internal HTTP calls. If your API Gateway—the traffic cop managing this chaos—adds even 50ms of overhead, your user experience is dead on arrival. Especially here in Norway, where users on 4G networks expect instant responsiveness comparable to local desktop apps.

I’ve spent the last week debugging a high-load setup for a fintech client in Oslo. They were running a standard NGINX reverse proxy on a generic cloud instance. The symptoms? High CPU usage on ksoftirqd (soft interrupts), dropped packets, and 502 Bad Gateway errors whenever traffic spiked. The application code was fine. The bottleneck was the Linux kernel defaults and a sluggish virtualization layer.

This guide isn't about writing better code. It's about tuning the engine—Linux and NGINX—and why the underlying hardware (specifically KVM and NVMe) dictates your ceiling.

1. The OS Layer: Tuning the Linux Kernel

Most Linux distributions, including Ubuntu 16.04 LTS and CentOS 7, ship with conservative networking defaults intended for desktop use or light web serving, not for a high-concurrency API gateway pushing thousands of requests per second. If you don't tune sysctl.conf, you are running with the handbrake on.

The Backlog Problem

When a TCP connection comes in, it sits in a queue waiting to be accepted. If this queue fills up, Linux drops the packet. You’ll see this in netstat -s as "times the listen queue of a socket overflowed".
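
To confirm you are actually hitting this limit, run a quick check (assuming net-tools and iproute2 are available on the box):

# Overflow counters since boot; if these grow under load, the queue is too small
netstat -s | grep -i 'listen queue'

# On LISTEN sockets, Recv-Q shows the current accept queue and Send-Q its limit
ss -lnt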

Open /etc/sysctl.conf and add the following. This lets the kernel queue a massive influx of new connections without silently dropping them. Note that somaxconn only raises the ceiling: NGINX must also request a larger queue via the backlog parameter on its listen directive, which defaults to 511 on Linux:

# Increase the maximum number of queued connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 5000

# Expand the port range to prevent port exhaustion
net.ipv4.ip_local_port_range = 1024 65535

# Enable TCP Fast Open (requires kernel 3.7+)
net.ipv4.tcp_fastopen = 3

Pro Tip: TCP Fast Open lets the client send data in the SYN packet, cutting one full round-trip time (RTT) off the handshake. The saving applies to repeat connections, since the client needs a cookie cached from a prior handshake. For users connecting from mobile networks in Northern Norway, that saved RTT is noticeable.
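
Two caveats worth checking: the sysctl alone does not make NGINX use TFO on its listening socket (the listen directive needs a fastopen=N parameter, available since NGINX 1.5.8), and the kernel counters will tell you whether it is actually firing. A quick sketch, assuming iproute2's nstat is installed:

# In nginx.conf, request TFO on the listening socket (queue of 256 pending TFO requests):
#   listen 80 fastopen=256;

# Kernel side: confirm the setting took, then watch the TFO counters move under traffic
sysctl net.ipv4.tcp_fastopen
nstat -az | grep TcpExtTCPFastOpen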

TIME_WAIT Reuse

API Gateways create high churn on sockets and file descriptors. A common failure mode is running out of local ports because thousands of connections sit stuck in the TIME_WAIT state. Many old guides tell you to enable tcp_tw_recycle; do not do this, as it breaks clients connecting from behind NAT (the kernel starts dropping packets whose TCP timestamps don't increase monotonically per source address). Use tcp_tw_reuse instead, which safely reuses TIME_WAIT sockets for outgoing connections, exactly the kind a gateway opens toward its upstreams:

net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15

Reload your settings with sysctl -p.
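
To gauge how big the TIME_WAIT pile really is, before and after the change, count the sockets directly:

# Sockets currently parked in TIME_WAIT (tail skips the ss header line)
ss -tan state time-wait | tail -n +2 | wc -l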

2. NGINX Configuration for API Routing

Whether you are using raw NGINX, OpenResty, or Kong (which is gaining serious traction this year), the underlying configuration principles remain the same. The default nginx.conf is not optimized for API traffic.

Worker Processes and File Descriptors

First, ensure NGINX can open enough files. A gateway proxying requests needs two file descriptors per connection (one to the client, one to the upstream service).

worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 16384;
    use epoll;
    multi_accept on;
}
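
It is worth verifying that the limit actually reached the workers, since worker_rlimit_nofile applies per worker process. A quick check against the running processes (assuming pgrep is available):

# Each worker should report 65535 for 'Max open files'
for pid in $(pgrep -f 'nginx: worker'); do
    grep 'Max open files' /proc/$pid/limits
done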

Upstream Keepalive

This is the most common mistake I see in 2016. By default, NGINX closes the connection to your upstream application (Node.js, Go, PHP-FPM) after every request. That forces a fresh TCP handshake, plus a TLS negotiation if you encrypt internal traffic, for every single API call. It kills performance.

You must configure the upstream block to keep connections open:

upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;

    # Keep 64 idle connections open to the backend
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;
        
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

Note: Without proxy_http_version 1.1 and the cleared Connection header, NGINX speaks HTTP/1.0 to the upstream and sends "Connection: close" on every request, rendering the keepalive directive useless.
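
A quick way to confirm keepalive is working is to watch connections from the gateway to an upstream; with reuse in place, the ESTABLISHED count should hold steady under load instead of churning through ephemeral ports. A sketch, assuming the upstreams listen on port 8080 as above:

# Established connections toward the backend pool; a stable count under load
# means connections are being reused rather than reopened per request
watch -n1 "ss -tn state established '( dport = :8080 )' | tail -n +2 | wc -l"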

3. The Hardware Reality: Why Virtualization Matters

You can tune your kernel until you are blue in the face, but if your hosting provider over-commits CPU or uses slow storage, your API Gateway will suffer from "steal time": the hypervisor forcing your VM to wait while it serves a noisy neighbor.

For an API Gateway, I/O Wait is the enemy. Logging requests, writing to cache, and SSL handshakes all depend on fast disk and CPU access.
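
Both problems are visible from inside the guest:

# Watch the 'st' (steal) and 'wa' (I/O wait) columns over five one-second samples;
# sustained steal above a few percent means a noisy neighbor is eating your CPU
vmstat 1 5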

The CoolVDS Difference

At CoolVDS, we don't play the "burst RAM" game. We use KVM (Kernel-based Virtual Machine) for strict isolation. Unlike the OpenVZ containers budget hosts often sell, where you share the host's kernel and frequently cannot change the sysctls above at all, KVM gives your instance its own kernel, so the tuning from section 1 actually applies.

Furthermore, we exclusively use enterprise NVMe storage. In 2016, many providers are still running on spinning rust or standard SATA SSDs. The difference in IOPS (Input/Output Operations Per Second) is staggering:

Storage Type    | Random Read IOPS | API Latency Impact
7.2k HDD        | ~100             | High (logging blocks the request)
SATA SSD        | ~5,000           | Medium
CoolVDS NVMe    | ~350,000+        | Near zero

When your API logs access data or buffers heavy payloads to disk, NVMe ensures the CPU isn't sitting idle waiting for the drive. This is critical for complying with strict SLAs.
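
Don't take any provider's IOPS figure on faith, ours included. A short fio run measures 4k random reads on whatever volume you point it at; the parameters here are illustrative, and assume fio with the libaio engine is installed:

# 30-second 4k random-read test against a 1 GB scratch file in the current directory
fio --name=randread --filename=testfile --rw=randread --bs=4k --size=1g \
    --ioengine=libaio --iodepth=32 --direct=1 --runtime=30 --time_based \
    --group_reporting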

4. Local Considerations: Data Sovereignty

With the recent invalidation of Safe Harbor and the introduction of the Privacy Shield framework this July, data location is a hot topic for CTOs across Europe. The Norwegian Data Protection Authority (Datatilsynet) is clear about the responsibilities of data controllers.

Hosting your API Gateway outside of the EEA (European Economic Area) introduces legal friction. By deploying on CoolVDS, your data resides physically in Oslo. Not only does this solve the legal headache, but it also physically reduces latency. If your users are in Norway, why route packets through Frankfurt?

Conclusion

Building a high-performance API Gateway requires a holistic view: aggressive Linux kernel tuning, precise NGINX configuration, and a hardware platform that respects your resource needs. Don't let default settings throttle your innovation.

Ready to see the difference low-latency infrastructure makes? Deploy a CoolVDS NVMe instance in 55 seconds and run ab -k -c 100 -n 10000 against your current provider. The numbers won't lie.