Stop Accepting Default Configs: A Guide to Sub-Millisecond API Routing
If your API gateway adds more than 15ms of overhead to a request, you are bleeding users. In the high-frequency trading floors of Oslo or the streaming backends of Stockholm, latency isn't just a metric; it is the product. I recently audited a Kubernetes ingress setup for a Norwegian fintech startup. They were baffled as to why their "scalable" architecture choked at 2,000 requests per second (RPS). The hardware was fine. The code was optimized Go. But the gateway was running default settings.
Default settings are designed for compatibility, not performance. They are safe, conservative, and essentially useless for high-load production environments. Today, we are going to strip down the Linux kernel and NGINX (the engine behind Kong, generic Ingress, and many others) to handle real traffic. We will also address the elephant in the room: why running this on shared, oversold cloud instances is a fool's errand.
1. The Foundation: Kernel Tuning (sysctl)
Before touching the application layer, we must look at the OS. Most Linux distributions, including Ubuntu 20.04 LTS, ship with conservative networking limits. When your API gateway gets hit with a burst of traffic, the kernel will drop packets long before NGINX runs out of CPU.
We need to modify /etc/sysctl.conf. Specifically, we need to widen the ephemeral port range and let sockets stuck in TIME_WAIT be reused sooner. In a post-Schrems II world where we are moving workloads back to local infrastructure to satisfy Datatilsynet, you likely have full root access to your VPS. Use it.
Critical TCP Adjustments
# /etc/sysctl.conf configuration for high-load API Gateways
# Increase the maximum number of open file descriptors
fs.file-max = 2097152
# Maximize the backlog of incoming connections.
# If this is too low, you'll see connection timeouts during spikes.
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
# Reuse sockets in TIME_WAIT state for new connections
# Essential for high throughput short-lived connections (REST APIs)
net.ipv4.tcp_tw_reuse = 1
# Increase local port range to avoid exhaustion
net.ipv4.ip_local_port_range = 1024 65535
# Protection against SYN flood attacks (basic DDoS protection)
net.ipv4.tcp_syncookies = 1

Apply these changes with sysctl -p. Without tcp_tw_reuse, your gateway will exhaust its ephemeral ports during heavy load testing because thousands of outbound connections to your upstreams sit idle in TIME_WAIT, blocking new traffic.
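To confirm the new values actually took effect, a quick sanity check is worth the ten seconds, assuming you edited /etc/sysctl.conf directly rather than a drop-in file under /etc/sysctl.d/:

# Reload and print every key that was applied
sudo sysctl -p
# Spot-check the settings that matter most under load
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse net.ipv4.ip_local_port_range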
2. NGINX: The Engine Room
Whether you use raw NGINX, OpenResty, or Kong, the underlying mechanics are identical. The biggest bottleneck I see in 2021 is the mismatch between `worker_connections` and file descriptors.
Here is a battle-tested snippet for your `nginx.conf`. This setup assumes you are terminating SSL (which requires significant CPU) and proxying to an upstream service.
worker_processes auto;
# Increase limit of open files per worker
worker_rlimit_nofile 65535;
events {
# Determines how many clients a single worker can handle
worker_connections 16384;
# Accept as many connections as possible, immediately
multi_accept on;
use epoll;
}
http {
# ... logging and mime types ...
# BUFFERING: Crucial for payload handling
client_body_buffer_size 128k;
client_max_body_size 10m;
# KEEPALIVE: Reduce the handshake overhead
# Don't close the connection after one request.
keepalive_timeout 65;
keepalive_requests 100000;
# SSL OPTIMIZATION (Modern 2021 Standards)
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;
ssl_session_tickets off;
# Modern Cipher Suites only
ssl_protocols TLSv1.2 TLSv1.3;
}

Pro Tip: If you are using NGINX as a load balancer, ensure your upstream block also utilizes keepalive connections to the backend application. Otherwise, NGINX opens a new socket to your backend for every single incoming request, doubling your TCP overhead.
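A minimal sketch of what that looks like; the upstream name api_backend and the backend address are placeholders for your own service:

upstream api_backend {
    server 10.0.0.10:8080;
    # Keep up to 64 idle connections open to the backend per worker
    keepalive 64;
}

server {
    # ... listen and ssl directives ...
    location / {
        proxy_pass http://api_backend;
        # HTTP/1.1 and an empty Connection header are required
        # for upstream keepalive to actually kick in
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}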
3. The Hardware Reality: NVMe and Noisy Neighbors
You can tune your kernel until it sings, but software cannot fix bad physics. In 2021, deploying a database-heavy API gateway on standard SATA SSDs is negligence. The IOPS ceiling is simply too low.
Furthermore, the "Steal Time" metric is the silent killer of API performance. On budget VPS providers, your CPU cycles are shared aggressively. If a neighbor decides to mine crypto or compile a massive Rust project, your API latency spikes, and that jitter is inconsistent and nearly impossible to debug from inside the guest.
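You can at least measure it. The steal column in vmstat (or %steal in mpstat, from the sysstat package) shows how often the hypervisor hands your CPU time to someone else:

# Watch the "st" column on the far right; sustained values above
# a few percent mean a noisy neighbor is eating your cycles
vmstat 1 5
# Per-core view of the same thing (%steal), if sysstat is installed
mpstat -P ALL 1 5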
This is where CoolVDS differentiates itself in the crowded Nordic market. We don't just offer "virtual servers"; we offer isolation. By utilizing KVM (Kernel-based Virtual Machine) rather than container-based virtualization (like OpenVZ), we ensure your RAM and CPU are yours. Combined with local NVMe storage, the I/O wait times are practically non-existent.
Comparison: Standard Cloud vs. CoolVDS High-Perf
| Feature | Standard Budget VPS | CoolVDS Performance Tier |
|---|---|---|
| Storage | SATA SSD (Shared) | NVMe (Direct PCI-e) |
| Virtualization | Shared Kernel (Container) | KVM (Hardware Isolation) |
| Latency (Oslo) | ~15-30ms | < 5ms |
| Noisy Neighbor Risk | High | Minimal |
4. Geographic Latency and Compliance
Since the Schrems II ruling in July 2020, relying on US-owned hyper-scalers has become a legal minefield for Norwegian companies processing personal data. Latency is physical, but data sovereignty is legal. Hosting on VPS Norway infrastructure solves both.
Data packets traveling from Oslo to a data center in Frankfurt and back take time. Physics dictates roughly 20-30ms round trip at best. By hosting locally on CoolVDS, you cut that network latency to under 5ms for local users. For an API Gateway making multiple internal calls, that difference compounds rapidly.
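You can verify this yourself with curl's timing variables; the endpoint below is a placeholder for your own health check:

# DNS lookup, TCP connect, TLS handshake and time-to-first-byte, in seconds
curl -o /dev/null -s -w "dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer}\n" https://your-api-endpoint.com/health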
5. Testing Your Setup
Don't take my word for it. After applying these configurations, run a load test using wrk or vegeta.
# Install wrk (available in standard repos in 2021)
sudo apt install wrk
# Run a test: 12 threads, 400 connections, for 30 seconds
wrk -t12 -c400 -d30s https://your-api-endpoint.com/health

If you see a significant "Socket errors" count, your file descriptors (ulimit -n) are still too low. If your latency distribution shows a long tail (99th percentile > 500ms), your CPU cycles are being stolen by the hypervisor or your disk I/O is saturated.
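Two quick checks help tell those failure modes apart; both assume NGINX is running under its default process name and that sysstat is installed for iostat:

# Confirm the file-descriptor limit the running NGINX master actually inherited
grep "open files" /proc/$(pgrep -o nginx)/limits
# Watch disk saturation during the test: %util pinned near 100
# means you are I/O bound, not CPU bound
iostat -x 1 5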
Performance tuning is an iterative process. It requires staring at graphs, tweaking a buffer size, and testing again. But it starts with the right foundation. Don't let slow I/O or a shared kernel kill your application's reputation. Deploy a CoolVDS NVMe instance today and give your configurations the hardware they deserve.