API Gateway Performance Tuning: Squeezing Microseconds for High-Throughput Systems

If you are running an API Gateway on default settings, you are effectively driving a Ferrari in first gear. In the Nordic hosting market, where latency to the Norwegian Internet Exchange (NIX) is measured in single-digit milliseconds, adding overhead at the application layer is a sin. I have seen too many engineering teams throw money at larger instances when the real bottleneck was a default sysctl.conf or poor Nginx worker allocation.

The goal isn't just "uptime." It is processing requests before the client even realizes the packet left their network. Whether you are running Kong, Tyk, or a raw OpenResty setup, the underlying principles of Linux I/O and CPU affinity remain the absolute truth. Here is how we tune for maximum throughput on CoolVDS infrastructure.

1. The Worker Process Fallacy & CPU Affinity

Most tutorials tell you to set worker_processes auto; and walk away. That is fine for a hobby blog. It is negligence for a high-load API gateway handling payment processing or real-time data ingestion.

When high throughput hits, context switching becomes your enemy. If the kernel keeps moving your Nginx workers between CPU cores, you lose Level 1/Level 2 CPU cache locality. On our high-frequency compute instances, we enforce strict CPU affinity to pin workers to specific cores.

Configuration Implementation

In your nginx.conf, explicit bitmasking is often more reliable than auto-detection in virtualized environments, where vCPU topology can be reported inconsistently depending on the hypervisor (though CoolVDS KVM guarantees core isolation).

worker_processes 4;

# One bitmask per worker: pin the four workers to cores 0-3
worker_cpu_affinity 0001 0010 0100 1000;

events {
    worker_connections 65535;
    use epoll;
    multi_accept on;
}
Pro Tip: Setting multi_accept on tells the worker to accept as many connections as possible after a new connection notification. However, keep an eye on latency. In some extremely bursty workloads, this can cause a worker to hog the CPU, delaying response times for existing connections. Benchmark it with wrk before deploying to production.
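
A quick load test makes that trade-off visible. A minimal sketch with wrk (the URL, thread count, and connection count are placeholders; match them to your instance size and a staging endpoint):

# 4 threads, 1000 open connections, 30 seconds, with latency percentiles
wrk -t4 -c1000 -d30s --latency https://staging-gw.example.com/api/health

Run it once with multi_accept on and once with it off, and compare the p99 latency rather than the average.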

2. Breaking the File Descriptor Limits

Even in late 2022, most Linux distributions still ship with conservative limits. An API gateway acts as a proxy: every incoming request opens a socket, and every upstream call opens another. Under load you will hit the default `1024` open files limit almost instantly.

This isn't just about Nginx; it is about the OS limits. We need to tune both fs.file-max (system-wide) and `nofile` (user-specific).

# Check current limit
ulimit -n

# Edit /etc/security/limits.conf
* soft nofile 200000
* hard nofile 200000
root soft nofile 200000
root hard nofile 200000
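
The limits.conf entries cover the per-user `nofile` ceiling. The system-wide fs.file-max mentioned above is set separately via sysctl; a minimal sketch (the value is an assumption, size it well above your expected socket count):

# /etc/sysctl.conf - system-wide cap on open file handles
fs.file-max = 2097152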

Services started by systemd (like Nginx) do not read limits.conf, so you must also override the unit definition:

# /etc/systemd/system/nginx.service.d/override.conf
[Service]
LimitNOFILE=200000
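
After adding the override, reload systemd and confirm the processes actually picked up the new limit (any Nginx PID will do, since workers inherit the master's limits):

sudo systemctl daemon-reload
sudo systemctl restart nginx

# "Max open files" should now report 200000
cat /proc/$(pidof -s nginx)/limits | grep "open files"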

3. Kernel TCP Stack Optimization

This is where the battle is won or lost. The default Linux TCP stack is tuned for general-purpose desktop usage, not for a gateway handling 50,000 concurrent connections. We need to modify how the kernel handles TCP states, specifically TIME_WAIT.

When your gateway closes a connection to an upstream service, that socket sits in TIME_WAIT for 60 seconds (default). If you churn through sockets fast enough, you will run out of ephemeral ports. This results in the dreaded `Cannot assign requested address` error.
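
You can check whether a busy gateway is already heading toward exhaustion with a quick count of sockets stuck in that state:

# Count sockets currently sitting in TIME_WAIT
ss -tan state time-wait | wc -l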

Apply these settings to /etc/sysctl.conf:

# Allow reusing sockets in TIME_WAIT for new outbound connections
net.ipv4.tcp_tw_reuse = 1

# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 10000 65000

# Max number of packets in the receive queue
net.core.netdev_max_backlog = 16384

# Max number of connections in the listen queue
net.core.somaxconn = 8192

# TCP Fast Open (TFO) to reduce handshake latency
net.ipv4.tcp_fastopen = 3

# BBR Congestion Control (Available since kernel 4.9)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

Note on BBR: We enable Google's BBR congestion control by default on our standard images because it significantly improves throughput over high-latency links, which is crucial if your users are connecting from outside Northern Europe.
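
Reload the settings and confirm BBR is actually active; the tcp_bbr module must be available on your kernel, so it is worth verifying rather than assuming:

sudo sysctl -p /etc/sysctl.conf

# "bbr" should appear in the available list and as the active algorithm
sysctl net.ipv4.tcp_available_congestion_control
sysctl net.ipv4.tcp_congestion_control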

4. Upstream Keepalives: The Silent Latency Killer

A specific scenario we debugged recently involved a client in Oslo hosting a microservices architecture. They had excellent ping times between servers (sub-1ms), but the API gateway was adding 30ms of overhead.

The culprit? They were not using HTTP keepalives to the upstream application servers. Nginx was performing a full TCP handshake + SSL handshake for every single request forwarded to the backend.

Here is the corrected configuration:

upstream backend_service {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;
    
    # Keep 64 idle connections open to the backend
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_service;
        
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
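
Once this is live, you should see a pool of idle established connections from the gateway to the backends instead of a constant churn of short-lived sockets. A quick check from the gateway host, assuming the backends from the example above listen on port 8080:

# Established upstream connections held open by the keepalive pool
ss -tn state established '( dport = :8080 )'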

With this change in place, the gateway overhead dropped from 30ms to roughly 2ms, and CPU usage on the CoolVDS instances fell by about 40% because the workers stopped burning cycles on per-request TCP and TLS handshakes.
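
If you want to measure this kind of overhead yourself, curl's timing variables are usually enough before reaching for a full tracing stack (the URL is a placeholder):

# connect / time-to-first-byte / total timings through the gateway
curl -o /dev/null -s -w 'connect=%{time_connect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
  https://gateway.example.com/api/health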

5. The Storage Factor: Why NVMe Matters for Gateways

People assume API gateways are purely CPU/RAM bound and forget about logging. High-volume access logs can saturate a SATA SSD controller, and the resulting I/O wait (iowait) stalls the very worker processes that should be serving requests.

At CoolVDS, we run exclusively on NVMe storage arrays. Even with NVMe, though, you should buffer your logs to avoid a disk write on every request.

# Buffer logs in memory (64k) and flush every 5 minutes
access_log /var/log/nginx/access.log combined buffer=64k flush=5m;
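
To verify that logging is not your bottleneck, watch iowait and device utilization while the gateway is under load (iostat ships with the sysstat package):

# Extended device stats refreshed every second - watch %iowait and %util
iostat -x 1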

Infrastructure Architecture Comparison

Choosing where to host your gateway is as important as the config. In Norway, data sovereignty (GDPR) is paramount, but so is raw performance.

| Feature | Public Cloud LB | Shared VPS (OpenVZ) | CoolVDS (KVM/NVMe) |
|---|---|---|---|
| Kernel Tuning Access | Restricted | Restricted (Shared Kernel) | Full Control |
| Neighbor Noise | High Variance | High Risk | Isolated Resources |
| Data Location | Often hidden (EU generalized) | Variable | Oslo, Norway |
| Cost per 10k RPS | $$$ (Bandwidth fees) | $ (But unstable) | $$ (Predictable) |

Conclusion

Optimizing an API gateway is about removing friction. You remove friction in the kernel by reusing sockets. You remove friction in Nginx by pinning workers. And you remove friction in the infrastructure by choosing a provider that doesn't steal your CPU cycles for another client.

With data transfer regulations tightening in the wake of Schrems II, hosting your gateway within Norway on compliant infrastructure is not just a technical decision; it is a legal safeguard.

Don't let slow I/O kill your application performance. Deploy a properly tuned KVM instance on CoolVDS today and see what your stack is actually capable of.