API Gateway Latency: Tuning Nginx & OpenResty for High-Throughput Microservices

Stop Blaming the Network: Your API Gateway Configuration is Trash

It is January 2018. We are seeing a massive shift towards microservices, but nobody talks about the network overhead tax. You split your monolith into twelve services, and suddenly a single user request generates fourteen internal HTTP calls. If your API gateway adds even 15ms of overhead per hop, your application feels broken.

I see this every week. A client migrates to a "scalable" architecture, deploys a default Nginx or Kong setup, and wonders why their Time To First Byte (TTFB) on mobile networks has doubled. The culprit is rarely the code logic; it's the glue holding it together.

Let's fix it. We are going to look at raw socket optimization, SSL termination, and the Linux kernel parameters that actually matter for API traffic.

The War Story: Black Friday 2017

Two months ago, a mid-sized Norwegian e-commerce retailer (hosting on a generic European VPS provider) hit a wall. They weren't CPU bound. They weren't RAM bound. But their API gateway—running OpenResty—was dropping 5% of incoming connections during peak traffic.

The logs showed crit errors: worker_connections are not enough. They bumped the config. Then the kernel started dropping packets, logging nf_conntrack: table full.

The issue? They were treating an API gateway like a static file server. APIs are chatty. They open and close thousands of short-lived TCP connections per second. Without aggressive keepalive tuning and file descriptor management, you run out of ephemeral ports before you run out of CPU cycles. We moved them to a CoolVDS KVM instance to eliminate neighbor noise (steal time), tuned the stack, and dropped latency by 60%.
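
For the record, the conntrack ceiling is easy to spot before it bites. A minimal sketch, assuming a modern kernel with the nf_conntrack module loaded; the 262144 value is an example you should size to your RAM:

# Compare live conntrack entries against the ceiling
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# Raise the ceiling at runtime (persist the value in sysctl.conf)
sysctl -w net.netfilter.nf_conntrack_max=262144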

1. The Nginx Configuration You Actually Need

Most tutorials give you defaults from 2014. For an API gateway in 2018 handling high concurrency, you need to uncap the worker limits. This isn't about caching images; it's about shuffling packets.

Here is the baseline nginx.conf for a high-throughput gateway:

worker_processes auto;
# Critical: Increase file descriptor limit for the process
worker_rlimit_nofile 100000;

events {
    # Use epoll on Linux 2.6+
    use epoll;
    
    # Allow worker to accept all new connections at once
    multi_accept on;
    
    # 1024 is a joke. Set this high.
    worker_connections 8192;
}

http {
    # ... logging and mime types ...

    # DISABLE ACCESS LOGS if you can, or buffer them.
    # Disk I/O on logs is the silent killer of throughput.
    access_log /var/log/nginx/access.log combined buffer=32k flush=1m;

    # TCP Optimization
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    
    # Keepalive is mandatory for API Upstreams
    keepalive_timeout 65;
    keepalive_requests 10000;
}
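
One caveat: worker_rlimit_nofile only raises what Nginx asks for; the OS still has to grant it. If your distro runs Nginx under systemd, a drop-in override is the clean way to lift the cap. A sketch, assuming the standard unit name nginx.service:

# /etc/systemd/system/nginx.service.d/limits.conf
[Service]
LimitNOFILE=100000

Run systemctl daemon-reload, restart Nginx, then check the "Max open files" row in /proc/<master pid>/limits to confirm it took.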

The Upstream Keepalive Trap

This is where 90% of setups fail. By default, Nginx talks to your backend services (Node.js, Go, Python) over HTTP/1.0 and closes the connection after every request. This forces a new TCP handshake for every internal API call. It is an insane waste.

You must explicitly enable keepalive to upstreams:

upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;
    
    # Keep 64 idle connections open to the backend
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;
        
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
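
To verify the reuse is actually happening, watch socket states between the gateway and a backend. A healthy pool shows a stable set of ESTABLISHED connections and very little TIME_WAIT churn. A quick sketch with ss (the port matches the upstream example above; counts include ss's header line):

# Established connections to the backend (should hover near the keepalive value)
ss -tan state established '( dport = :8080 )' | wc -l

# TIME_WAIT churn (should stay low once keepalive kicks in)
ss -tan state time-wait '( dport = :8080 )' | wc -l
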
Pro Tip: If you are using SSL between your gateway and your backends (Zero Trust), the handshake overhead is even worse. Keepalives are non-negotiable here. On CoolVDS, our internal network latency is negligible, but SSL math is still math.
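
In that case, also make sure the gateway reuses TLS sessions to the upstreams instead of paying a full handshake per connection. A minimal sketch; proxy_ssl_session_reuse is on by default in Nginx, but stating it documents the intent:

location /api/ {
    proxy_pass https://backend_api;
    
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    
    # Reuse TLS sessions to upstreams rather than full handshakes
    proxy_ssl_session_reuse on;
}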

2. Kernel Tuning: sysctl.conf

Linux is tuned for general-purpose desktop usage out of the box. For a high-performance gateway, we need to adjust how the kernel handles TCP states. Specifically, we need to recycle connections faster.

Edit /etc/sysctl.conf and apply these settings. Be careful—these are powerful.

# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535

# Reuse sockets in TIME_WAIT state for new outbound connections
# (crucial for a gateway opening thousands of upstream sockets;
# safe, unlike the removed tcp_tw_recycle)
net.ipv4.tcp_tw_reuse = 1

# Increase ephemeral port range to allow more concurrent connections
net.ipv4.ip_local_port_range = 1024 65535

# Protect against SYN flood, but don't drop legitimate traffic too early
net.ipv4.tcp_max_syn_backlog = 65535

# Reduce timeout for FIN-WAIT-2
net.ipv4.tcp_fin_timeout = 15

Run sysctl -p to apply. If you don't do this, Nginx will be ready to accept 10,000 connections, but the Linux kernel will quietly cap the accept queue at 128 (the default somaxconn) and drop the rest.
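
There is a related trap: raising somaxconn does not change Nginx's own listen queue, which defaults to 511 on Linux. Pass the backlog explicitly on the listen directive (once per address:port pair). A sketch; the server_name is a placeholder:

server {
    # Match the listen queue to net.core.somaxconn;
    # Nginx defaults to 511 regardless of the sysctl
    listen 443 ssl backlog=65535;
    server_name api.example.com;
}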

3. SSL/TLS: Performance vs. Security

It's 2018. If you aren't serving HTTPS, Google Chrome is shaming you. But encryption costs CPU. To minimize the hit, you need session resumption and OCSP stapling.

This configuration reduces the handshake time, which is the slowest part of a mobile request.

ssl_protocols TLSv1.2; # TLS 1.0/1.1 are dying. TLS 1.3 isn't here yet.
ssl_ciphers EECDH+AESGCM:EDH+AESGCM;
ssl_prefer_server_ciphers on;

# Cache SSL sessions to avoid full handshakes on repeat visits
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;

# OCSP Stapling (Let the server verify the cert, not the client)
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4 valid=300s;
resolver_timeout 5s;
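
Verify both from the outside before trusting them. A quick sketch with openssl s_client; api.example.com stands in for your gateway's hostname:

# OCSP stapling: look for "OCSP Response Status: successful"
echo | openssl s_client -connect api.example.com:443 -status 2>/dev/null | grep -iA 2 "OCSP response"

# Session resumption: -reconnect reconnects five times with the same
# session; resumed connections print "Reused" instead of "New"
echo | openssl s_client -connect api.example.com:443 -reconnect 2>/dev/null | grep -c "Reused"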

The Hardware Reality: Why "Cloud" Often Fails

You can tune Nginx until your fingers bleed, but if your underlying hypervisor is stealing your CPU cycles, it doesn't matter. In a shared hosting environment or a crowded public cloud, "Steal Time" (st) occurs when the hypervisor makes your VM wait while it serves another noisy neighbor.

For an API Gateway, consistency is key. A 200ms spike in CPU availability causes a request queue backup that takes seconds to drain. This is why we built CoolVDS on KVM with strict resource isolation. We don't oversell CPU cores.
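
Checking for steal takes ten seconds on any box with procps installed:

# The last column (st) is CPU steal; sustained values above 1-2%
# under load mean the hypervisor is throttling your gateway
vmstat 1 5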

Local Latency Matters: The Oslo Factor

If your user base is in Norway, hosting in Frankfurt or Amsterdam adds 20-30ms of round-trip time (RTT). That sounds small, but a cold connection costs multiple round trips: one RTT for the TCP handshake (SYN, SYN-ACK, ACK), two more for a full TLS 1.2 handshake, and another for the HTTP request itself. That 30ms penalty is paid 3 or 4 times before the first byte of data is sent.
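
You can measure exactly where the penalty lands with curl's timing variables (the URL is a placeholder):

# time_connect = TCP handshake, time_appconnect = TCP + TLS,
# time_starttransfer = TTFB as the client experiences it
curl -o /dev/null -s -w "tcp: %{time_connect}s  tls: %{time_appconnect}s  ttfb: %{time_starttransfer}s\n" \
    https://api.example.com/health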

Hosting locally in Oslo, utilizing the NIX (Norwegian Internet Exchange), keeps that RTT under 5ms. With GDPR enforcement starting in May, keeping data within national borders is also becoming a compliance advantage that your legal team will appreciate.

Summary

Optimizing an API gateway is about removing bottlenecks in the pipeline:

  1. Kernel: Open the floodgates for file descriptors and ephemeral ports.
  2. Nginx: Enable upstream keepalives to stop connection churning.
  3. SSL: Use session caching and OCSP stapling.
  4. Hardware: Avoid CPU steal time and seek low-latency storage.

Don't let slow I/O kill your SEO or your user experience. If you want to test this configuration on hardware that doesn't fight against you, deploy a test instance on CoolVDS. It takes 55 seconds, and our NVMe storage eats database queries for breakfast.