Optimizing API Gateway Performance: Tuning Nginx & Kong for Low Latency

The Gateway Bottleneck: Why Your Microservices are Stalling

It is 2017, and the monolith is dying. You have broken your application into twelve different services. You have containerized them with Docker. You feel modern. But your response times have jumped from 50ms to 400ms.

Why? Because you introduced a middleman. The API Gateway.

Whether you are using raw Nginx, HAProxy, or a dedicated solution like Kong (which is just Nginx with Lua and a database), the gateway is now your single point of failure. I have seen production environments in Oslo fall over not because the code was bad, but because the default Linux kernel settings are designed for 1990s mail servers, not high-concurrency API traffic. If you are serving traffic to Norwegian users via NIX (Norwegian Internet Exchange), every millisecond of overhead you add at the gateway level is a wasted opportunity.

Let's fix it. We are going to look at the OS layer, the Nginx configuration, and the hardware reality.

1. The OS Lie: File Descriptors

By default, many Linux distributions ship with a soft limit of 1024 open files per process. In an API gateway architecture, every incoming connection is a file. Every connection to an upstream microservice is a file. With just 500 concurrent clients you are already holding roughly 1,000 file descriptors. Hit that limit, and your logs fill with "Too many open files" while your clients get 502 Bad Gateway errors.
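
Before changing anything, see where you stand today. A quick sketch, assuming standard tooling (pgrep, /proc) and that the processes are named nginx; run it as root:

# Current soft limit for your shell
ulimit -n

# Rough count of descriptors each Nginx process holds right now
for pid in $(pgrep nginx); do echo "$pid: $(ls /proc/$pid/fd | wc -l)"; done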

You need to raise this hard. Do not be shy.

Edit /etc/security/limits.conf:

* soft nofile 65535
* hard nofile 65535
root soft nofile 65535
root hard nofile 65535

Verify it applies to the user your gateway workers run as. On most distros the nginx account ships with a nologin shell, so give su a real one for the check:

su -s /bin/sh -c 'ulimit -n' nginx
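
One gotcha: limits.conf is only read for PAM login sessions. If your distro starts Nginx through systemd, the file is ignored for the service and the limit belongs in a unit drop-in instead. A minimal sketch, assuming the service is named nginx:

# /etc/systemd/system/nginx.service.d/limits.conf
[Service]
LimitNOFILE=65535

Reload with systemctl daemon-reload, then restart Nginx. The worker_rlimit_nofile directive in section 3 covers the Nginx side of the same limit.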

2. Kernel Tuning for High Churn

API Gateways suffer from high connection churn. Clients connect, get JSON, and disconnect. This leaves thousands of sockets in the TIME_WAIT state, consuming resources until the OS decides to clean them up (which can take 60 seconds).
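
You can watch the pile-up with ss (part of iproute2 on any modern distro):

ss -tan state time-wait | wc -l

If that count climbs into the tens of thousands under load, ephemeral port exhaustion is not far behind.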

We need to tell the kernel to reuse these sockets faster. Add this to /etc/sysctl.conf:

# Allow reusing sockets in TIME_WAIT state for new outbound connections
# (this helps the gateway-to-upstream side; inbound sockets are unaffected)
net.ipv4.tcp_tw_reuse = 1

# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65000

# Max number of packets in the receive queue
net.core.netdev_max_backlog = 10000

# Max connections waiting for accept()
net.core.somaxconn = 65535

# Increase TCP buffer sizes for high-performance usage
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864

Apply these with sysctl -p. If you are on a standard VPS provider, some of these might be locked down. This is why we use KVM virtualization at CoolVDS—you get a dedicated kernel. If you can't tune somaxconn, you can't scale.
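
After running sysctl -p, confirm the values actually stuck:

sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse

If you still see the stock defaults (128 and 0 on a typical 2017 kernel), the kernel is not yours to tune.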

3. Nginx / Kong Configuration

Most defaults in nginx.conf are conservative. For a gateway, we need to maximize worker efficiency. The most critical mistake I see is not using keepalives to the upstream services.

Without upstream keepalives, Nginx opens a new TCP connection to your microservice for every single request. That means a full three-way handshake per request, which is slow and CPU-intensive.

Here is how a tuned upstream block looks:

upstream backend_api {
    server 10.0.0.5:8080;
    
    # The Critical Setting
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;
        
        # Required for HTTP 1.1 keepalive to upstream
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

Additionally, ensure your worker settings match your CPU topology:

worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 16384;
    use epoll;
    multi_accept on;
}
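
One catch: the somaxconn value from section 2 only takes effect if Nginx asks for a backlog that big, and the default listen backlog on Linux is 511. A sketch of the listen directive (adjust the port and server block to your own setup):

server {
    # Request a deeper accept queue to match net.core.somaxconn
    listen 443 ssl backlog=65535;
}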

Pro Tip: If you are using Kong (v0.10.x), remember that it uses a database (PostgreSQL or Cassandra) to store routes. If your DB is on the same disk as your logs and you aren't using NVMe storage, your latency will spike during log rotation. Isolate your I/O.
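
A cheap way to soften those spikes is to buffer access log writes instead of hitting the disk on every request. A sketch, assuming the stock log path and a log format named main:

access_log /var/log/nginx/access.log main buffer=64k flush=5s;

Entries are written out when the buffer fills or the flush interval expires, so you trade a few seconds of log freshness for far fewer disk writes.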

4. SSL Termination: The CPU Eater

In 2017, everyone is moving to HTTPS. Google is penalizing HTTP sites, and with the upcoming GDPR regulations (Datatilsynet is ramping up checks), encryption is non-negotiable. However, the handshake is expensive.

Don't use 4096-bit RSA keys unless legally required; 2048-bit is sufficient and significantly faster. Prioritize ECDHE (Elliptic Curve) ciphers.

A modern 2017-era cipher suite looks like this:

ssl_ciphers 'EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH';
ssl_prefer_server_ciphers on;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;

The ssl_session_cache is vital. It lets returning clients resume a session with an abbreviated handshake, skipping the expensive asymmetric crypto and saving a round trip, which roughly halves connection setup time on subsequent requests.
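
You can confirm resumption is working from any machine with OpenSSL installed (swap in your own hostname):

openssl s_client -connect api.example.com:443 -reconnect < /dev/null

The -reconnect flag performs one full handshake and then reconnects five more times with the same session; the abbreviated attempts should report "Reused" instead of "New".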

5. The Infrastructure Reality

You can apply all these configs, but software cannot fix hardware latency. An API gateway spends its life servicing network interrupts and context switches, which makes it acutely sensitive to CPU scheduling jitter.

Many "Cloud" providers oversubscribe their CPUs. If your neighbor on the physical host starts compiling a kernel, your Nginx workers get paused. In a gateway, a 50ms pause causes a pile-up of requests that can crash the service. This is the "Noisy Neighbor" effect.

Furthermore, standard SATA SSDs are often not enough for high-logging environments (access logs + error logs + Kong DB). This is why CoolVDS moved strictly to NVMe storage and dedicated KVM slices. We don't oversubscribe CPU, because we know that consistent latency is more important than raw burst speed.

Benchmarking Your Setup

Don't guess. Measure. Use wrk to load test your gateway locally before going live; its --latency flag prints the full latency distribution, which tells you far more than the average.

wrk -t12 -c400 -d30s --latency http://localhost/api/resource

If you aren't hitting at least 10,000 requests per second on a simple echo endpoint, check dmesg for TCP errors and verify you aren't losing cycles to CPU steal time (%st in top).
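
Two quick checks when the numbers disappoint. Steal time first, since the st column is what a noisy neighbor looks like:

vmstat 1 5

Then see whether the kernel is complaining about overflowing accept queues (exact wording varies by kernel version):

dmesg | grep -i "syn flood"

A line like "Possible SYN flooding on port 80. Sending cookies." means your backlog tuning never took effect.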

Conclusion

Performance tuning is an iterative process. Start with the kernel, move to the Nginx configuration, and verify your hardware limitations. If you are preparing for the data privacy shifts coming to Europe next year, you need a stable, secure, and fast foundation.

Don't let legacy hosting infrastructure be the bottleneck for your modern stack. Spin up a CoolVDS instance with NVMe today and see what your API is actually capable of.