API Gateway Latency: Tuning Nginx & Kong for High-Throughput in Norway

If you are routing traffic through a standard API Gateway configuration in 2022 without touching the Linux kernel, you are leaving approximately 40% of your throughput on the table. I have seen it time and time again: a development team in Oslo deploys a perfectly optimized Go or Rust microservice, only to have the request die a slow death inside a default Nginx ingress controller.

The problem usually isn't the application logic. It's the file descriptors, the TCP backlog, and the noisy neighbors stealing CPU cycles on your budget cloud provider. When your target audience is local—relying on the Norwegian Internet Exchange (NIX)—every millisecond added by a misconfigured gateway is an insult to the infrastructure.

We are going to look at how to tune an API Gateway (focusing on Nginx and Kong) for raw performance on a Linux stack. This assumes you are running on a modern kernel (5.4+), standard on Ubuntu 20.04/22.04 LTS.

1. The OS Layer: Open Files and TCP Stack

Before you even touch nginx.conf, look at your operating system. Linux, by default, is tuned for a desktop experience, not for handling 10,000 concurrent API connections. The first bottleneck you will hit is the file descriptor limit. In Linux, everything is a file—including a TCP connection.

Check your current limits:

ulimit -n

If it says 1024, your gateway will capsize under load. You need to increase this permanently in /etc/security/limits.conf:

* soft nofile 65535
* hard nofile 65535
root soft nofile 65535
root hard nofile 65535
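
One caveat: limits.conf is applied through PAM at login, so a service started by systemd (which is how the packaged Nginx runs on Ubuntu) will not pick these values up. A drop-in override covers that case; the path below assumes the stock nginx.service unit:

# /etc/systemd/system/nginx.service.d/limits.conf
[Service]
LimitNOFILE=65535

Run systemctl daemon-reload and restart Nginx afterwards so the new limit applies to the workers.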

Next, we move to the kernel parameters. This is where the magic happens. We need to modify /etc/sysctl.conf to handle a high rate of incoming connections and rapid connection recycling. This is critical for REST APIs where connections are often short-lived.

Here is the production-grade sysctl configuration we use on CoolVDS high-performance templates:

# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535

# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535

# Enable TCP Time Wait Reuse (Crucial for high throughput)
net.ipv4.tcp_tw_reuse = 1

# Fast Open for lower latency on re-connections
net.ipv4.tcp_fastopen = 3

# Increase TCP buffer sizes for 10Gbps+ links
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Protection against SYN floods (basic DDoS mitigation)
net.ipv4.tcp_syncookies = 1

Apply these changes with sysctl -p. Note that tcp_tw_recycle was removed in newer kernels due to issues with NAT, so stick to tcp_tw_reuse.
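
To spot-check that the values actually took effect (the keys match the block above; sysctl accepts several names at once):

sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse net.ipv4.tcp_fastopen
# net.core.somaxconn = 65535
# net.ipv4.tcp_tw_reuse = 1
# net.ipv4.tcp_fastopen = 3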

2. Nginx / Kong Configuration Tuning

Whether you are using raw Nginx or Kong (which is built on OpenResty/Nginx), the core directives remain similar. The most common error is failing to utilize upstream keepalives. Without keepalives, your gateway opens a new TCP connection to the backend microservice for every single request. This adds the full TCP handshake overhead to every API call.

The Worker Configuration

First, ensure your worker processes match your CPU cores. If you are on a CoolVDS NVMe instance with 4 vCPUs, you want 4 workers. The auto setting spawns one worker per detected core; set an explicit number only if you are deliberately reserving cores for other processes on the box.

worker_processes auto;

Crucially, bump worker_connections (it lives in the events block) so it no longer caps you far below your OS file limits:

worker_connections 16384;
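
Putting the pieces together, the top of nginx.conf ends up looking roughly like this (a minimal sketch; worker_rlimit_nofile lets each worker actually use the raised file descriptor limit):

worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 16384;
    multi_accept on;    # accept every pending connection per wake-up instead of one at a time
}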

The Upstream Keepalive Block

This is the single most impactful change for internal latency. Configure your upstream block to hold connections open:

upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;

    # Keep 64 idle connections open to the backend
    keepalive 64;
}

However, simply adding `keepalive` is not enough. You must also force Nginx to use HTTP/1.1 for the proxy traffic and clear the connection header, or it will close the connection anyway:

location /api/ {
    proxy_pass http://backend_api;
    
    # REQUIRED for keepalive to work
    proxy_http_version 1.1;
    proxy_set_header Connection "";

    # Buffering tuning
    proxy_buffers 16 4k;
    proxy_buffer_size 2k;
}

Pro Tip: If you are using Kong, the equivalent knobs live in `kong.conf` (on Kong 2.x these are the `upstream_keepalive_*` properties, such as `upstream_keepalive_pool_size`). Don't rely on the defaults if you are pushing over 1,000 RPS.
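
For reference, a minimal kong.conf sketch. The property names follow Kong 2.x; the values are illustrative, not tested recommendations:

# kong.conf (Kong 2.x property names; values are examples only)
upstream_keepalive_pool_size = 512        # idle connections kept per upstream pool
upstream_keepalive_idle_timeout = 60      # seconds before an idle connection is closed
upstream_keepalive_max_requests = 10000   # recycle a connection after this many requests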

3. The Hardware Reality: Why Cloud "Steal Time" Kills APIs

You can have the most optimized Nginx config in the world, but if your underlying hypervisor is oversubscribing the CPU, your API Gateway will suffer from "Steal Time" (st). This is when the hypervisor forces your VM to wait while another neighbor uses the physical CPU core.

For an API Gateway, consistency matters more than raw speed. Fifty milliseconds of jitter in gateway processing compounds across every hop of a microservices chain and can push downstream calls past their timeouts.

This is why we architect CoolVDS differently. We use KVM virtualization with strict resource isolation. We don't oversubscribe CPU cores on our high-performance tiers. When you run top on a CoolVDS instance, you want to see 0.0 st. If you are hosting on a budget provider and seeing 5.0 st or higher, no amount of software tuning will fix your latency.
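
A quick way to quantify it beyond glancing at top (vmstat ships with most distros; mpstat comes from the sysstat package):

vmstat 1 5            # "st" is the last column; anything consistently above 0 means contention
mpstat -P ALL 1 5     # per-core breakdown, look at the %steal column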

4. Logging: The Silent I/O Killer

Writing access logs to disk is an expensive I/O operation. In high-traffic environments, synchronous disk writes can block the Nginx worker process.

If you need logs for compliance (GDPR/Datatilsynet requirements often mandate audit trails), you have two options:

  1. Buffer the logs: Write to memory first, flush to disk later.
  2. Use NVMe Storage: High IOPS prevent the write queue from blocking.

Here is how to buffer logs in Nginx to avoid blocking:

access_log /var/log/nginx/access.log combined buffer=32k flush=5s;

This tells Nginx: "Wait until you have 32 KB of buffered log data OR 5 seconds have passed before touching the disk." This significantly reduces IOPS pressure.
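
You can also drop noise entirely. Nginx supports conditional logging, which is handy for health checks and load-balancer probes; the /healthz path below is just an example endpoint:

map $uri $loggable {
    default   1;
    /healthz  0;
}

access_log /var/log/nginx/access.log combined buffer=32k flush=5s if=$loggable;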

5. SSL/TLS Termination

Decryption is CPU intensive. In 2022, sticking to OpenSSL 1.1.1 or 3.0 is standard. Ensure you are using modern ciphers that leverage hardware acceleration (AES-NI).

Check if your CPU supports AES-NI:

grep -o aes /proc/cpuinfo

If that returns empty, your host is running on ancient hardware. Move workloads immediately. All CoolVDS nodes run on modern processors with full instruction set pass-through.
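
To see what the instruction set is worth in practice, OpenSSL's built-in benchmark is enough for a rough comparison (exact numbers depend on the CPU):

openssl speed -evp aes-256-gcm    # EVP path, uses AES-NI when the CPU exposes it
openssl speed aes-256-cbc         # legacy non-EVP path, a useful software-only baseline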

Also, enable OCSP Stapling to save your users a round-trip to the Certificate Authority:

ssl_stapling on;
ssl_stapling_verify on;
resolver 1.1.1.1 8.8.8.8 valid=300s;
resolver_timeout 5s;
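
Session resumption is the other cheap win: it spares returning clients a full handshake and spares your CPU the asymmetric crypto. A minimal sketch; adjust the protocol and ticket policy to your own compliance requirements:

ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers off;
ssl_session_cache shared:SSL:10m;   # roughly 40,000 sessions per 10 MB
ssl_session_timeout 1h;
ssl_session_tickets off;            # leave off unless you rotate ticket keys yourself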

Summary

Optimizing an API Gateway is a game of millimeters. It requires aligning the kernel network stack, the Nginx worker architecture, and the underlying physical hardware.

In the context of the Norwegian market, where data sovereignty (Schrems II) pushes us away from US hyper-scalers and toward local infrastructure, you have the opportunity to own your stack completely. But with that ownership comes the responsibility to tune it.

Don't let IO wait or CPU steal time be the reason your SLA fails. Test these configurations on a CoolVDS NVMe instance today. We provide the raw power; you provide the logic.