API Gateway Performance Tuning: Surviving the Millisecond War in 2019

You are losing money on the handshake.

It is October 2019. Your API is the backbone of your application, yet most of you are running default Nginx configurations on oversold virtual machines. I see it every day. You wonder why your Time to First Byte (TTFB) spikes randomly at 20:00, or why your WebSocket connections drop when traffic hits a meager 2,000 requests per second.

The problem usually isn't your code. It's your infrastructure configuration and the physics of the hardware underneath it.

In this post, we aren't talking about optimizing your PHP or Node.js application logic. We are talking about the gateway layer—the Nginx reverse proxy or Kong node sitting at the edge. We will look at how to tune the Linux kernel for network throughput, configure Nginx for massive concurrency, and why hardware isolation (KVM) is the only sane choice for hosting in Norway.

The "Noisy Neighbor" and CPU Steal Time

Before we touch a single config file, we need to address the environment. If you are running your API gateway on a cheap, container-based VPS (like OpenVZ or LXC), performance tuning is mostly a placebo.

Why? CPU Steal Time.

In a shared environment, the hypervisor schedules CPU cycles. If your neighbor—some kid mining crypto or running a heavy Jenkins build—demands CPU, the hypervisor pauses your VM to serve them. To your API, this looks like "lag." To your users, it looks like a timeout.

Check your steal time right now:

top - 14:02:22 up 10 days,  2:03,  1 user,  load average: 0.15, 0.10, 0.05
Tasks:  94 total,   1 running,  53 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.3 us,  1.0 sy,  0.0 ni, 95.0 id,  0.0 wa,  0.0 hi,  0.2 si,  1.5 st

See that 1.5 st at the end? That is 1.5% of the time your CPU wanted to work but was denied by the host. On a busy API gateway, anything above 0.5% is unacceptable.
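A single top snapshot can miss the spikes. To watch steal time over a window instead, vmstat (part of procps, so it is on practically every distro) prints it as the last column:

# Sample CPU counters every 5 seconds, 12 times (one minute total).
# The final "st" column is the percentage stolen by the hypervisor.
vmstat 5 12

If that column climbs every evening when your neighbors wake up, no amount of Nginx tuning will save you.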

This is why at CoolVDS, we enforce strict KVM isolation. Your CPU cycles are reserved. We don't overcommit to the point of starvation. If you are serious about latency, stop sharing your kernel with strangers.

Linux Kernel Tuning: Breaking the Limits

Default Linux distributions (Ubuntu 18.04, CentOS 7) ship with kernel defaults tuned for general-purpose workloads, not for handling 10,000 concurrent TCP connections. We need to tell the kernel to stop being so conservative.

Edit your /etc/sysctl.conf. These settings adjust how the kernel handles the networking stack.

1. The Backlog Queue

When a connection comes in, it sits in a queue. If the queue is full, the kernel drops the packet. The default is often 128. That is a joke for an API.

# Increase the maximum number of connections in the backlog
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
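Before raising these, it is worth confirming the backlog is actually your bottleneck. A quick sketch using ss and netstat (netstat comes from the net-tools package, which a minimal install may lack); on modern kernels, Send-Q on a listening socket shows the configured backlog and Recv-Q shows how full the accept queue currently is:

# Listening TCP sockets: Recv-Q = current accept-queue depth, Send-Q = configured backlog
ss -lnt

# Cumulative counters for connections dropped because the listen queue overflowed
netstat -s | grep -i listen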

2. Ephemeral Ports

If your gateway talks to an upstream backend (like a microservice), it opens a new local port for every outbound connection. Under sustained load you will exhaust the ephemeral port range long before anything else gives out. Expand the range.

# Allow the system to use a wider range of ports
net.ipv4.ip_local_port_range = 1024 65535
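To gauge how close you are to the ceiling, compare the configured range against the number of outbound connections the gateway holds open. A rough check, with 8080 standing in for whatever port your upstream listens on:

# Current ephemeral port range (low and high bound)
cat /proc/sys/net/ipv4/ip_local_port_range

# Established connections from the gateway to the upstream port
ss -tn state established '( dport = :8080 )' | wc -l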

3. TIME_WAIT Assassination

Sockets hang around in the TIME_WAIT state after closing so that stray packets from the old connection cannot be mistaken for part of a new one. On high-traffic gateways, this consumes memory and file descriptors.

# Reuse sockets in TIME_WAIT for new outgoing connections when it is safe
net.ipv4.tcp_tw_reuse = 1
# Note: Do NOT enable tcp_tw_recycle. It breaks NAT clients (mobile users).
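You can see how bad the problem is on your box before and after the change with a one-liner:

# Count sockets currently parked in TIME_WAIT (give or take ss's header line)
ss -tan state time-wait | wc -l

On a busy gateway without upstream keepalive, this number routinely sits in the tens of thousands.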

Apply these changes immediately:

sysctl -p
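sysctl -p echoes every value it applies, but a quick read-back never hurts:

# Confirm the new values are live
sysctl net.core.somaxconn net.core.netdev_max_backlog \
       net.ipv4.ip_local_port_range net.ipv4.tcp_tw_reuse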

Nginx Configuration: The Gateway Engine

Now that the OS permits high traffic, we configure Nginx. Whether you use vanilla Nginx or OpenResty, the logic is the same.

Worker Processes and Files

Nginx needs permission to open many files (sockets are files in Unix).

worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 16384;
    use epoll;
    multi_accept on;
}

Note: multi_accept on tells each worker to accept as many pending connections as it can each time it wakes up, instead of one at a time. Use this only if you have plenty of CPU power (like our High-Performance Compute instances).
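After a reload, check that worker_rlimit_nofile actually reached the workers. A small sketch, assuming the workers carry the usual "nginx: worker process" title:

# Validate and apply the new configuration
nginx -t && nginx -s reload

# Confirm a worker's open-file limit matches worker_rlimit_nofile
grep 'Max open files' /proc/$(pgrep -o -f 'nginx: worker')/limits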

Keepalive to Upstreams

This is the most common mistake I see. You tune the client side, but Nginx opens a brand new TCP connection to your backend app (Node/Go/Python) for every single request. That handshake costs a full round trip per request, and if the upstream speaks HTTPS, the TLS handshake adds another 20-50ms on top.

Use the keepalive directive in your upstream block:

upstream backend_api {
    server 10.0.0.5:8080;
    # Keep 64 idle connections open to the backend
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;
        # Required for HTTP/1.1 keepalive
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
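
You can verify the pool is actually being reused by watching connections to the upstream under load (10.0.0.5 is the example backend from the block above):

# With keepalive working, this count hovers near the pool size
# instead of churning a new connection per request
watch -n1 "ss -tn state established dst 10.0.0.5 | wc -l"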

Pro Tip: If your backend is on a different server, latency matters. Our data center in Oslo connects directly to NIX (Norwegian Internet Exchange). If your customers are in Norway but your servers are in Frankfurt, you are adding 15-25ms of latency purely by physics. Host local. Keep data under Norwegian jurisdiction (GDPR).

The I/O Bottleneck: Access Logs

Logging kills performance. Every request appends a line to the access log, and on standard SATA SSDs (or worse, spinning rust) those writes pile up until your Nginx workers sit blocked waiting for the disk.

Solution 1: Buffer the Logs

access_log /var/log/nginx/access.log combined buffer=32k flush=1m;

This tells Nginx: "Don't write to disk until the buffer holds 32 KB of data or a minute has passed since the last flush." This reduces I/O operations per second (IOPS) drastically.
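You can watch the effect directly. With sysstat installed, iostat shows per-device write operations per second; the w/s figure for your log disk should drop sharply after you reload Nginx with buffering enabled:

# Extended device statistics, refreshed every 5 seconds
iostat -x 5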

Solution 2: Hardware that doesn't choke

Buffering helps, but if you crash, you lose the buffer. The real solution is NVMe storage. NVMe drives communicate directly with the CPU via the PCIe bus, bypassing the SATA controller bottleneck.

Here is a comparison we ran internally using fio on a standard SSD VPS vs. a CoolVDS NVMe instance:

Metric              Standard SSD VPS    CoolVDS NVMe
Random Read IOPS    12,000              85,000+
Write Latency       2.5 ms              0.08 ms

For an API gateway writing logs and caching responses, that 0.08ms latency means your Nginx workers never get stuck waiting for the disk.
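If you want to sanity-check your own storage, a fio run along these lines reproduces the random-read side of that table. The parameters are illustrative rather than our exact internal job file; run it from the filesystem you want to measure:

# 4k random reads, direct I/O, queue depth 32, four workers, one minute
fio --name=randread --filename=fio-testfile --size=1G \
    --ioengine=libaio --direct=1 --rw=randread --bs=4k \
    --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting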

Summary

Performance isn't magic. It is the sum of many small configurations and the quality of the infrastructure they run on.

  1. Kernel: Increase backlog queues and enable socket reuse.
  2. Nginx: Enable upstream keepalives and buffer your logs.
  3. Infrastructure: Demand KVM isolation to avoid CPU stealing.
  4. Storage: Use NVMe to eliminate I/O blocking.

You cannot tune your way out of bad hardware. If your current provider is selling you shared cores and SATA storage in 2019, you are fighting a losing battle.

Ready to drop your API latency? Deploy a CoolVDS NVMe instance in Oslo today. We give you the raw performance; you take the credit.