API Gateway Performance Tuning: Squeezing Microseconds out of NGINX and Kong
If your API response time is hovering above 200ms, you are bleeding users. In the high-speed Norwegian market, where fiber penetration is among the highest in Europe, latency isn't just an annoyance; it's a bug. I recently audited a payment processing cluster in Oslo where the development team blamed the database for slow transactions. The database was fine. The bottleneck was a default-configured NGINX instance choking on SSL handshakes and TCP connection overhead.
Most VPS providers hand you a server with generic settings designed for stability on low-end hardware, not for high-throughput API traffic. Today, we are going to rip apart those defaults. We will tune the Linux kernel, optimize the NGINX worker processes, and discuss why hardware architecture—specifically NVMe storage—is the ceiling you cannot break through with software alone.
The Hidden Killer: TCP Stack Limits
Before touching the application layer, look at the kernel. Linux defaults are often conservative to save memory. For an API gateway handling thousands of concurrent connections, these defaults cause dropped packets and connection resets.
Two things I obsess over are the ephemeral port range and the TIME_WAIT state. Every time your gateway connects to an upstream microservice, it consumes a local port; when the pool is exhausted, new upstream connections start failing and requests stall. You need to widen the range and allow the system to reuse sockets faster.
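Before you change anything, it helps to know how close you are to that cliff. Counting sockets stuck in TIME_WAIT takes one command with ss (from iproute2); if the number is a large fraction of your ephemeral port range, you have found your problem:
# Sockets currently parked in TIME_WAIT
ss -tan state time-wait | wc -l
# Broader summary of all socket states
ss -s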
Check your current range:
sysctl net.ipv4.ip_local_port_range
If it looks like 32768 60999, you are limiting yourself. Here is the production-grade sysctl.conf configuration I deploy on CoolVDS instances running high-load API gateways.
Production sysctl.conf Optimization
# /etc/sysctl.conf
# Maximize the number of open file descriptors
fs.file-max = 2097152
# Increase the ephemeral port range
net.ipv4.ip_local_port_range = 1024 65535
# Reuse connections in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1
# Increase the maximum number of connections in the backlog
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 16384
net.ipv4.tcp_max_syn_backlog = 8192
# Increase memory buffers for TCP
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Disable slow start after idle to prevent latency spikes on keepalive connections
net.ipv4.tcp_slow_start_after_idle = 0
Apply these changes with sysctl -p. The last setting, tcp_slow_start_after_idle, is crucial for long-lived API connections (like WebSocket or gRPC streams). Without it, the kernel unnecessarily throttles bandwidth after a brief pause in data transmission.
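Apply and verify in one pass. Keep in mind that fs.file-max only raises the system-wide ceiling; the NGINX process also needs its own per-process limit raised (for example via the worker_rlimit_nofile directive or the systemd unit's LimitNOFILE), otherwise it will hit "too many open files" long before the kernel does. A minimal sketch:
# Load the new settings and spot-check the ones that matter most
sudo sysctl -p
sysctl net.ipv4.tcp_slow_start_after_idle net.core.somaxconn net.ipv4.ip_local_port_range
# Check the per-process limit the NGINX master actually runs with
cat /proc/$(pgrep -o nginx)/limits | grep 'open files'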
NGINX / OpenResty Tuning
Whether you are running raw NGINX, Kong, or OpenResty, the core engine is the same. The most common mistake I see is letting upstream connections close after every request. Each new connection pays for a TCP handshake, plus a TLS negotiation if the upstream is encrypted, and that cost is repeated on every request your gateway proxies.
You must enable keepalive connections to your upstreams. This turns your gateway into a persistent pipe rather than a connection factory.
Optimized Upstream Configuration
http {
    # ... other settings ...

    upstream backend_api {
        # The microservice application
        server 10.0.0.5:8080;
        server 10.0.0.6:8080;

        # KEEP THIS ALIVE.
        # Defines the maximum number of idle keepalive connections
        # to upstream servers that are preserved in the cache of each worker process.
        keepalive 64;
    }

    server {
        listen 443 ssl http2;
        server_name api.example.no;

        # SSL optimization for speed
        ssl_session_cache shared:SSL:50m;
        ssl_session_timeout 1d;
        ssl_session_tickets off;

        location / {
            proxy_pass http://backend_api;

            # Essential for keepalive to work
            proxy_http_version 1.1;
            proxy_set_header Connection "";

            # Buffer tuning for throughput
            proxy_buffers 16 16k;
            proxy_buffer_size 32k;
        }
    }
}
Pro Tip: Always use proxy_http_version 1.1; and clear the Connection header. If you don't, NGINX defaults to HTTP/1.0 for upstream requests and closes the connection after each response, rendering your keepalive directive useless.
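To confirm the keepalive pool is actually in use, watch the sockets between the gateway and one upstream while traffic flows. The 10.0.0.5 address below is the example upstream from the config above; substitute your own. A small, stable set of ESTABLISHED connections means reuse is working; a constant churn of new source ports means you are still paying the handshake tax on every request.
# Established connections from the gateway to one upstream; re-run under load
ss -tn state established dst 10.0.0.5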
The Hardware Bottleneck: Why IOPS Matter
You can optimize software until you are blue in the face, but you cannot code your way out of slow hardware. In 2019, deploying a database or a logging-heavy API gateway on standard SATA SSDs—or worse, spinning HDDs—is professional negligence. This is especially true when using containerized environments (Docker/Kubernetes) where overlay filesystems add their own overhead.
When an API gateway logs a request, it writes to disk. If the disk queue backs up, the worker blocks waiting for the write to complete, and that time shows up as I/O wait (%wa in top). In virtualized environments there is often a second thief on top of it: steal time (%st), cycles the hypervisor hands to a noisy neighbor instead of your workload.
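Both symptoms are easy to watch live with iostat (from the sysstat package) and vmstat; a quick sketch:
# Per-device stats: watch await (latency) and %util (queue saturation)
iostat -x 2
# CPU breakdown: wa = stalled on I/O, st = cycles stolen by the hypervisor
vmstat 2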
At CoolVDS, we strictly use NVMe storage arrays. The difference in random read/write performance is not marginal; it is an order of magnitude or more. NVMe queues are designed for parallelism, matching the multi-core nature of modern CPUs.
To verify your disk speed, don't just use dd. Use fio to simulate random I/O patterns typical of API logs:
fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --bs=4k --direct=1 --size=512M --numjobs=2 --runtime=240 --group_reporting
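The command above, with iodepth=1, roughly mimics the synchronous append pattern of access logs. To see the parallelism advantage mentioned earlier, run a second pass with a deeper queue and more jobs; on NVMe the IOPS should keep scaling, while SATA flattens out much earlier. The parameters here are illustrative, not a formal benchmark:
# Deeper queue and more workers: this is where NVMe pulls away from SATA
fio --name=randread-deep --ioengine=libaio --iodepth=32 --rw=randread --bs=4k --direct=1 --size=512M --numjobs=4 --runtime=60 --group_reporting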
Testing the Configuration
Do not guess. Measure. Use wrk to load test your endpoint. Here is a Lua script to simulate a POST request with a JSON payload, which is more realistic than a simple GET.
Load Test Script (post_test.lua)
wrk.method = "POST"
wrk.body = '{"foo": "bar", "baz": 123}'
wrk.headers["Content-Type"] = "application/json"
-- Run with: wrk -t12 -c400 -d30s -s post_test.lua https://api.yoursite.no
Running this against a standard VPS usually yields around 2,000-3,000 requests per second (RPS). On a tuned CoolVDS instance with the kernel parameters above, we consistently see numbers north of 15,000 RPS, depending on the complexity of the upstream logic.
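Raw RPS is only half the story; the p99 is what your slowest users feel. wrk's --latency flag prints the percentile distribution, so the same run can report both:
# Same test, plus a latency distribution (50/75/90/99th percentiles)
wrk -t12 -c400 -d30s --latency -s post_test.lua https://api.yoursite.no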
Security vs. Performance: The Norwegian Context
Performance cannot come at the cost of compliance. Hosting in Norway means adhering to strict Datatilsynet guidelines. The advantage of using a local provider like CoolVDS is data sovereignty. Your logs and traffic stay within Norwegian jurisdiction, routing through NIX (Norwegian Internet Exchange) in Oslo rather than bouncing through Frankfurt or Stockholm. This reduces round-trip time (RTT) by 10-20ms for local users.
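If you want to verify that claim for your own traffic rather than take it on faith, trace the route from a client in Norway and see where the packets actually go. The hostname below is the example domain from the config above; a sketch with mtr:
# Ten-cycle summary of the path and per-hop latency to the API endpoint
mtr --report --report-cycles 10 api.example.no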
However, logging every request for GDPR compliance hits I/O hard. This is why we disable access logs for static assets and log only the API endpoints, often shipping those logs asynchronously to a separate logging server so the NGINX workers never block on disk.
access_log off; # For static assets
access_log /var/log/nginx/api_access.log main buffer=32k flush=1m; # Buffered logging
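To confirm the buffered configuration actually reduced write pressure, compare NGINX's disk write rate before and after the change; pidstat (also from the sysstat package) breaks I/O down per process:
# Disk writes per NGINX process, refreshed every 2 seconds
pidstat -d -p "$(pgrep -d, nginx)" 2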
Conclusion
Latency is a stack-wide problem. It starts with the physics of your storage, moves up to the kernel's TCP stack, and ends with your NGINX configuration. If you ignore any layer, you create a bottleneck.
If you are tired of fighting "noisy neighbors" and sluggish I/O on public clouds, it is time to upgrade your infrastructure. You need dedicated resources and NVMe storage that can keep up with your optimized code.
Don't let slow I/O kill your SEO. Deploy a test instance on CoolVDS in 55 seconds and run the fio benchmark yourself.