Zero-Compromise API Gateway Tuning: From Kernel Panic to 50k RPS
If your API Gateway adds more than 5 milliseconds of latency to a request, you have already failed. In the microservices architectures prevalent in 2024, the gateway is the choke point. I have seen perfectly written Go and Rust services sitting idle because the frontend gateway was choking on SSL handshakes or running out of file descriptors. That is rarely a code problem; it is configuration and infrastructure negligence.
Most developers treat the Gateway (whether it's NGINX, Kong, or Traefik) as a black box. They deploy a Docker container, expose port 443, and wonder why 502 Bad Gateway errors spike during Black Friday sales. I recall a specific incident last year with a major e-commerce client based in Oslo. They were routing traffic through a generic cloud load balancer in Frankfurt. The round-trip time (RTT) alone was eating 35ms. By moving the edge termination to a CoolVDS instance in Norway and tuning the TCP stack, we dropped total request time by 60%. Physics always wins.
1. The Foundation: Kernel & TCP Stack Tuning
Before touching the application layer, you must fix the OS. Linux defaults are designed for general-purpose computing, not for handling 50,000 concurrent connections. If you don't tune the kernel, your fancy API Gateway is running with one hand tied behind its back.
The first limit you will hit is the file descriptor limit. Everything in Linux is a file, including a TCP connection.
Check your current limits with:
```
ulimit -n
```
If it returns 1024, you are in trouble: a gateway juggling tens of thousands of concurrent sockets will hit that ceiling almost immediately. Note that ulimit -n only reports the limit for your current shell; a daemon like NGINX inherits its limit from the service manager, so raise it there as well, as sketched below.
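A minimal sketch of a systemd drop-in that does this for NGINX (the drop-in file is the one systemctl edit nginx creates; adjust the unit name to match your gateway):

```
# /etc/systemd/system/nginx.service.d/override.conf
[Service]
# Raise the per-process file descriptor limit for the NGINX daemon
LimitNOFILE=65535
```

Run systemctl daemon-reload and restart the service for the new limit to stick.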
The "Battle-Ready" Sysctl Config
```
# /etc/sysctl.conf - Optimized for High Concurrency (Jan 2024)
# Increase system-wide file descriptor limit
fs.file-max = 2097152
# Widen the port range to allow more concurrent connections
net.ipv4.ip_local_port_range = 10000 65535
# Reuse sockets in TIME_WAIT state for new connections
# Critical for high-throughput API gateways talking to upstream backends
net.ipv4.tcp_tw_reuse = 1
# Increase the maximum number of connections in the backlog
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
# Optimize TCP window sizes for high-bandwidth, low-latency links
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Enable TCP Fast Open (RFC 7413) to reduce handshake latency
net.ipv4.tcp_fastopen = 3
# Protection against SYN floods without blocking legitimate traffic
net.ipv4.tcp_syncookies = 1
# BBR Congestion Control - Essential for unstable client networks
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
```
Apply these changes instantly without rebooting:
```
sysctl -p
```
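One caveat on the BBR lines above: BBR shipped in kernel 4.9, and on some distributions the tcp_bbr module is not loaded by default. A quick sanity check before you rely on it:

```
# Load the BBR module (a no-op if it is built into the kernel)
modprobe tcp_bbr
# Confirm bbr is selectable alongside reno and cubic
sysctl net.ipv4.tcp_available_congestion_control
# Confirm it is actually active; should print: net.ipv4.tcp_congestion_control = bbr
sysctl net.ipv4.tcp_congestion_control
```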
Pro Tip: On virtualized hardware, "Steal Time" (st) is the silent killer. If your hosting provider over-provisions CPUs, your kernel tuning won't matter because the hypervisor isn't giving you cycles to process the packets. This is why we standardize on CoolVDS KVM instances; the isolation ensures that when ksoftirqd needs CPU, it gets it immediately.
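You can measure steal time yourself before blaming the kernel. The st column in vmstat (or %steal in mpstat, from the sysstat package) shows the percentage of time the hypervisor withheld CPU from your guest; anything consistently above a few percent means your tuning is fighting the hypervisor:

```
# Sample CPU stats once per second, five times; watch the "st" column
vmstat 1 5
# Per-core view, if sysstat is installed
mpstat -P ALL 1 5
```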
2. NGINX Configuration: Beyond the Defaults
Whether you use raw NGINX, OpenResty, or Kong, the underlying engine is likely NGINX. The default nginx.conf is garbage for an API Gateway role. We need to focus on Keepalives and Worker Rlimits.
When NGINX acts as a reverse proxy, it opens a connection to the client and a separate connection to the upstream service. If you do not enable keepalives to the upstream, NGINX will open and close a new TCP connection for every single API call. This exhausts ephemeral ports and burns CPU on handshakes.
Optimized NGINX Context
```
user www-data;
worker_processes auto;

# This directive is crucial. It must match or exceed the file limit set in sysctl.
worker_rlimit_nofile 65535;

events {
    # epoll is standard for Linux 2.6+
    use epoll;
    # Allow a worker to accept all new connections at once
    multi_accept on;
    # Maximum connections per worker
    worker_connections 16384;
}

http {
    # ... standard mime types ...

    # OPTIMIZATION: Buffer Sizes
    # Don't buffer large bodies to disk if you can avoid it.
    # RAM is cheap on CoolVDS, disk I/O (even NVMe) is slower than RAM.
    client_body_buffer_size 128k;
    client_max_body_size 10m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 4k;
    output_buffers 1 32k;
    postpone_output 1460;

    # OPTIMIZATION: Keepalive to Upstreams
    upstream backend_api {
        server 10.0.0.5:8080;
        server 10.0.0.6:8080;
        # Keep 64 idle connections open to the backend
        keepalive 64;
    }

    server {
        listen 443 ssl http2;
        server_name api.example.no;

        # SSL Settings (See Section 3)
        # ...

        location / {
            proxy_pass http://backend_api;

            # REQUIRED for keepalive to work
            proxy_http_version 1.1;
            proxy_set_header Connection "";

            # Pass real IP (Essential for logging/rate-limiting)
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
```
To verify your configuration syntax before reloading:
```
nginx -t
```
If you miss the proxy_set_header Connection ""; directive, NGINX will default to closing the connection to the backend, rendering the keepalive directive useless. This is a common mistake that caps throughput significantly.
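A quick way to verify reuse is working is to watch the connection count from the gateway to an upstream while you run a load test. The port below assumes the backend_api servers from the config above; with keepalive working, the count stays flat near your keepalive value instead of churning through thousands of short-lived sockets:

```
# Count established gateway-to-upstream connections (assumes upstreams on :8080)
watch -n1 "ss -tn state established '( dport = :8080 )' | wc -l"
```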
3. The Encryption Tax: SSL/TLS Optimization
SSL termination is CPU intensive. In 2024, there is no excuse for using RSA keys for new deployments. Elliptic Curve (ECDSA) keys are smaller and computationally faster to sign, meaning less latency during the handshake.
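If you are still on RSA, moving takes two OpenSSL commands to produce an ECDSA key and a CSR for your CA (P-256 shown; the domain is the example one used throughout this article):

```
# Generate a P-256 private key and a CSR to submit to your CA
openssl ecparam -genkey -name prime256v1 -out api.example.no.key
openssl req -new -key api.example.no.key -out api.example.no.csr -subj "/CN=api.example.no"
```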
Furthermore, ensure you are using OCSP stapling. The server fetches a signed, time-stamped OCSP response from the Certificate Authority and presents it during the handshake, so the client does not have to contact the CA's OCSP responder itself to check for revocation. That saves an entire extra round trip, which is especially painful for mobile users on 4G/5G networks.
Modern SSL Configuration Block
```
# TLS 1.3 is mandatory for performance in 2024
ssl_protocols TLSv1.2 TLSv1.3;
# Prioritize server ciphers
ssl_prefer_server_ciphers on;
# ECDSA-optimized cipher suite
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
# Session Cache - Shared between workers
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;
ssl_session_tickets off;
# OCSP Stapling
ssl_stapling on;
ssl_stapling_verify on;
resolver 1.1.1.1 8.8.8.8 valid=300s;
resolver_timeout 5s;
```
Test your SSL handshake latency using OpenSSL:
```
openssl s_time -connect api.example.no:443 -new
```
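To confirm stapling is actually being served, ask for it explicitly with s_client. One caveat: NGINX fetches OCSP responses lazily, so the very first handshake after a reload may not carry a staple yet; try a second connection before concluding it is broken:

```
# Ask the server to staple an OCSP response during the handshake
openssl s_client -connect api.example.no:443 -status < /dev/null 2>/dev/null | grep "OCSP Response Status"
# Healthy output: OCSP Response Status: successful (0x0)
```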
4. Hardware & Geography: The Unfixable Variable
You can optimize software until you are blue in the face, but you cannot overcome the speed of light. If your target market is Norway, hosting your API Gateway in the US or even Central Europe introduces latency no config file can remove. The raw round-trip time is the floor under every request; all the tuning above only gets you closer to that floor.
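You can see where the milliseconds actually go with curl's built-in timing variables (the URL is the example domain used throughout; point it at any cheap route on your own gateway):

```
# Break one request into DNS, TCP, TLS, and TTFB components (times in seconds)
curl -o /dev/null -s -w "dns=%{time_namelookup} tcp=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total}\n" https://api.example.no/
```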
Additionally, data sovereignty laws in Europe (GDPR) and specific rulings by Datatilsynet make storing and processing data within Norwegian borders legally advantageous. But from a purely technical standpoint, proximity is king.
| Factor | Generic Cloud (Frankfurt/London) | CoolVDS (Oslo) |
|---|---|---|
| Latency to Oslo | 25ms - 45ms | < 3ms |
| Storage I/O | Shared Network Storage (Variable) | Local NVMe (Consistent) |
| CPU Access | High Steal % (Noisy Neighbors) | Dedicated Resources |
Logging is another hidden I/O bottleneck. High-traffic gateways write massive access logs. If your disk write speed is slow, NGINX workers block waiting for I/O. We equip CoolVDS instances with NVMe specifically to handle this write-heavy pattern without impacting the read operations of the application.
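One mitigation, if faster disks are not an option today: let NGINX buffer log writes in worker memory so each request does not trigger its own write syscall. A sketch, assuming the stock combined log format:

```
# Flush access-log entries in 64k batches, or every 5 seconds, whichever comes first
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;
```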
Check your disk write latency to confirm if this is a bottleneck:
```
ioping -c 10 .
```
Final Thoughts
Performance is not an accident; it is engineered. By tuning the Linux kernel to handle high connection counts, configuring your Gateway to properly reuse connections, and hosting on hardware that respects physical proximity and I/O requirements, you build a system that stays up when others melt.
Don't let a default config file be the reason your API feels slow. Deploy a test environment on CoolVDS, apply these configs, and measure the difference. Speed is the only feature that cannot be faked.