The 502 Bad Gateway Nightmare: It's Not Your Code, It's Your Infrastructure
I still remember the silence in the Slack war room. It was Black Friday, 2023. We had scaled our microservices, optimized our database queries, and cached everything in Redis. Yet, the dashboard was bleeding red. 502 Bad Gateway. 504 Gateway Time-out. The load balancer wasn't crashing—it was choking.
We weren't hitting a CPU limit; we were hitting a file descriptor limit. We were exhausting the ephemeral port range. The infrastructure, specifically the API Gateway layer, wasn't tuned for the concurrency we threw at it. Most VPS providers hand you a vanilla Linux image and wish you luck. That works for a WordPress blog. It fails catastrophically for a high-throughput API serving users from Oslo to Tromsø.
In this guide, we are going to bypass the fluff. We are going into the kernel. We are editing sysctl.conf. We are tuning NGINX worker processes. And we are going to discuss why running this on a standard container-based VPS is a fool's errand compared to a dedicated KVM slice like those at CoolVDS.
1. The Foundation: Linux Kernel Tuning
Before touching NGINX or Kong, look at the OS. By default, Linux is configured for general-purpose computing, not high-performance packet shuffling. If you deploy an API gateway on a fresh install, you are capped before you start.
Increase File Descriptors
Every connection is a file. The default limit is often 1024. For an API gateway, you need at least 65535.
ulimit -n 65535
But that's temporary. Make it permanent in /etc/security/limits.conf:
* soft nofile 65535
* hard nofile 65535
root soft nofile 65535
root hard nofile 65535
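One caveat: services launched by systemd ignore /etc/security/limits.conf entirely. If NGINX or Kong runs as a systemd unit, which it almost certainly does, set the limit in a drop-in override too. A minimal sketch, assuming the unit is named nginx:
# systemctl edit nginx  (creates /etc/systemd/system/nginx.service.d/override.conf)
[Service]
LimitNOFILE=65535
Reload and restart afterwards: systemctl daemon-reload && systemctl restart nginx.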
TCP Stack Optimization
This is where the magic happens. We need to allow the system to reuse sockets quickly and handle a massive backlog of incoming connections. Open /etc/sysctl.conf and add these lines:
# Allow reuse of sockets in TIME_WAIT state for new outgoing connections
net.ipv4.tcp_tw_reuse = 1
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# Increase the maximum listen() backlog (pending connections per listening socket)
net.core.somaxconn = 65535
# Increase the queue of incoming packets waiting for kernel processing
net.core.netdev_max_backlog = 16384
# TCP Fast Open (TFO) to reduce handshake latency
net.ipv4.tcp_fastopen = 3
Apply changes with sysctl -p.
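Then read the values back to confirm they actually stuck; on container-based virtualization, /proc/sys is often read-only and the write fails:
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse net.ipv4.tcp_fastopen
One related detail: NGINX caps its own listen queue at 511 by default, so a large somaxconn only pays off if you also raise the backlog on the listen directive, for example listen 443 ssl backlog=65535;.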
Pro Tip: Many "cloud" providers run containerized VPS solutions (like OpenVZ or LXC) that share a kernel with the host. This means you often cannot modify these sysctl parameters because you don't own the kernel. This is a dealbreaker for API Gateways. At CoolVDS, we use KVM virtualization exclusively. You get your own kernel. You can tune tcp_tw_reuse to your heart's content. That is the difference between a toy server and professional hosting.
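Not sure what your provider actually gave you? systemd ships a detector that prints the virtualization technology in use (kvm, lxc, openvz, and so on):
systemd-detect-virt
If that prints lxc or openvz, expect most of the sysctl tuning above to be off-limits.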
2. NGINX Configuration: Beyond the Defaults
Whether you use raw NGINX, OpenResty, or Kong, the underlying engine is NGINX. The default nginx.conf is conservative.
Worker Processes and Connections
Set worker_processes to auto to map to your CPU cores. But crucially, bump the worker_connections. If you have a 4-core CoolVDS instance, you want to utilize every cycle.
worker_processes auto;
worker_rlimit_nofile 65535;  # let workers actually use the raised file descriptor limit

events {
    worker_connections 10240;
    use epoll;
    multi_accept on;
}
Keepalive Connections
API Gateways often talk to upstream backend services. Opening a new TCP connection (and possibly a new TLS handshake) for every request to an internal microservice kills performance. Use keepalive to maintain persistent connections to the upstream.
upstream backend_api {
server 10.0.0.5:8080;
keepalive 64;
}
server {
location /api/ {
proxy_pass http://backend_api;
proxy_http_version 1.1;
proxy_set_header Connection "";
}
}
Note: You must clear the Connection header and force HTTP/1.1; otherwise NGINX sends Connection: close to the upstream and the keepalive pool is never used.
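On newer NGINX releases (1.15.3 and later) you can also control how long and how often each pooled connection gets reused. A sketch extending the upstream block above:
upstream backend_api {
    server 10.0.0.5:8080;
    keepalive 64;
    keepalive_requests 10000;  # recycle a pooled connection after this many requests
    keepalive_timeout 60s;     # close pooled connections idle longer than this
}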
3. SSL/TLS: The CPU Eater
TLS termination is expensive, and the handshake is the costliest part. If your API Gateway performs thousands of TLS handshakes per second, your CPU usage will spike.
- Enable HTTP/2 and HTTP/3 (QUIC): Reduce head-of-line blocking.
- Session Resumption: Cache SSL session parameters to avoid full handshakes on reconnects.
- OCSP Stapling: Saves the client from checking the certificate revocation status elsewhere.
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
ssl_stapling on;
ssl_stapling_verify on;
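How the listener side looks depends on your NGINX build; here is a sketch assuming a 1.25+ binary compiled with HTTP/3 support (older versions use listen 443 ssl http2; instead of the separate http2 directive):
listen 443 ssl;
listen 443 quic reuseport;                        # HTTP/3 runs over UDP
http2 on;
add_header Alt-Svc 'h3=":443"; ma=86400' always;  # advertise HTTP/3 to clients
# OCSP stapling needs a resolver so NGINX can reach the CA's OCSP responder
resolver 1.1.1.1 valid=300s;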
4. The Norwegian Factor: Latency and Compliance
Why does physical location matter in 2024? Light speed is finite. If your primary user base is in Norway, hosting your API Gateway in Frankfurt or Amsterdam adds a 15-30ms round-trip tax to every request. For a complex API call involving multiple handshakes, that adds up to noticeable lag.
Furthermore, the Norwegian Datatilsynet is rigorous regarding GDPR and data sovereignty (Schrems II). Hosting data within Norwegian borders simplifies compliance for local enterprises.
Comparison: Latency to Oslo (Average)
| Server Location | Latency to Oslo | Jurisdiction |
|---|---|---|
| CoolVDS (Oslo) | < 2 ms | Norway (EEA, GDPR applies) |
| Frankfurt (Hyperscalers) | ~25 ms | Germany (EU) |
| US East (Virginia) | ~95 ms | USA (Cloud Act risk) |
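These numbers are easy to verify from your own vantage point before you commit; the hostname below is a placeholder for your gateway endpoint:
ping -c 20 api.example.no                        # round-trip min/avg/max
mtr --report --report-cycles 20 api.example.no   # per-hop latency and packet loss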
5. Storage I/O: The Silent Bottleneck
API Gateways log heavily. Access logs, error logs, audit trails. If you are writing 5,000 log lines per second to a standard SATA SSD (or worse, a network-attached block storage with low IOPS), your disk wait times will block the NGINX worker processes.
You have two options:
- Buffer logging in memory: risky if the server crashes (buffered lines are lost), but cheap to enable (see the sketch below).
- Use NVMe storage: write throughput and IOPS on NVMe drives are an order of magnitude higher than SATA.
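For the first option, NGINX can do the buffering itself, flushing log lines to disk in batches instead of issuing a write per request. A minimal sketch:
# Collect up to 64 KB of access log lines in memory, flush at least every 5 seconds
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;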
At CoolVDS, we don't upsell NVMe as a "premium" tier. It's the standard. We know that in 2024, spinning rust has no place in a production environment. High I/O throughput ensures your logging never blocks your request processing.
Conclusion: Control Your Environment
Performance tuning is about removing bottlenecks. First, you remove the artificial limits of the Linux kernel. Next, you optimize the application configuration. Finally, you ensure the hardware itself isn't working against you.
You can apply every config in this guide, but if your neighbor on a shared host decides to mine crypto, your API latency will suffer. Control your latency. Own your kernel. Keep your data in Norway.
Ready to drop your API latency to single digits? Deploy a KVM-based, NVMe-powered instance on CoolVDS today.