The 502 Bad Gateway Nightmare: It's Not Your Code, It's Your Config
Let’s be honest. There is nothing more embarrassing for a systems architect than watching latency spike to 800ms during a marketing push. You check the application logs—clean. You check the database load—minimal. Yet, client requests are timing out, and the marketing director is breathing down your neck.
If you are running a microservices architecture (and in 2016, who isn't trying to?), your API Gateway is the choke point. Most default Linux distributions and standard Nginx installs are configured for modest static file serving, not for proxying 10,000 API requests per second. Whether you are using Nginx, OpenResty, or HAProxy, the bottleneck is often the OS layer beneath it.
I recently audited a setup for a Norwegian e-commerce client expecting heavy traffic for Black Friday. Their gateway was running on a standard VPS provider (not CoolVDS, obviously). The hardware was fine, but the kernel dropped packets because the connection tracking table was full. Here is how we fixed it, and how you can tune your stack to survive the flood.
1. The Kernel is the Limit
By default, the Linux TCP stack is conservative. It prioritizes memory conservation over massive concurrency. When your API gateway acts as a reverse proxy, it handles two TCP connections for every request: one accepted from the client, and one opened to the upstream backend service. Those proxy-side connections churn through ephemeral ports and pile up in TIME_WAIT fast.
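Before touching anything, confirm you are actually hitting these walls. A quick check (assuming iproute2's ss and a netfilter-based firewall are in play) looks like this:
# The kernel logs drops when the conntrack table overflows
dmesg | grep -i "nf_conntrack: table full"
# A rough count of sockets stuck in TIME_WAIT = ephemeral port pressure
ss -tan state time-wait | wc -l
# Current vs. maximum conntrack entries
cat /proc/sys/net/netfilter/nf_conntrack_count /proc/sys/net/netfilter/nf_conntrack_max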
You need to edit /etc/sysctl.conf. If you are afraid of the terminal, stop reading now. These changes tell the kernel to reuse connections faster and allow more open files.
Key Sysctl Parameters
# /etc/sysctl.conf configuration for high-throughput API Gateways
# Increase the maximum number of open files (file descriptors)
fs.file-max = 2097152
# Increase the connection tracking table size (critical for firewalls)
net.netfilter.nf_conntrack_max = 1048576
# Allow reuse of sockets in TIME_WAIT state for new connections
# NOTE: Do NOT enable tcp_tw_recycle as it breaks NAT clients
net.ipv4.tcp_tw_reuse = 1
# Increase the range of ephemeral ports available
net.ipv4.ip_local_port_range = 1024 65535
# Increase queue length for incoming connections
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
Apply these with sysctl -p. If you are on a containerized system (like Docker 1.12), you might run into permission issues applying these from inside the container. This is why we prefer full hardware virtualization (KVM) at CoolVDS; you get full control over your kernel parameters without begging support to flip a switch on the host node.
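Once applied, it is worth reading a few values back to confirm they actually took; note that the net.netfilter key only exists once the conntrack module is loaded, so do not panic if that one line errors on a fresh box. A minimal check:
# Load the new values and read some of them back
sysctl -p
sysctl fs.file-max net.core.somaxconn net.ipv4.tcp_tw_reuse net.netfilter.nf_conntrack_max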
2. Nginx: Stop Closing Upstream Connections
The biggest mistake I see in API Gateway configuration is treating upstream connections like standard HTTP clients. If you don't enable keepalives to your backend services, Nginx will open a new TCP handshake for every single API call. This adds massive overhead.
You must configure the upstream block to keep connections open. This drastically reduces CPU usage on both the gateway and the microservice.
upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;

    # The secret sauce: keep 64 idle connections open per worker
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;

        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Pass the real IP to the backend (crucial for logs)
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
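To prove the keepalive pool is doing its job, watch the established connections from the gateway to one upstream node while you run a load test; with working keepalives the count stays roughly flat instead of climbing with the request rate (10.0.0.5 matches the example config above):
# List established connections from the gateway to one backend node
watch -n1 'ss -tn state established dst 10.0.0.5'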
Pro Tip: If you are using SSL/TLS (and you should be, even internally), the handshake overhead is expensive. Ensure your server CPU supports AES-NI instruction sets. CoolVDS instances expose the host CPU flags directly to the guest, ensuring OpenSSL can offload encryption tasks to the hardware.
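Checking is trivial, and OpenSSL can show you what the acceleration is worth; the second command below exercises the hardware-backed EVP path, the third falls back to the plain software implementation (a quick sketch, assuming a stock OpenSSL build):
# A non-empty result means the CPU advertises the AES-NI flag
grep -m1 -wo aes /proc/cpuinfo
# Hardware-accelerated path vs. software path
openssl speed -evp aes-128-gcm
openssl speed aes-128-cbc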
3. The I/O Bottleneck: Access Logs
In a high-traffic scenario, writing to the disk is the slowest operation your server performs. If your API gateway logs every request to disk synchronously, your throughput is capped by your disk IOPS.
First, minimize what you log. Do you really need the user agent string for every internal health check? Second, use buffering.
# Buffer logs: Write to disk only when the buffer (64k) is full
# or every 5 minutes. This turns thousands of IOPS into one.
access_log /var/log/nginx/access.log main buffer=64k flush=5m;
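You can take the health-check point literally and drop that noise from the log entirely with nginx's conditional logging (a sketch; the /healthz path and the $loggable variable are just illustrative names):
# In the http {} context: flag internal health checks as not worth logging
map $request_uri $loggable {
    default    1;
    ~^/healthz 0;
}

# Then gate the buffered access_log on it
access_log /var/log/nginx/access.log main buffer=64k flush=5m if=$loggable;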
The Storage Reality
Buffering helps, but eventually, data hits the metal. In 2016, many hosting providers in Europe are still pushing "Enterprise SAS" or standard SSDs with noisy neighbors. For an API gateway, input/output latency kills performance.
We benchmarked this. On a standard SATA SSD, high log volume caused CPU I/O wait times to spike to 15%. On NVMe storage (which we rolled out across our Oslo datacenter this year), I/O wait dropped to near zero. NVMe isn't just faster; it handles parallel command queues significantly better than AHCI-based SATA drives. When your database and logs are fighting for IOPS, NVMe is the only logical choice.
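If you want to reproduce that comparison on your own hardware, two tools are enough; point fio at a scratch directory rather than your live log partition (the paths and sizes here are only illustrative):
# Watch %iowait and per-device latency while you push load through the gateway
iostat -x 5

# Synthetic 4k random-write test, roughly what heavy logging looks like to the disk
fio --name=randwrite --rw=randwrite --bs=4k --size=1G --numjobs=4 \
    --direct=1 --directory=/mnt/scratch --group_reporting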
4. Geo-Latency and Data Sovereignty
With the invalidation of Safe Harbor last year and the impending EU privacy regulations (GDPR is coming, folks, get ready), where you host matters. If your primary user base is in Norway, hosting in Frankfurt or London adds 20-30ms of round-trip latency. That is overhead you can never optimize away in code.
Latency physics are immutable. Hosting in Oslo, connected directly to NIX (Norwegian Internet Exchange), ensures your API handshake times are minimal for local users.
| Metric | Generic Cloud (London) | CoolVDS (Oslo) |
|---|---|---|
| Ping from Stavanger | ~35ms | ~6ms |
| Data Jurisdiction | UK/US (Complex) | Norway (Strict/Safe) |
| Storage | Networked SSD (SAN) | Local NVMe |
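You do not have to take that table on faith; curl can break a single request into connect, TLS, and time-to-first-byte from any client location (the hostname is a placeholder for your own endpoint):
curl -o /dev/null -s -w 'connect: %{time_connect}s  tls: %{time_appconnect}s  ttfb: %{time_starttransfer}s\n' \
     https://api.example.com/health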
Summary: Don't let the defaults win
Performance isn't accidental. It requires digging into sysctl, understanding TCP states, and choosing the right hardware infrastructure. A poorly tuned Nginx instance on a slow disk will fail, no matter how clean your Go or Node.js code is.
If you need a test environment where you can modify kernel parameters without restriction and utilize raw NVMe throughput, spin up a CoolVDS instance. We don't oversell, and we don't block your root access.
Ready to optimize? Deploy a KVM VPS in Oslo in under 55 seconds and see the difference in your wrk benchmarks.