Your API Gateway is Choking (And It's Not the Code's Fault)
Let's cut the pleasantries. If you are running microservices in 2020, you aren't connecting service-to-service directly. You have a gatekeeper. Maybe it's Kong, maybe it's a raw NGINX reverse proxy, or perhaps Traefik. You look at your New Relic dashboards and see application response times in the green, yet the client-side latency is dragging.
I see this every week. A CTO calls me, swearing their Node.js or Go services are optimized to the bone, yet they are bleeding milliseconds. 9 times out of 10, the bottleneck isn't the application logic. It's the gateway configuration and the underlying Linux OS limits.
We recently audited a high-frequency trading bot hosted here in Norway. They were losing money because their API Gateway—handling 20,000 requests per second (RPS)—was effectively DDoS-ing itself. The connection overhead was eating 40% of their CPU cycles. Here is how we fixed it, and how you can tune your CoolVDS instance to handle similar loads.
1. The "File Descriptor" Lie
By default, most Linux distros ship with conservative limits. Ubuntu 20.04 is better than its predecessors, but it still isn't ready for high-concurrency routing out of the box. If your gateway hits the nofile limit, it doesn't crash; it quietly stops accepting new connections and fills the error log with "Too many open files" while clients stall.
Check your current limits:
ulimit -n
If that returns 1024, you are in trouble. For a gateway on a CoolVDS production node, we need to crank this up at the OS level.
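One caveat before you edit anything: ulimit -n only reports the limit for your current shell, not for the gateway that is already running. A quick way to see what the running process actually got (assuming the gateway is NGINX and pgrep is available):
# Show the open-file limit of the oldest (master) nginx process
grep "Max open files" /proc/$(pgrep -o nginx)/limits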
Edit /etc/security/limits.conf:
* soft nofile 65535
* hard nofile 65535
root soft nofile 65535
root hard nofile 65535
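One gotcha worth flagging: on Ubuntu 20.04, NGINX is normally started by systemd, and systemd services do not read limits.conf. If your limit refuses to budge after a restart, a drop-in override is the usual fix (a minimal sketch, assuming the stock nginx.service unit name):
# Raise the file descriptor limit for the nginx service itself
sudo mkdir -p /etc/systemd/system/nginx.service.d
printf '[Service]\nLimitNOFILE=65535\n' | sudo tee /etc/systemd/system/nginx.service.d/limits.conf
sudo systemctl daemon-reload
sudo systemctl restart nginx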
But that's just the user space. You need to tell NGINX (or your gateway of choice) to actually utilize them. Inside your nginx.conf, at the main context level:
worker_rlimit_nofile 65535;

events {
    worker_connections 16384;
    use epoll;
    multi_accept on;
}
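It's worth a back-of-envelope check here, because every proxied request holds roughly two file descriptors: one on the client side, one on the upstream side. A rough sketch, assuming four workers on a 4 vCPU instance:
# Rough ceiling on concurrent proxied requests for this configuration
WORKERS=4                     # e.g. worker_processes auto on a 4 vCPU plan
WORKER_CONNECTIONS=16384      # must match the events block above
echo "$(( WORKERS * WORKER_CONNECTIONS / 2 )) concurrent proxied requests, roughly"
If that number is lower than the concurrency you expect, raise worker_connections, not just the OS limits.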
2. Kernel Tuning for High TCP Turnover
API gateways are distinct from standard web servers because they hold two connections for every request: one from the client and one to the upstream service. The outbound half is what hurts: every new upstream connection burns an ephemeral port, and under heavy turnover those ports pile up in TIME_WAIT faster than the kernel releases them. Run out, and new upstream connections start failing outright.
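You can confirm whether this is already biting you before touching a single sysctl. Counting sockets parked in TIME_WAIT with ss (part of iproute2, which ships with Ubuntu 20.04) takes two seconds:
# Count sockets currently stuck in TIME_WAIT
ss -tan state time-wait | wc -l
Tens of thousands on a busy gateway means you are burning through the ephemeral port range.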
We need to modify the sysctl.conf to recycle connections faster. This is critical for keeping latency low between your CoolVDS instance and external APIs or internal microservices.
Add this to /etc/sysctl.conf:
# Allow reuse of sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# Max number of packets in the receive queue
net.core.netdev_max_backlog = 16384
# Max SYN backlog (half-open connections waiting for the final ACK)
net.ipv4.tcp_max_syn_backlog = 8192
# Max connections in listen queue
net.core.somaxconn = 8192
Load it with sysctl -p. These settings are aggressive. They assume you have the CPU power to handle the interrupts. This is why we insist on KVM virtualization at CoolVDS—we don't oversell CPU cycles. If you try this on a cheap OpenVZ container where the host kernel is shared and overloaded, you might actually degrade performance.
Pro Tip: Do not enable tcp_tw_recycle. It was removed in Linux 4.12 and breaks connections for users behind NAT (like mobile carrier networks). Stick to tcp_tw_reuse.
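Once the settings are loaded, verify they actually stuck. These are standard Linux sysctl names, so the check is safe to run on any kernel:
# Confirm the new values are live
sysctl net.ipv4.tcp_tw_reuse net.core.somaxconn net.ipv4.ip_local_port_range
# On kernels 4.12 and newer this should fail with "No such file or directory",
# which is exactly what you want:
sysctl net.ipv4.tcp_tw_recycle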
3. The Upstream Keepalive Mistake
This is the most common configuration error I see in NGINX. By default, NGINX acts as a polite HTTP/1.0 client to your backend services: it opens a connection, sends the request, gets the response, and closes the connection.
If you are routing to a local Node.js service or a database API, the TCP handshake (plus a TLS handshake, if the upstream is encrypted) on every single request adds real latency for zero benefit. You need to keep the connection open.
upstream backend_api {
    server 10.0.0.5:8080;

    # Keep 64 idle connections open to this upstream
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;

        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
Without that empty Connection header, NGINX sends Connection: close to the upstream on every request, which kills the keepalive pool you just configured.
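To check that reuse is really happening, watch the established connections from the gateway to the upstream under load (10.0.0.5:8080 is the example backend from the block above):
# Count live connections from the gateway to the upstream
watch -n1 'ss -tan dst 10.0.0.5:8080 | grep -c ESTAB'
Under steady traffic that number should hover near the keepalive pool size instead of churning through a new ephemeral port for every request.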
4. I/O Latency: The Silent Killer
An API Gateway logs everything. Access logs, error logs, audit trails. If you are pushing 5,000 requests per second, you are writing to disk 5,000 times per second. If your disk I/O blocks, the NGINX worker process blocks. If the worker blocks, no requests get processed.
This is where hardware architecture becomes political. Many "budget" VPS providers in Europe still run on SSDs that are network-attached (Ceph or similar) with high latency, or worse, shared spindles.
For a gateway, you must buffer your logs to avoid disk locking. Modify your access log directive:
access_log /var/log/nginx/access.log main buffer=32k flush=5s;
This tells NGINX: "Wait until the buffer holds 32 KB of log data or 5 seconds have passed before physically writing to the disk."
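If you want to see whether log flushes, or anything else, are stalling on the disk, iostat from the sysstat package gives a quick read:
# Watch per-device write latency for five one-second samples
iostat -x 1 5
Keep an eye on the w_await column for the volume holding your logs; write latency that spikes into double-digit milliseconds will show up directly in your gateway's p99.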
The Hardware Reality
Even with buffering, eventually, you have to write to the disk. On CoolVDS, we use local NVMe storage. The I/O throughput is massive compared to standard SSDs. When your log buffer flushes, it happens instantly. On a legacy hosting platform, that flush operation could stall your API for 50-100ms. In the world of high-frequency trading or real-time bidding, that is an eternity.
5. Local Nuances: Norway and GDPR
Why host this gateway in Norway? Beyond the obvious legal benefits of keeping data within the EEA (especially with the uncertainty surrounding Privacy Shield right now), there is the physics of the network.
If your users are in Oslo, Bergen, or Trondheim, routing traffic through Frankfurt adds 20-30ms of round-trip time. Routing through a US cloud provider adds 100ms+. By placing your API Gateway on a CoolVDS instance in Oslo, you are peering directly at NIX (Norwegian Internet Exchange).
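Don't take those numbers on faith; measure the path from wherever your users actually sit (the hostname below is a placeholder):
# Baseline round-trip time to a candidate gateway location
ping -c 20 your-gateway.example.com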
Low latency isn't just about speed; it's about SEO (Core Web Vitals are becoming a ranking factor) and user retention. A fast gateway makes your entire architecture feel snappy, even if the backend is heavy.
Summary
Optimizing an API Gateway is an exercise in removing hurdles. You remove the file descriptor limits, you remove the TCP handshake overhead, and you remove the disk I/O blocking.
- Kernel: Tune sysctl for high concurrency.
- NGINX: Enable keepalive to upstreams.
- Storage: Buffer logs and demand NVMe.
Don't let a default configuration file dictate your performance ceiling. Deploy a high-performance instance on CoolVDS today, SSH in, and apply these configs. You'll see the difference in your p99 latency metrics immediately.