Stop Letting Your Gateway Be the Bottleneck
It is March 2022. Your microservices respond in 15ms, but your client sees 85ms. Where did that extra time go? Usually, it vanishes into the black hole of an untuned API Gateway. If you are routing traffic through a default Nginx install or a vanilla Kong setup, you are effectively throttling your own infrastructure.
For developers targeting the Nordic market, physics is often the first enemy. If your users are in Oslo or Bergen, but your gateway sits in a Frankfurt datacenter, you are paying a 25-30ms round-trip tax before a single line of code executes. Combine that with the strict requirements of Schrems II and the Datatilsynet regarding data sovereignty, and hosting outside Norway becomes a liability—both technical and legal.
I have spent the last week debugging a fintech API that was bleeding latency. The fix wasn't rewriting the Java backend; it was tuning the Linux kernel and the gateway configuration. Here is the exact blueprint we used to drop overhead by 40%.
1. The OS Layer: Tuning the TCP Stack
Before touching Nginx, look at the kernel. Default Linux distros are tuned for general-purpose computing, not high-concurrency packet switching. When you hit 10,000 concurrent connections, the defaults choke.
We need to loosen the kernel's TCP constraints: accept a deeper backlog of pending connections and recycle sockets faster. Open /etc/sysctl.conf.
Pro Tip: Never apply these blindly. Test them on a staging CoolVDS instance first. Our KVM virtualization passes these flags correctly to the kernel, unlike shared container environments where you are often locked out of sysctl.
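One way to trial a single value before committing it is sysctl -w, which changes the running kernel only (a reboot rolls it back), so you can watch the effect under real traffic first:
# Apply one value to the running kernel without persisting it
sysctl -w net.core.somaxconn=65535
# Confirm what the kernel is actually using
sysctl net.core.somaxconn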
Here is the production-grade configuration we applied:
# /etc/sysctl.conf configuration for API Gateways
# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
# Increase the range of ephemeral ports for outgoing upstream connections
net.ipv4.ip_local_port_range = 1024 65535
# Enable TCP Fast Open to reduce handshake latency
net.ipv4.tcp_fastopen = 3
# Reuse connections in TIME_WAIT state (use with caution, but essential for gateways)
net.ipv4.tcp_tw_reuse = 1
# Increase the max number of open files
fs.file-max = 2097152
Apply this with:
sysctl -p
If you don't increase fs.file-max, your gateway will hit a hard limit on open sockets, resulting in the dreaded "Too many open files" error during traffic spikes.
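Note that fs.file-max is only the system-wide ceiling; each Nginx worker also has its own per-process descriptor limit. A minimal sketch of raising both, assuming a systemd-managed Nginx (the 65535 figure is our working assumption, not a magic number):
# /etc/nginx/nginx.conf — raise the per-worker descriptor limit
worker_rlimit_nofile 65535;
# /etc/systemd/system/nginx.service.d/limits.conf — raise the service limit
# [Service]
# LimitNOFILE=65535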
2. Nginx Upstream Keepalive: The Silent Killer
Most Nginx configurations I audit make a critical mistake: they open a brand new TCP connection to the backend service for every single request. This creates massive overhead due to the TCP 3-way handshake.
You must enable HTTP/1.1 keepalives to the upstream.
Here is the wrong way (standard proxy pass):
proxy_pass http://my_backend;
Here is the correct way, utilizing the upstream module to keep connections hot:
upstream backend_microservice {
    server 10.0.0.5:8080;
    # Keep 64 idle connections open to this upstream
    keepalive 64;
}
server {
    location /api/v1/ {
        proxy_pass http://backend_microservice;
        # REQUIRED for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
By setting proxy_set_header Connection "";, you strip the Connection: close header that Nginx would otherwise send to the upstream by default, allowing the connection between Nginx and your backend to persist. In our benchmarks on CoolVDS NVMe instances, this reduced internal latency from 4ms to roughly 0.6ms.
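To confirm the keepalive pool is actually doing its job, check that connections from the gateway to the upstream survive between requests. Assuming the example backend above on 10.0.0.5:8080:
# Established gateway-to-upstream connections; the set should stay stable under load
ss -tn state established dst 10.0.0.5:8080
If the list churns on every request, the Connection header or HTTP version is still wrong somewhere in the chain.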
3. SSL/TLS Termination Efficiency
If you are terminating TLS at the gateway (which you should be), the CPU cost of the handshake is significant. In 2022, there is no excuse for using RSA keys for everything: ECDSA signatures are far cheaper to compute, and the certificates keep the handshake smaller on the wire.
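If you cannot drop RSA outright because of older clients, Nginx (since 1.11.0) will present an ECDSA and an RSA certificate side by side and let each client negotiate. A minimal sketch, with the paths and the CN as placeholders for your own PKI:
# Generate a P-256 key and a CSR for your CA
openssl ecparam -genkey -name prime256v1 -out /etc/nginx/certs/api.ecdsa.key
openssl req -new -key /etc/nginx/certs/api.ecdsa.key -subj "/CN=api.example.no" -out /etc/nginx/certs/api.ecdsa.csr
# In the server block: list both pairs, clients pick the one they support
ssl_certificate     /etc/nginx/certs/api.ecdsa.crt;
ssl_certificate_key /etc/nginx/certs/api.ecdsa.key;
ssl_certificate     /etc/nginx/certs/api.rsa.crt;
ssl_certificate_key /etc/nginx/certs/api.rsa.key;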
However, the session cache is where you win the most performance. If a client reconnects, do not force a full handshake.
ssl_session_cache shared:SSL:10m;
This allocates 10MB of shared memory for the TLS session cache, enough for roughly 40,000 sessions. Also, ensure you are prioritizing the right ciphers. Do not rely on defaults.
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers on;
ssl_session_timeout 1d;
ssl_session_tickets off; # Use the session cache instead for better forward secrecy
4. The Hardware Factor: Why "Cloud" Often Fails
You can tune software all day, but if your "vCPU" is fighting for cycles with 50 other noisy neighbors, your tail latency (p99) will be garbage. This is the hidden cost of cheap cloud providers.
In an API Gateway scenario, Steal Time is the metric to watch. If your hypervisor is delaying CPU scheduling, your Nginx workers stall. This is why we built CoolVDS on KVM with strict resource isolation. When you buy 4 vCPUs here, you get the cycles you paid for.
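Measuring steal is trivial; anything consistently above a couple of percent under load means the hypervisor is taking cycles back. mpstat ships with the sysstat package:
# %steal per core, sampled every 2 seconds
mpstat -P ALL 2
# vmstat shows the same figure in its 'st' column
vmstat 2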
Furthermore, disk I/O matters for logging. If you are logging access logs to disk (even asynchronously), slow storage blocks the worker process. We mandate NVMe storage for all our nodes to prevent I/O wait from impacting throughput.
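Where you do need access logs, buffering them keeps the worker from issuing a write per request. A sketch using standard Nginx directives (the path and log format are placeholders):
# Flush the log buffer at 64k or every 5 seconds, whichever comes first
access_log /var/log/nginx/api_access.log combined buffer=64k flush=5s;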
Norwegian Context: GDPR and Latency
Hosting in Norway isn't just about speed; it's about compliance. Since the Schrems II ruling, transferring personal data to US-owned clouds has become a legal minefield. By hosting on CoolVDS, your data physically resides in Oslo. You get the lowest latency to the NIX (Norwegian Internet Exchange) and full compliance with Norwegian privacy laws.
5. Quick-Fire Tuning Checklist
Before you deploy, run these checks:
- Did you disable access logging for static assets? access_log off;
- Is your worker count set to auto? worker_processes auto;
- Have you increased the worker connections? worker_connections 4096;
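Then measure, don't guess. wrk prints a latency distribution including p99; the URL below is a placeholder for one of your own cheap health-check routes:
# 4 threads, 200 open connections, 30 seconds, with latency percentiles
wrk -t4 -c200 -d30s --latency https://api.example.no/api/v1/health
Run it once before you touch anything and once after, and keep both outputs next to the config diff.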
Performance isn't an accident. It is architecture. If you are ready to stop fighting with latency and start serving requests, spin up a high-performance instance today.
Don't let slow hardware negate your software tuning. Deploy a test instance on CoolVDS in 55 seconds and see the difference raw NVMe power makes.