The Bottleneck is (Probably) Not Your Microservice
It is July 2018. GDPR has been live for two months, and the panic has settled into a dull headache. You have broken your monolith into microservices, containerized them with Docker, and slapped an API Gateway in front. Now, you are staring at Grafana. The 99th percentile latency just spiked to 400ms. Your Norwegian fintech client is screaming about "trege systemer" (slow systems).
Most DevOps engineers instinctively scale out. They spin up more droplets, more instances. That is lazy. Often, the problem isn't lack of CPU; it is a choked network stack or I/O wait. I have debugged clusters where the application code was blazing fast, but the gateway was dropping packets because of default Linux settings from 2010.
If you are running high-throughput APIs on standard VPS hosting in Europe, you need to get your hands dirty with kernel flags and NGINX configs. Here is how we tune the stack at CoolVDS to handle the load without melting down.
1. The File Descriptor Trap
Linux treats everything as a file. A TCP connection is a file. The default soft limit for open files on many distros (including Ubuntu 16.04 and 18.04) is 1024. That is laughable for an API gateway.
When you hit connection #1025, your gateway doesn't slow down; it crashes or rejects connections outright, and you will see "Too many open files" in your logs. Fix this first.
Check your current limits:
ulimit -n
If it returns 1024, you have work to do. Edit /etc/security/limits.conf to raise the ceiling for your web user (usually www-data or nginx):
* soft nofile 65535
* hard nofile 65535
root soft nofile 65535
root hard nofile 65535
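One gotcha: on systemd distros (which includes Ubuntu 16.04 and 18.04), limits.conf is ignored for services that systemd itself launches, so an NGINX started via systemctl never sees the new ceiling. A drop-in override is the reliable fix; a minimal sketch, assuming the stock nginx.service unit:
mkdir -p /etc/systemd/system/nginx.service.d
cat <<'EOF' > /etc/systemd/system/nginx.service.d/limits.conf
[Service]
LimitNOFILE=65535
EOF
systemctl daemon-reload
systemctl restart nginx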
Pro Tip: Setting this in the OS isn't enough if you use NGINX. You must explicitly tell NGINX to use these descriptors.
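Either way, verify what the running process actually got by reading its limits straight out of /proc (assuming the default pid file at /run/nginx.pid):
grep 'open files' /proc/$(cat /run/nginx.pid)/limits
# How many descriptors the process is holding right now
ls /proc/$(cat /run/nginx.pid)/fd | wc -l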
2. NGINX: The Worker Configuration
I see this mistake constantly. Engineers leave nginx.conf on default settings. NGINX is an event-based server, meaning it doesn't spawn a thread per connection (unlike Apache). However, it is constrained by worker_connections.
Here is the reference configuration we deploy on CoolVDS high-performance instances:
user www-data;
worker_processes auto; # Automatically detects your CPU cores
pid /run/nginx.pid;
# Raise the per-worker open-file limit. Proxying burns two descriptors per
# connection (client side + upstream side), so keep this well above worker_connections.
worker_rlimit_nofile 65535;
events {
# Determines how many connections one worker can handle.
# Total max connections = worker_processes * worker_connections
worker_connections 16384;
# efficient connection processing method for Linux
use epoll;
# Accept as many connections as possible, immediately
multi_accept on;
}
Without multi_accept on, a worker will accept one new connection at a time. Under a DDoS attack or a marketing push, you want that worker grabbing connections as fast as the kernel hands them over.
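After editing, sanity-check the syntax and confirm which values NGINX actually picked up before reloading; nginx -T dumps the full effective configuration:
nginx -t
nginx -T | grep -E 'worker_(processes|connections|rlimit_nofile)'
systemctl reload nginx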
3. Kernel Tuning: The Sysctl Layer
The Linux network stack was designed for reliability over WANs, not for massive throughput between microservices inside a datacenter. We need to tweak the TCP stack. This is done in /etc/sysctl.conf.
A common issue is running out of ephemeral ports. When your gateway connects to an upstream service, it opens a local port. If you churn through connections too fast, you hit the limit (usually ~28,000 ports) and get TIME_WAIT exhaustion.
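Before touching anything, confirm that port exhaustion is really what is biting you. ss gives a quick census of socket states (the :8080 filter is just an example, matching the upstream port used later in this post):
ss -s
# Sockets stuck in TIME_WAIT towards one upstream port
ss -tan state time-wait '( dport = :8080 )' | wc -l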
Apply these settings to widen the highway:
# Allow reusing sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# Max number of packets in the receive queue
net.core.netdev_max_backlog = 16384
# Max number of connections queued in the kernel
net.core.somaxconn = 8192
# Increase TCP buffer sizes for 10Gbps links (common in 2018 datacenters)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
Load them instantly:
sysctl -p
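Then confirm the kernel actually took them:
sysctl net.ipv4.tcp_tw_reuse net.ipv4.ip_local_port_range net.core.somaxconn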
Warning: Do not enable net.ipv4.tcp_tw_recycle. It was removed outright in Linux 4.12, and on older kernels it breaks connections coming from behind NAT devices (like mobile phones on 4G networks). Stick to tcp_tw_reuse.
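One interplay worth knowing: raising net.core.somaxconn only lifts the kernel ceiling. NGINX still requests a listen backlog of 511 by default, so to use the deeper accept queue you must ask for it on the listen directive as well. A sketch (the server_name is illustrative):
server {
    # Ask for a deeper accept queue; must not exceed net.core.somaxconn
    listen 80 backlog=8192;
    server_name api.example.com;
}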
4. The Hardware Reality: NVMe vs. Spinning Rust
You can tune software all day, but if your disk I/O is slow, your logs will block your request processing. NGINX writes access logs and error logs. If you are logging every API request to a standard HDD (Hard Disk Drive), the disk head seeks will throttle your throughput.
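If you cannot change the storage today, you can at least stop NGINX from issuing a write per request by buffering the access log. A sketch; tune the buffer and flush interval to your traffic:
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;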
In 2018, SSDs are standard, but NVMe (Non-Volatile Memory Express) is the differentiator. NVMe connects directly via the PCIe bus, bypassing the SATA bottleneck.
Benchmark Comparison (Random Read/Write)
| Storage Type | IOPS (Approx) | Latency |
|---|---|---|
| Standard HDD (7200 RPM) | 80 - 120 | ~15 ms |
| SATA SSD | 5,000 - 80,000 | ~0.2 ms |
| CoolVDS NVMe | 300,000+ | ~0.03 ms |
If you are aggregating logs or using a local database cache on the gateway, NVMe is mandatory. Check your disk wait time with iostat:
iostat -x 1
If %iowait is consistently above 5%, your storage is the bottleneck. At CoolVDS, our infrastructure is built purely on enterprise NVMe arrays to eliminate this variable entirely.
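Want to reproduce numbers like the table above on your own instance? fio is the standard tool. An illustrative run; the target path, size, and runtime are assumptions you should adjust:
fio --name=randrw --filename=/tmp/fio-test --size=1G \
    --rw=randrw --bs=4k --ioengine=libaio --iodepth=32 \
    --direct=1 --runtime=30 --time_based --group_reporting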
5. Upstream Keepalives: The Silent Killer
By default, NGINX talks to upstreams like an HTTP/1.0 client: it opens a connection to your backend service, sends the request, gets the reply, and closes the connection. That means a full TCP handshake (SYN, SYN-ACK, ACK) for every single API call.
For an API gateway handling 5,000 req/sec, that is 15,000 unnecessary packets per second just for handshakes. It destroys latency.
Enable keepalives to your upstreams:
upstream backend_api {
server 10.0.0.5:8080;
server 10.0.0.6:8080;
# Keep 64 idle connections open per worker
keepalive 64;
}
server {
location /api/ {
proxy_pass http://backend_api;
# Required for keepalive to work
proxy_http_version 1.1;
proxy_set_header Connection "";
}
}
This simple change reduced latency by 35ms in a recent project involving a booking system connected to the NIX (Norwegian Internet Exchange).
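To confirm the pool is actually being reused, count established sockets towards the upstream port while under load; the number should hold roughly steady instead of churning (8080 matches the backends above):
ss -tn state established '( dport = :8080 )' | wc -l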
6. Local Nuances: Norway, GDPR, and Latency
Hosting in Norway offers unique advantages in 2018. Since GDPR took effect in May, data sovereignty is critical. Norway is not in the EU, but it is in the EEA, so the GDPR applies here and customer data never has to leave the EEA legal framework. However, physical proximity still matters.
If your users are in Oslo, Stavanger, or Bergen, routing traffic through Frankfurt or Amsterdam adds 20-30ms of round-trip time. Light speed is finite.
Test your latency to your current provider:
curl -w "Connect: %{time_connect} TTFB: %{time_starttransfer} Total: %{time_total}\n" -o /dev/null -s https://your-api.com
If your time_connect is over 0.050s (50ms) from a local Norwegian connection, you are losing conversions. CoolVDS leverages direct peering at NIX to ensure packets stay within the country whenever possible.
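If the connect time looks bad, trace the route; a detour through Frankfurt or Amsterdam shows up immediately in the hop list:
mtr --report --report-wide --report-cycles 20 your-api.com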
Conclusion
Performance isn't magic; it's physics and configuration. By raising file descriptors, tuning kernel TCP parameters, and utilizing persistent connections, you can double the throughput of your API gateway without spending an extra Krone on hardware.
However, software tuning hits a wall if the underlying virtualization is noisy or the disk is slow. Don't let IO wait kill your SEO.
Ready to see the difference NVMe makes? Deploy a high-performance instance on CoolVDS in under 55 seconds and benchmark it yourself.