Crushing API Latency: Gateway Tuning Strategies for High-Throughput Norwegian Workloads
If your API Gateway adds more than 20ms to a request, you have a problem. In a microservices architecture, that overhead compounds: if a single user action triggers five internal service calls and each hop passes through the gateway, 20ms of overhead becomes 100ms before any real work happens, and the UI starts to feel sluggish enough to drive users away. I have seen perfectly written Go and Rust services strangled by a poorly configured API Gateway sitting in front of them.
Most developers treat the Gateway (be it NGINX, Kong, or Traefik) as a black box. They `apt-get install`, enable a few routes, and hope for the best. That is negligence.
We are going to look at how to tune the Linux kernel and the Gateway application layer for maximum throughput. We will also address the elephant in the room: why software tuning is useless if your underlying VPS is stealing CPU cycles.
1. The Kernel is the Bottleneck
Out of the box, most Linux distributions are tuned for general-purpose usage, not for handling 50,000 concurrent connections. Before touching your Gateway config, you must address the networking stack.
High-concurrency environments often run out of ephemeral ports or file descriptors. Here is the baseline `sysctl.conf` configuration I use for high-traffic nodes deployed in Oslo to ensure we can handle bursts without dropping packets.
Key Kernel Parameters
# /etc/sysctl.conf
# Increase the maximum number of open file descriptors
fs.file-max = 2097152
# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
# Widen the local port range to allow more outbound connections to upstreams
net.ipv4.ip_local_port_range = 1024 65535
# Enable TCP Fast Open (TFO) if your clients support it
net.ipv4.tcp_fastopen = 3
# Reduce the time sockets spend in FIN-WAIT-2
net.ipv4.tcp_fin_timeout = 15
# Reuse sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1
# BBR Congestion Control for better throughput over the public internet
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
Apply these with `sysctl -p`. The `tcp_tw_reuse` setting is particularly critical for an API Gateway acting as a reverse proxy: it only applies to outgoing connections, which is exactly what the proxy-to-upstream traffic is, and the gateway opens thousands of those. Without it, you hit a wall where the server cannot open new sockets because thousands are stuck in TIME_WAIT waiting to expire.
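A quick way to sanity-check the result: reload the settings, put the gateway under load, and count sockets parked in TIME_WAIT. This is a minimal sketch assuming `ss` from iproute2 is available on the box.
# Confirm the new values are live
sysctl net.ipv4.tcp_tw_reuse net.core.somaxconn
# Rough count of sockets in TIME_WAIT; a pile that keeps growing
# towards your upstreams means reuse is not helping yet
ss -tan state time-wait | wc -l
# Overall socket summary (established, time-wait, orphaned, ...)
ss -s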
2. NGINX / OpenResty Configuration
Whether you are using raw NGINX, OpenResty, or Kong (which is built on OpenResty), the core directives remain similar. The default `nginx.conf` is conservative.
A project for a fintech client in Bergen required us to handle massive webhook bursts, and we found that the TLS handshake was the primary latency driver. Here is how we optimized the worker and SSL settings.
Worker and Connection Tuning
worker_processes auto;
worker_rlimit_nofile 65535;
events {
worker_connections 16384;
use epoll;
multi_accept on;
}
The `multi_accept on` directive tells a worker process to accept all new connections at once, rather than one at a time. This is aggressive but necessary for high throughput.
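It is worth confirming that these limits actually reached the running workers. A quick check, assuming a stock Linux install of NGINX:
# Dump the effective configuration and confirm the worker directives
nginx -T 2>/dev/null | grep -E 'worker_(processes|rlimit_nofile|connections)'
# Verify the open-file limit of a live worker process
cat /proc/$(pgrep -f 'nginx: worker' | head -n 1)/limits | grep 'Max open files'
If the worker's "Max open files" does not match `worker_rlimit_nofile`, the directive is being overridden by a systemd or ulimit setting further up the stack.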
Upstream Keepalive
This is the most common mistake. By default, NGINX closes the connection to the upstream backend after every request. This forces a new TCP handshake (and potentially SSL handshake) for every single API call inside your network. It kills performance.
upstream backend_microservices {
server 10.0.0.5:8080;
server 10.0.0.6:8080;
# Keep 100 idle connections open per worker
keepalive 100;
}
server {
location /api/ {
proxy_pass http://backend_microservices;
# Required for keepalive to work
proxy_http_version 1.1;
proxy_set_header Connection "";
}
}
Pro Tip: Always verify your keepalive settings. Use `tcpdump` on your gateway to ensure you aren't seeing a flood of SYN/ACK packets between the gateway and the upstream service. If you see constant handshakes, your keepalive is broken.
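As a rough illustration of that check, assuming the gateway talks to the upstream at 10.0.0.5:8080 over interface eth0 (adjust both for your environment):
# Print only TCP packets with the SYN flag set between gateway and upstream.
# With keepalive working you see a short burst of handshakes at startup and
# then near silence; a constant stream of SYNs means keepalive is broken.
tcpdump -ni eth0 'host 10.0.0.5 and port 8080 and tcp[tcpflags] & tcp-syn != 0'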
3. TLS Optimization for 2024
TLS is expensive, but not where most people think. With AES-NI, bulk encryption is close to free; it is the handshake, with its extra round trips and asymmetric crypto, that costs milliseconds. In 2024, there is no excuse for not offering TLS 1.3, which shaves a full round trip off every new connection compared to TLS 1.2.
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
ssl_prefer_server_ciphers on;
# OCSP Stapling allows the server to send the certificate status, saving the client a lookup
ssl_stapling on;
ssl_stapling_verify on;
resolver 1.1.1.1 8.8.8.8 valid=300s;
resolver_timeout 5s;
# Session resumption
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;
ssl_session_tickets off;
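To confirm the handshake behaves as intended, a couple of `openssl s_client` probes are enough. The hostname below is a placeholder for your own endpoint.
# Confirm TLS 1.3 is negotiated and a stapled OCSP response comes back
openssl s_client -connect api.example.com:443 -tls1_3 -status < /dev/null
# Check session-cache resumption for TLS 1.2 clients: "Reused" on the
# second connection means the shared cache is doing its job
openssl s_client -connect api.example.com:443 -tls1_2 -sess_out /tmp/sess < /dev/null
openssl s_client -connect api.example.com:443 -tls1_2 -sess_in /tmp/sess < /dev/null | grep -E 'New|Reused'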
4. The Hardware Reality: Steal Time
You can tune your kernel until it is perfect, but if the hypervisor refuses to give you CPU cycles, your latency will spike anyway. The time your virtual CPU spends waiting in the hypervisor's run queue is reported as "CPU steal time."
In a shared hosting environment (typical budget VPS), neighbors can monopolize the physical CPU. When your API Gateway tries to process a request, it has to wait for the hypervisor to schedule it. For a batch job, this is fine. For a real-time API, it is fatal.
To check if you are a victim of this, run `top` and look at the `%st` value.
%Cpu(s): 5.4 us, 2.1 sy, 0.0 ni, 85.2 id, 0.1 wa, 0.0 hi, 0.2 si, 7.0 st
If `st` (steal time) is consistently above 1-2%, your provider is overselling their hardware. You cannot tune your way out of noisy neighbors.
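A single `top` snapshot can also be misleading, because steal tends to arrive in bursts. Sample it over time instead; this sketch assumes the sysstat package (for `mpstat`) is installed.
# Per-second CPU breakdown; watch the %steal column during peak traffic
mpstat 1
# Alternative without sysstat: the last column ("st") of vmstat is steal
vmstat 1 30
If the number stays elevated across a full peak window, the problem is the platform, not your configuration.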
This is why for mission-critical gateways, we built CoolVDS on KVM with strict resource guarantees. We utilize high-frequency CPUs and enterprise NVMe storage. When you buy 4 vCPUs on CoolVDS, those cycles are reserved for you. This consistency is why our p99 latency metrics remain flat even during peak traffic hours.
5. Local Nuances: Norway and GDPR
Operating in Norway adds a layer of legal and network complexity. Under the GDPR and the Schrems II ruling, data sovereignty is paramount. If your API Gateway logs payloads containing PII (Personally Identifiable Information) and ships those logs to a US-based cloud monitoring service, you are likely non-compliant.
Host locally. By placing your Gateway on CoolVDS infrastructure in Oslo, you reduce the round-trip time (RTT) to local users significantly. The average latency from Oslo to a server in Frankfurt is ~25ms. From Oslo to Oslo (via NIX), it is <2ms. For a financial application or a high-speed e-commerce store serving the Norwegian market, that difference is noticeable in the UI.
Comparison: Latency Impact
| Metric | Hosted in Frankfurt (Big Cloud) | Hosted in Oslo (CoolVDS) |
|---|---|---|
| Network RTT | ~25-30ms | ~2ms |
| TLS Handshake (3 round trips) | ~80ms | ~6ms |
| Data Sovereignty | Complex (US Cloud Act) | Clear (Norwegian Law) |
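You can reproduce this comparison from any client in Norway with standard tools. The hostnames and the /api/health path below are placeholders for your own endpoints.
# Raw network RTT to each candidate region
ping -c 10 gateway-oslo.example.com
ping -c 10 gateway-frankfurt.example.com
# TCP connect, TLS handshake and time-to-first-byte as a client sees them
curl -so /dev/null -w 'connect: %{time_connect}s  tls: %{time_appconnect}s  ttfb: %{time_starttransfer}s\n' \
    https://gateway-oslo.example.com/api/health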
Final Thoughts
Performance is a stack. It starts with hardware, moves to the kernel, and ends with application configuration. Ignoring any layer results in mediocrity.
If you are serious about API performance, stop fighting with oversold shared environments. Deploy a test instance on CoolVDS today, run your benchmark suite, and watch the `st` metric stay at 0.0. Your users (and your CTO) will thank you.