Crushing API Latency: Kernel Tuning & Gateway Patterns for High-Throughput Systems
If your API Gateway adds more than 15ms of overhead to a request, your architecture is bleeding money. I don't care how pretty your microservices diagram looks; if the front door is jammed, the house is useless.
In the Nordic hosting market, we have a specific advantage: incredibly stable power and decent connectivity via NIX (Norwegian Internet Exchange). Yet, I consistently see developers deploying heavy Java-based gateways on oversold container instances, wondering why their p99 latency spikes to 400ms every time a neighbor runs a cron job.
By late 2025, the standard for "fast" has shifted. Users expect near-instant interactions. We are going to fix your setup. We will bypass default Linux constraints, tune the TCP stack for aggressive throughput, and align your infrastructure with the physical reality of the hardware.
The Lie of "Infinite Scalability"
Cloud providers love to sell you the dream of horizontal autoscaling. "Just add more pods," they say. This is inefficient logic. Before you scale out, you must scale up efficiency.
Pro Tip: Context switching kills throughput. A single, well-tuned CoolVDS instance with dedicated CPU cores will often outperform a cluster of three small, noisy-neighbor containers. Cores matter. Steal time is the enemy.
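A quick way to see whether noisy neighbors are your real problem is to watch the steal column; anything consistently above a percent or two means the hypervisor is taking cycles you paid for. A minimal check, using standard tools (mpstat needs the sysstat package):
# Watch CPU steal (%st / %steal); sustained values above ~1-2% indicate a noisy host
vmstat 1 5
mpstat -P ALL 1 5    # per-core view, requires sysstat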
The War Story: Black Friday 2024
Last year, a frantic CTO from a Bergen-based e-commerce platform called me. Their Kubernetes ingress was dropping 5% of packets during load testing. They blamed the code. I blamed the kernel.
We logged into their edge nodes. The application wasn't crashing; the OS was rejecting connections because the backlog queue was full. They were hitting the default somaxconn limit hard. A five-minute fix in sysctl.conf saved their launch.
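If you suspect the same failure mode, the kernel keeps counters for exactly this. A growing number of listen queue overflows means connections are being rejected before your application ever sees them:
# Listen queue overflows and dropped SYNs since boot
netstat -s | grep -iE "listen|overflow"
# For LISTEN sockets, Recv-Q is the current accept queue and Send-Q is its configured limit
ss -lnt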
Layer 1: The OS & Kernel Tuning
Most Linux distros, even the robust Ubuntu 24.04 LTS we use at CoolVDS, ship with conservative defaults intended for desktop or general-purpose use. For an API Gateway handling 10k+ RPS (Requests Per Second), these defaults are suffocating.
First, check your current backlog limit:
sysctl net.core.somaxconn
If it returns 4096 or less, you are bottlenecking under burst loads. Here is the baseline configuration we apply to high-performance CoolVDS instances intended for edge routing:
# /etc/sysctl.conf optimization for API Gateways
# Increase the maximum number of connections in the backlog
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
# Reuse sockets in TIME_WAIT state for new outbound connections
# Note: tcp_tw_recycle was removed back in Linux 4.12; tcp_tw_reuse is the only safe option
net.ipv4.tcp_tw_reuse = 1
# Increase ephemeral port range to prevent exhaustion
net.ipv4.ip_local_port_range = 1024 65535
# BBR congestion control (the sensible 2025 default for WAN-facing traffic)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
# Minimize swap usage. API Gateways should live in RAM.
vm.swappiness = 1
Apply these changes with:
sudo sysctl -p
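Then confirm the settings actually took, particularly BBR, which is shipped as the tcp_bbr module on most distro kernels and should be auto-loaded when you set the sysctl:
sysctl net.ipv4.tcp_available_congestion_control   # should list bbr
sysctl net.ipv4.tcp_congestion_control             # should return bbr
lsmod | grep tcp_bbr                               # module loaded (may be empty if built into the kernel)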
This ensures your OS accepts connections as fast as the network card delivers them. Speaking of network cards: if your provider doesn't give you VirtIO drivers and 10GbE uplinks, these tweaks can only take you so far. We standardize on VirtIO to minimize hypervisor overhead.
Layer 2: The Gateway Configuration (NGINX Focus)
Whether you use raw NGINX, Kong, or OpenResty, the underlying engine is likely NGINX. A common mistake is failing to configure upstream keepalives.
By default, NGINX opens a new connection to your backend microservice for every single request. This involves a full TCP handshake (and potentially TLS handshake). This is expensive. You need to reuse these connections.
Here is the correct structure for an upstream block handling high traffic:
upstream backend_api {
# The backend service
server 10.0.0.5:8080;
# KEEPALIVE IS CRITICAL
# Caches up to 64 idle connections to the backend per worker process
keepalive 64;
}
server {
listen 443 ssl http2;
# HTTP/3 (QUIC) support is mature in 2025, enable it if your clients support it
listen 443 quic reuseport;
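# Advertise HTTP/3 on the same port; most clients never attempt QUIC without this header
add_header Alt-Svc 'h3=":443"; ma=86400';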
location /api/ {
proxy_pass http://backend_api;
# Required for keepalive to work
proxy_http_version 1.1;
proxy_set_header Connection "";
# Buffer tuning
proxy_buffers 16 16k;
proxy_buffer_size 32k;
}
}
Without the proxy_set_header Connection ""; directive, NGINX sends Connection: close to the backend on every proxied request, which defeats the purpose of the keepalive pool.
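You can verify the pool is working from the gateway itself. Assuming the backend from the example above (10.0.0.5:8080), the same source ports should stay ESTABLISHED between requests instead of churning through TIME_WAIT:
# Show established connections to the upstream; stable source ports = keepalive is working
ss -tn state established '( dport = :8080 )'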
SSL/TLS Termination
Decryption burns CPU. In 2025, if you aren't using TLS 1.3 exclusively (or allowing 1.2 only for legacy), you are wasting cycles on outdated cryptographic primitives.
Check your OpenSSL version; you need at least 1.1.1 for TLS 1.3, and Ubuntu 24.04 ships the 3.x series:
openssl version
Ensure you are utilizing hardware-accelerated AES (AES-NI), which is standard on the Intel and AMD EPYC hardware we deploy.
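Trust, but verify. Check that the vCPU actually exposes the AES flag, then benchmark the AEAD cipher TLS 1.3 uses in practice:
grep -m1 -o aes /proc/cpuinfo          # prints "aes" if the instruction set is exposed to the guest
openssl speed -evp aes-256-gcm         # the -evp path is what engages AES-NI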
Layer 3: Storage IO and Rate Limiting
API Gateways often double as rate limiters, and that requires state. The counters live in memory (Redis or an internal shared dict), but the moment that state is persisted or logged to slow storage, you introduce "jitter."
If you are using a disk-backed persistence layer for your gateway logs or caching, HDD is a death sentence. Even standard SSDs can choke under heavy write pressure (like debug logging during a DDoS attack).
The NVMe Difference:
At CoolVDS, we stopped offering spinning rust for primary volumes years ago. NVMe provides the high IOPS required to write access logs asynchronously without blocking the worker threads. If your iowait goes above 1%, your gateway is lagging.
Test your disk's synchronous write latency. A batch of small O_DSYNC writes is a quick sanity check; divide the elapsed time by the write count, and if the average is over 1ms, move host:
dd if=/dev/zero of=testfile bs=4k count=1000 oflag=dsync
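For a cleaner picture than dd gives you, fio (if installed) reports full latency percentiles for synchronous small writes, including the fdatasync calls that model log flushes:
fio --name=synclat --ioengine=psync --rw=randwrite --bs=4k --size=256m \
    --numjobs=1 --iodepth=1 --fdatasync=1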
Compliance & Data Sovereignty (The Boring but Mandatory Part)
We operate in Norway. We deal with Datatilsynet. When you optimize your gateway, be careful with logging.
If you dump full request bodies to /var/log/nginx/access.log to debug performance, and that body contains a Norwegian personal ID number (fødselsnummer), you are violating GDPR. High-performance setups should stream logs to a centralized collector (like Loki or ELK) and strip PII before it hits the disk.
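A crude but useful smoke test before Datatilsynet runs it for you: grep the live access log for anything that looks like an 11-digit fødselsnummer. The pattern below is deliberately loose and will match false positives (any 11-digit number), but a non-zero count is worth investigating:
grep -Ec '[0-9]{11}' /var/log/nginx/access.log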
Benchmarking the Result
Don't guess. Measure. We use wrk or k6 for this. Here is a Lua script for wrk to test a POST endpoint with a payload, mimicking a real API transaction:
-- bench.lua
wrk.method = "POST"
wrk.body = '{"foo": "bar", "baz": 123}'
wrk.headers["Content-Type"] = "application/json"
-- Command to run:
-- wrk -t12 -c400 -d30s -s bench.lua https://your-coolvds-instance.no/api/v1/resource
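wrk's default summary hides the tail. Add the --latency flag to print the full distribution so you can read the p99 directly:
wrk -t12 -c400 -d30s --latency -s bench.lua https://your-coolvds-instance.no/api/v1/resource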
Target Metrics:
On a CoolVDS 4 vCPU / 8GB RAM instance, properly tuned, you should handle 15,000 req/sec with sub-50ms latency on simple echoes.
Conclusion
Performance isn't magic. It's the removal of unnecessary barriers. By tuning the Linux kernel to handle massive concurrency, configuring your gateway to reuse connections, and ensuring your underlying storage infrastructure (NVMe) doesn't block, you can achieve massive throughput on modest hardware.
Don't let a default config file be the reason your users churn. Get a system that lets you touch the kernel.
Ready to test your tuned config? Deploy a root-access CoolVDS instance in Oslo in under 55 seconds.