99th Percentile: Tuning API Gateways for Low-Latency Norwegian Traffic
If you are happy with a 200ms TTFB (Time To First Byte), stop reading. This article isn't for you. Go back to your shared hosting and default apt-get installs.
But if you are seeing 502 errors during traffic spikes, or if your "microservices" architecture is adding 50ms of overhead per hop, we need to talk. I recently audited a setup for a client in Oslo who couldn't understand why their shiny new Kubernetes cluster was slower than their old monolith. The culprit wasn't the code; it was a choked API Gateway relying on default Linux TCP settings and sluggish disk I/O.
Latency is cumulative. In a distributed system, your Gateway is the front door. If the door is jammed, it doesn't matter how fast the house is. Here is how we tune the stack for raw speed, strictly adhering to what is stable and production-ready in early 2020.
1. The Kernel: Breaking the Connection Limit
Most Linux distributions, including the standard images you get on generic cloud providers, ship with conservative defaults intended for desktop usage or light web serving. When you act as an API Gateway, you are essentially a packet forwarder. You need to handle thousands of ephemeral connections.
First, check your file descriptor limits. In Linux, everything is a file, including a TCP connection.
ulimit -n
If that returns 1024, your gateway will capsize under load. You need to raise this at the OS level, as sketched below.
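How you raise it depends on how the process is started. Here is a minimal sketch, assuming an nginx user and a systemd-managed service; the paths and the 65535 value are illustrative, so adjust them for your distribution:
# Raise the open-file limit for the nginx user (PAM sessions and shells)
echo "nginx soft nofile 65535" >> /etc/security/limits.conf
echo "nginx hard nofile 65535" >> /etc/security/limits.conf
# systemd-managed services ignore limits.conf; give the unit a drop-in instead
mkdir -p /etc/systemd/system/nginx.service.d
printf "[Service]\nLimitNOFILE=65535\n" > /etc/systemd/system/nginx.service.d/limits.conf
systemctl daemon-reload && systemctl restart nginx
If you run stock Nginx, worker_rlimit_nofile 65535; in nginx.conf raises the limit for the worker processes directly.
But beyond file descriptors, the real killer is the TCP backlog.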
When a connection request comes in (SYN), it goes into a queue. If the application (Nginx/Kong/HAProxy) doesn't accept it fast enough, or if the queue is full, the kernel drops the packet. The client waits, times out, and retries. Latency spikes.
Here is the /etc/sysctl.conf configuration we deploy on high-performance CoolVDS instances to handle connection floods:
# Maximize the number of file handles
fs.file-max = 2097152
# Increase the size of the receive queue
net.core.netdev_max_backlog = 16384
# Increase the maximum listen backlog (fully established connections waiting for accept())
net.core.somaxconn = 65535
# Increase the ephemeral port range to allow more outgoing connections to upstreams
net.ipv4.ip_local_port_range = 1024 65535
# Reuse sockets in TIME_WAIT state for new outbound connections (critical for API gateways)
net.ipv4.tcp_tw_reuse = 1
# Disable slow start after idle to prevent latency on keepalive connections
net.ipv4.tcp_slow_start_after_idle = 0
# BBR Congestion Control (Kernel 4.9+ required)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
Apply these with sysctl -p. The tcp_tw_reuse flag is particularly vital. Without it, your gateway will run out of source ports when talking to backend services, leading to the dreaded "Cannot assign requested address" error.
Pro Tip: Do not enable net.ipv4.tcp_tw_recycle. It was removed entirely in Linux 4.12 and breaks connections for clients behind NAT (which is almost everyone on mobile 4G networks). Stick to reuse.
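A quick sanity check after applying the settings; exact counter names vary slightly between kernels, so treat this as a sketch:
sysctl net.ipv4.tcp_available_congestion_control   # bbr must be listed (modprobe tcp_bbr if it is not)
sysctl net.ipv4.tcp_congestion_control             # should report: bbr
sysctl net.core.somaxconn                          # should report: 65535
# Listen-queue overflows mean the backlog is still too small or the app accepts too slowly
netstat -s | grep -i listen                        # netstat comes from the net-tools package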
2. Nginx Tuning: The Upstream Keepalive Trap
Most API Gateways today are Nginx under the hood (Kong, OpenResty, ingress-nginx). The most common misconfiguration I see is the lack of keepalives to upstream services.
By default, Nginx speaks HTTP/1.0 to backends and closes the connection after every request. This means for every single API call, your gateway performs a full TCP handshake (SYN, SYN-ACK, ACK) with the microservice. This eats CPU and adds latency.
You can verify this behavior by looking at your backend logs or at the socket table on the gateway: if you see a new source port for every request, you are wasting cycles.
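A crude but effective check from the gateway itself, assuming an upstream listening on 10.0.0.5:8080 as in the block below:
ss -tan dst 10.0.0.5:8080 | awk '{print $1}' | sort | uniq -c
# A large, constantly churning TIME-WAIT count means a fresh TCP handshake per request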
Configure your upstream block to keep connections open:
upstream backend_service {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;
    # Cache up to 100 idle connections per worker process
    keepalive 100;
}

server {
    location /api/ {
        proxy_pass http://backend_service;
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        # Buffer settings tuned for JSON payloads
        proxy_buffers 16 16k;
        proxy_buffer_size 16k;
    }
}
The proxy_set_header Connection ""; directive is mandatory. Without it, Nginx sends "Connection: close" to the backend on every request (its default behavior when proxying), and the keepalive pool is never used.
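Measure before and after, because the win shows up exactly where this article's title points: in the tail. A rough benchmark sketch with wrk (the hostname and path are placeholders):
wrk -t4 -c200 -d30s --latency https://gateway.example.com/api/health
# Compare the 99% line of the latency distribution, not the average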
3. SSL/TLS: The CPU Tax
Termination is expensive. If you are terminating TLS (which you should be, for security), you are burning CPU cycles on handshake cryptography. In 2020 we have hardware help for the symmetric ciphers (AES-NI), but the RSA/ECDSA signatures and ECDHE key exchange in the handshake still cost real CPU, so configuration matters.
Ensure you are using OpenSSL 1.1.1 or higher to take advantage of TLS 1.3. It reduces the handshake overhead significantly (1-RTT). Check your version:
openssl version
If you are still on 1.0.2, upgrade immediately. Additionally, enable OCSP Stapling to save your users a separate DNS lookup and connection to the Certificate Authority's OCSP responder:
ssl_stapling on;
ssl_stapling_verify on;
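Stapling also needs a resolver directive in the same context so Nginx can reach the OCSP responder. You can confirm both TLS 1.3 and the stapled response from the outside; the hostname here is a placeholder:
echo | openssl s_client -connect api.example.com:443 -status 2>/dev/null | grep -E "Protocol|OCSP Response Status"
# Expect "TLSv1.3" and "OCSP Response Status: successful"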
4. The Hardware Factor: Why NVMe and KVM Matter
You can tune software all day, but if your I/O Wait (iowait) is high, your API will lag. API Gateways generate massive logs. Access logs, error logs, and often local caching (like OpenResty shared dictionaries) hit the disk hard.
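Before blaming the application, confirm that the disk really is the bottleneck. iostat (from the sysstat package) makes this visible in seconds:
iostat -x 1 5   # watch %iowait and the per-device %util column while under load
If %iowait climbs while user CPU stays low, storage is your problem; buffered access logging (the buffer= and flush= parameters of access_log) softens the blow, but it does not remove it.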
We built CoolVDS on NVMe storage specifically for this reason. Standard SATA SSDs top out around 550 MB/s. NVMe drives can push 3,500 MB/s. When you are logging 10,000 requests per second, that difference is the gap between a smooth service and a locked-up server.
Furthermore, virtualization type matters. We use KVM (Kernel-based Virtual Machine). Unlike OpenVZ or LXC, KVM provides hard resource isolation. In a containerized VPS environment, a "noisy neighbor" can steal your CPU cycles when they decide to mine crypto or compile a kernel. On CoolVDS KVM instances, your CPU time is yours.
5. Rate Limiting with Redis
An API Gateway must protect the backends. Local memory rate limiting is fast but doesn't work well if you have multiple gateway instances behind a load balancer. You need a centralized store.
Redis is the standard here. However, network round-trips to Redis can add latency. Use Redis pipelining or server-side Lua scripts to minimize trips. Here is the logic for a basic fixed-window rate limiter you might run inside Nginx/OpenResty (as an access_by_lua handler using lua-resty-redis):
-- access_by_lua_file script, using lua-resty-redis
local redis = require "resty.redis"
local limit, window = 100, 60
local red = redis:new()
red:set_timeout(50)  -- milliseconds; fail fast if Redis is slow
local ok, err = red:connect("127.0.0.1", 6379)
if not ok then
    ngx.log(ngx.ERR, "redis connect failed: ", err)
    return  -- fail open rather than block traffic on a Redis outage
end
local key = "rate:" .. ngx.var.remote_addr
-- Simple atomic increment (fixed window)
local current = red:incr(key)
if current == 1 then
    red:expire(key, window)
end
red:set_keepalive(10000, 100)  -- return the connection to the pool
if type(current) == "number" and current > limit then
    return ngx.exit(429)
end
Running this logic against a Redis instance on the same LAN (or in the same rack) is crucial. In our Norwegian datacenter, internal latency between instances is well under a millisecond, which makes this architecture viable.
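To see what that round-trip actually costs, redis-cli ships with a latency probe; the remote address below is just an illustration:
redis-cli -h 127.0.0.1 --latency    # local instance: typically well under a millisecond
redis-cli -h 10.0.1.20 --latency    # Redis in another network segment shows the added tax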
6. Local Context: The Norwegian Advantage
Physics is the ultimate bottleneck. If your users are in Oslo, Bergen, or Trondheim, hosting your API Gateway in Frankfurt or London adds 20-40ms of round-trip time (RTT) purely due to distance. For an API that requires multiple sequential calls, this latency compounds.
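You can put numbers on this with nothing fancier than ping from a machine in Norway; the hostnames are placeholders for your own endpoints:
ping -c 10 api-frankfurt.example.com   # expect roughly 20-40 ms RTT from Oslo
ping -c 10 api-oslo.example.com        # expect low single-digit milliseconds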
By hosting on CoolVDS in Norway, you are peering directly at NIX (Norwegian Internet Exchange). Your RTT to local ISPs (Telenor, Telia) drops to single digits. Furthermore, with the current focus on data sovereignty and the strict enforcement by Datatilsynet, keeping your log data (which contains IP addresses—PII under GDPR) within Norwegian borders simplifies your compliance posture significantly.
Final Thoughts
Performance isn't magic. It's the sum of a thousand small optimizations. It's setting tcp_nodelay on;, it's choosing the right congestion control algorithm, and it's ensuring your underlying infrastructure isn't stealing your resources.
Don't let a default config file be the reason your app feels slow. Deploy a test instance, apply these sysctl settings, and benchmark it.
Ready to drop your latency? Spin up a high-performance NVMe KVM instance on CoolVDS today and experience the difference raw speed makes.