Crushing Latency: API Gateway Tuning for High-Load Nordic Infrastructure
It is November 2018. Black Friday is approaching. Your development team just broke the monolith into microservices, and suddenly, every user action triggers fifteen internal HTTP calls. Your latency just spiked from 80ms to 400ms. The culprit? It’s almost always the API Gateway.
Most system administrators slap a default NGINX reverse proxy in front of their stack and call it a day. That works for a hobby blog. It does not work when you are handling payment processing for a high-traffic Norwegian retailer complying with GDPR. If your gateway stutters, the whole chain collapses.
I have spent the last week debugging a bottleneck for a client in Oslo. Their infrastructure was fine, but their gateway configuration was stuck in 2014. Here is exactly how we fixed it, using technologies available right now—NGINX 1.15, Linux 4.x kernels, and proper hardware selection.
1. The OS Layer: Stop Starving Your Sockets
Before you even touch the application config, you need to look at the kernel. By default, most Linux distributions are tuned for general-purpose computing, not for handling 50,000 concurrent connections. When an API gateway sits between your users (say, on Telenor's mobile network) and your backend services, it burns through file descriptors and ephemeral ports fast.
Open your /etc/sysctl.conf. If it is empty, you are leaving performance on the table.
# /etc/sysctl.conf - Optimized for API Gateways (Nov 2018)
# Increase system-wide file descriptors
fs.file-max = 2097152
# Widen the port range for outgoing connections to upstreams
net.ipv4.ip_local_port_range = 1024 65535
# Enable TCP Reuse to recycle sockets in TIME_WAIT state
net.ipv4.tcp_tw_reuse = 1
# Increase the backlog for high-traffic bursts
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 16384
net.ipv4.tcp_max_syn_backlog = 8192
# Fast Open can help, but ensure client support
net.ipv4.tcp_fastopen = 3
Apply these with sysctl -p. The tcp_tw_reuse flag is critical here. Without it, connections to your upstreams pile up in TIME_WAIT, the gateway runs out of ephemeral ports to open new ones, and you get those dreaded 502 Bad Gateway errors during traffic spikes.
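After reloading, it is worth confirming the kernel actually accepted the values; a quick spot check with standard tooling (nothing here is gateway-specific) looks like this:
# Reload the settings and spot-check a couple of them
sysctl -p
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse
# During a load test, watch how many sockets are stuck in TIME_WAIT
ss -s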
2. NGINX: The Worker Configuration Myth
Many DevOps engineers set worker_processes auto; and assume it is optimal. It usually is, but not if your virtualization layer is lying to you. In a shared hosting environment (OpenVZ or overloaded Xen), the "CPU cores" you see are often overcommitted threads fighting for time slices.
This is where the choice of infrastructure becomes architectural, not just financial. We use KVM at CoolVDS because it provides stricter isolation. When NGINX asks for a CPU core, it needs it now, not after the neighbor's crypto-mining script finishes.
However, assuming you have dedicated cores, you must pin workers to specific CPUs to cut down on context switches and keep each core's cache warm. Here is a robust config:
# nginx.conf
user www-data;
worker_processes auto;
# Enforce CPU affinity to reduce context switching cost
worker_cpu_affinity auto;
# Increase open file limit per worker
worker_rlimit_nofile 65535;
events {
    # epoll is mandatory for Linux performance
    use epoll;
    # Allow worker to accept multiple connections at once
    multi_accept on;
    # Max connections per worker
    worker_connections 16384;
}
Pro Tip: If you are running SSL termination on the gateway (which you should be, for GDPR compliance and centralized certificate management), the CPU cost increases significantly. Ensure your host supports the AES-NI instruction set. On CoolVDS NVMe instances, this is passed through by default.
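If you are not sure whether the flag is exposed inside your VM, two commands will tell you; the exact throughput numbers vary by CPU, but hardware-accelerated AES-GCM should be dramatically faster than a software fallback:
# The 'aes' flag must be visible to the guest for OpenSSL to use AES-NI
grep -m1 -o aes /proc/cpuinfo
# Benchmark AES-GCM throughput via the EVP interface (uses AES-NI when available)
openssl speed -evp aes-256-gcm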
3. The Upstream Keepalive Trap
This is the number one mistake I see in 2018. By default, NGINX uses HTTP/1.0 for upstream connections and closes the connection after every request. If your gateway talks to a microservice backend, you are performing a full TCP handshake (SYN, SYN-ACK, ACK) for every single API call.
Every new connection costs an extra round trip (plus socket setup and teardown in the kernel) before the request is even sent. In a chain of ten internal calls, that overhead alone adds up to a sluggish user experience. You must enable keepalive to the upstream.
http {
    upstream backend_api {
        server 10.0.0.5:8080;
        server 10.0.0.6:8080;
        # Keep 64 idle connections open to the backend
        keepalive 64;
    }

    server {
        location /api/ {
            proxy_pass http://backend_api;
            # REQUIRED for keepalive to work
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            # Buffering tweaks for high throughput
            proxy_buffers 16 16k;
            proxy_buffer_size 32k;
        }
    }
}
Setting proxy_set_header Connection ""; clears the "close" header that NGINX adds by default. This single change can double your throughput.
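To verify the pool is actually being reused, watch the gateway's connection table under load; with keepalive working, the established connections to the backend port (8080 in the upstream block above) stay few and stable, while a growing pile of TIME_WAIT sockets means every request is still opening a fresh connection:
# Established connections to the upstreams; should hover around the keepalive pool size
ss -tn state established '( dport = :8080 )'
# TIME_WAIT churn on the same port is the tell-tale sign keepalive is NOT working
ss -tn state time-wait '( dport = :8080 )'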
4. SSL/TLS: Performance vs. Security
With the enforcement of GDPR in May this year, encryption is non-negotiable. But encryption is heavy. The good news is the move toward OpenSSL 1.1.1 and TLS 1.3, which cuts the full handshake from two round trips to one. TLS 1.3 needs an NGINX build linked against OpenSSL 1.1.1 (check with nginx -V), so if your stack supports it, enable it immediately.
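A minimal sketch of the relevant server-block directives; the cache size and timeout below are sensible starting points rather than universal values:
# Inside the server block doing TLS termination
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
# Session resumption skips the full handshake for returning clients
ssl_session_cache shared:SSL:20m;
ssl_session_timeout 1h;
# OCSP stapling saves the client a separate round trip to the CA
ssl_stapling on;
ssl_stapling_verify on;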
However, latency is also physical. If your users are in Oslo and your server is in a cheap data center in Ohio, the speed of light is your enemy. A round trip across the Atlantic takes ~100ms. A round trip from Oslo to a local data center takes <5ms.
Comparison: Latency Impact on SSL Handshake
| Server Location | User Location | Ping (RTT) | Time to First Byte (HTTPS) |
|---|---|---|---|
| New York, USA | Oslo, NO | ~95ms | ~380ms (4x RTT) |
| Frankfurt, DE | Oslo, NO | ~25ms | ~100ms |
| Oslo, NO (CoolVDS) | Oslo, NO | ~2ms | ~10ms |
Local hosting isn't just about data sovereignty; it is a raw performance metric. For Norwegian businesses, hosting outside the region is a performance tax.
5. Disk I/O: The Silent Killer of Logging
API Gateways generate massive logs. Access logs, error logs, audit logs for compliance. If you write these to a standard spinning HDD (or a network-attached storage solution with poor IOPS), the disk write blocks the worker process.
In high-concurrency scenarios, this is fatal. You have two options:
- Disable access logging (rarely an option for production/audit).
- Buffer the logs in memory before writing.
# Buffer logs: Write to disk only when buffer (64k) is full or every 10 seconds
access_log /var/log/nginx/access.log combined buffer=64k flush=10s;
But even with buffering, eventually, the data must hit the disk. This is why we standardized on NVMe storage for all CoolVDS instances. NVMe queues are designed for parallelism, unlike the single-queue limit of SATA SSDs. When your gateway is hammered by a DDoS or a marketing campaign, NVMe keeps the I/O wait time at practically zero.
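Whichever storage you land on, measure it under load instead of trusting the spec sheet; iostat (from the sysstat package) makes write stalls easy to spot:
# Per-device latency, refreshed every second, while the gateway is under load
# 'await' is the average wait per I/O in ms; %util pinned near 100 means the disk is the bottleneck
iostat -x 1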
6. Rate Limiting with Lua (OpenResty)
Finally, to protect your backends, you need rate limiting. Standard NGINX limiting is good, but for complex logic (e.g., "allow 100 req/min for free users, 1000 for paid"), you likely use OpenResty (NGINX + LuaJIT). It is the engine behind Kong, which has gained massive traction this year.
Here is a simple, high-performance rate limiter snippet using the resty.limit.req library, which uses the leaky bucket algorithm efficiently:
-- In your access_by_lua_block
local limit_req = require "resty.limit.req"

-- Limit: 200 requests/sec, burst 100
local lim, err = limit_req.new("my_limit_store", 200, 100)
if not lim then
    ngx.log(ngx.ERR, "failed to instantiate a resty.limit.req object: ", err)
    return ngx.exit(500)
end

local key = ngx.var.binary_remote_addr
local delay, err = lim:incoming(key, true)
if not delay then
    if err == "rejected" then
        return ngx.exit(503)
    end
    ngx.log(ngx.ERR, "failed to limit req: ", err)
    return ngx.exit(500)
end

if delay >= 0.001 then
    -- throttle the request; ngx.sleep yields, so it does not block the worker process
    ngx.sleep(delay)
end
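The snippet assumes a shared memory zone named my_limit_store already exists; resty.limit.req keeps its counters there, so the dictionary must be declared at the http level and the code wired into the location, roughly like this:
http {
    # Shared memory zone backing the rate limiter state
    lua_shared_dict my_limit_store 10m;

    server {
        location /api/ {
            access_by_lua_block {
                -- the Lua snippet above goes here
            }
            proxy_pass http://backend_api;
        }
    }
}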
Conclusion
Performance tuning is an art of subtraction. You remove the latency of the handshake, you remove the blocking I/O of the logs, and you remove the jitter of noisy neighbors. By November 2018 standards, a single well-tuned NGINX instance on a 4-core VM should easily handle 20,000 requests per second if the upstream is fast enough.
Don't let legacy configs or slow spinning disks kill your application's responsiveness. Check your sysctl.conf, enable upstream keepalives, and ensure your hardware isn't the bottleneck.
Ready to test your tuned gateway? Deploy a high-frequency NVMe instance on CoolVDS in Oslo today and see what single-digit latency looks like.