Squeezing Every Millisecond: API Gateway Tuning for High-Throughput Norwegian Workloads
If you are running a default Nginx or HAProxy configuration in production, you aren't managing traffic; you are choking it. I have spent the last decade debugging 502 Bad Gateways and analyzing flame graphs, and the conclusion is always the same: software defaults are designed for compatibility, not performance.
In May 2023, the demands on API gateways are higher than ever. Microservices are chatty. Frontend frameworks demand instant JSON responses. If your gateway adds more than 10ms of overhead, you are failing your engineering team. This guide is for the DevOps engineer who is tired of hearing "the network is slow" and wants to fix the root cause—starting at the kernel level.
1. The Hardware Lie: Why Virtualization Matters
Before we touch a single config file, we need to address the elephant in the data center: CPU Steal. You can tune your TCP stack until it's perfect, but if your hypervisor is overselling CPU cycles, your latency will spike unpredictably.
This is common in cheap shared hosting environments where "2 vCPUs" actually means "2 vCPUs competing with 50 other noisy neighbors." When an API gateway handles thousands of requests per second, context switching overhead kills throughput.
Pro Tip: Run top and look at the %st (steal time) metric. If it is consistently above zero, your provider is overselling cores; migrate immediately. This is why standardizing on CoolVDS makes sense for gateways; our KVM-based virtualization ensures that the CPU cycles you pay for are physically reserved for your kernel, not shared in a lottery pool.
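If you want a number you can alert on rather than an interactive view, mpstat (from the sysstat package, which is not always installed by default) averages steal over a sampling window. A minimal sketch; note that the column position of %steal can shift between sysstat versions:
# Average CPU steal over five 1-second samples; %steal is typically
# the 9th field of the "Average: all" line on recent sysstat builds
mpstat 1 5 | awk '/Average:/ && $2 == "all" {print "avg %steal:", $9}'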
2. Kernel Tuning: The Foundation
Linux is tuned for a general-purpose desktop by default, not a high-throughput packet shuffler. For an API gateway, we need to open the floodgates.
Increasing File Descriptors
Every incoming connection is a file. If you hit the limit, your users get dropped.
Check your current limits:
ulimit -n
If it says 1024, you are in trouble. Edit /etc/security/limits.conf to raise the ceiling:
root soft nofile 1000000
root hard nofile 1000000
* soft nofile 1000000
* hard nofile 1000000
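Two caveats: limits.conf only applies to PAM login sessions, so an nginx managed by systemd ignores it (set LimitNOFILE= in the unit or a drop-in instead), and the only number that matters is the one the running process actually received. Check it directly; the PID file path below is the common Debian/Ubuntu default, so adjust for your distro:
# Inspect the effective limit of the running nginx master process
cat /proc/$(cat /run/nginx.pid)/limits | grep 'open files'
# For systemd-managed services, raise the limit in the unit instead:
#   systemctl edit nginx    ->    [Service] section: LimitNOFILE=1000000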
Optimizing the TCP Stack
We need to allow the reuse of sockets in the TIME_WAIT state and increase the backlog for pending connections. Add this to your /etc/sysctl.conf:
# Allow reuse of sockets in TIME_WAIT state for new outbound connections
net.ipv4.tcp_tw_reuse = 1
# Increase the maximum number of connections in the backlog
net.core.somaxconn = 65535
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# Protect against SYN flood attacks
net.ipv4.tcp_syncookies = 1
# Increase the queue of half-open (SYN_RECV) connections
net.ipv4.tcp_max_syn_backlog = 4096
Apply these changes with:
sysctl -p
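To confirm the kernel took the new values, read them back, then check what your listener actually negotiated. For listening sockets, ss reports the current accept queue in Recv-Q and the configured backlog in Send-Q; port 443 here assumes the gateway server block shown later in this guide:
# Read back the applied values
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse
# Recv-Q = connections waiting for accept(), Send-Q = configured backlog
ss -ltn 'sport = :443'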
3. Nginx / OpenResty Configuration
Whether you are using raw Nginx, Kong, or a custom OpenResty build, the nginx.conf is where the magic happens. The default worker_processes 1; is a relic of the past.
Worker Settings
Map your workers to your CPU cores and raise the per-worker connection limit. Keep in mind that worker_connections cannot usefully exceed the process's open-file limit, so pair it with worker_rlimit_nofile in the main context.
worker_processes auto;
And inside the events block:
events {
    worker_connections 10000;
    multi_accept on;
    use epoll;
}
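A quick sanity check: the theoretical ceiling is worker_processes × worker_connections, and nginx logs a startup warning if worker_connections exceeds the open-file limit. Assuming a standard nginx binary, you can validate the syntax and grep the merged configuration it actually loaded:
# Validate, then dump the effective worker settings from the live config
nginx -t && nginx -T 2>/dev/null | grep -E 'worker_(processes|connections|rlimit_nofile)'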
Upstream Keepalive
This is the most common mistake I see. Nginx speaks HTTP/1.1 (or HTTP/2) to the client, but by default it proxies to upstreams with HTTP/1.0 and a Connection: close header, opening a brand-new TCP connection to your backend microservice for every single request. The handshake overhead destroys performance.
You must configure an upstream block with keepalive:
upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;
    # Keep 64 idle connections open per worker
    keepalive 64;
}
And then inside your location block, enforce HTTP/1.1 for the proxy:
proxy_http_version 1.1;
proxy_set_header Connection "";
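You can watch the reuse happen. With keepalive working, the number of established connections to the upstream should plateau at a small, stable figure under load instead of churning through ephemeral ports; the :8080 filter matches the example backends above:
# A stable, small count under load means connections are being reused
watch -n1 "ss -tn state established '( dport = :8080 )' | tail -n +2 | wc -l"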
4. SSL/TLS Offloading
Encryption is computationally expensive. However, modern processors have AES-NI instruction sets that handle this efficiently. Ensuring your gateway uses them is critical.
We also need to balance security with speed. In 2023, you should be prioritizing TLS 1.3.
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers off;
# Optimize SSL session caching to reduce handshake overhead
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
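Don't take the AES-NI claim on faith, especially inside a VM where the hypervisor decides which CPU flags the guest sees. Verify the flag is exposed and benchmark the EVP code path that TLS actually uses:
# Confirm the guest CPU exposes AES-NI
grep -m1 -o aes /proc/cpuinfo
# Benchmark AES-GCM through the EVP interface (the path TLS takes)
openssl speed -evp aes-128-gcm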
5. The Norwegian Context: Latency and Compliance
Why does server location matter for an API Gateway? Physics. The speed of light is finite. If your users are in Oslo or Bergen, and your gateway is hosted in Virginia, you are adding 80-100ms of latency before the request is even processed.
Furthermore, hosting locally aids compliance with the GDPR and with Datatilsynet's local interpretation of it. Keeping data transit within the EEA (European Economic Area), and specifically within Norway where possible, simplifies your legal posture significantly.
CoolVDS offers low-latency infrastructure directly connected to major Nordic peering points. This means your handshake times drop from 40ms (from Central Europe) to sub-5ms within Norway.
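Measure rather than assume: curl can break a request into DNS, TCP, and TLS phases from wherever your users sit. The URL below is the placeholder domain from the reference config in the next section:
# Break down where the milliseconds go
curl -so /dev/null -w 'dns=%{time_namelookup}s tcp=%{time_connect}s tls=%{time_appconnect}s total=%{time_total}s\n' https://api.example.no/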
6. Putting It All Together: The Optimized Config
Here is a reference nginx.conf snippet combining these principles. This configuration assumes you are running on a CoolVDS NVMe Instance where disk I/O for logs won't block the worker threads.
http {
    # Basic Settings
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    # Buffer Tuning for API payloads
    client_body_buffer_size 128k;
    client_max_body_size 10M;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 4k;
    output_buffers 1 32k;
    postpone_output 1460;

    # Logging: Buffering logs reduces IOPS pressure
    access_log /var/log/nginx/access.log combined buffer=32k flush=5s;
    error_log /var/log/nginx/error.log warn;

    # Gzip Compression
    gzip on;
    gzip_disable "msie6";
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # API Gateway Server Block
    server {
        listen 443 ssl http2;
        server_name api.example.no;

        ssl_certificate /etc/letsencrypt/live/api.example.no/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/api.example.no/privkey.pem;

        location / {
            proxy_pass http://backend_api;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

            # Timeouts for failing fast
            proxy_connect_timeout 5s;
            proxy_send_timeout 10s;
            proxy_read_timeout 10s;
        }
    }
}
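Note that proxy_pass http://backend_api assumes the upstream block from section 3 lives in the same http context. Before any of this goes live, validate the syntax and reload gracefully; a reload swaps in new workers without dropping in-flight connections:
# Test the configuration, then reload without downtime
nginx -t && systemctl reload nginx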
7. Load Testing: Prove It
Don't assume your changes worked. Verify them. In 2023, Locust is the tool of choice for Python-based load testing. It allows you to define user behavior in code.
Here is a simple locustfile.py to hammer your new CoolVDS instance:
from locust import HttpUser, task, between

class ApiUser(HttpUser):
    wait_time = between(1, 2)

    @task
    def get_inventory(self):
        self.client.get("/api/v1/inventory", headers={"Authorization": "Bearer test-token"})

    @task(3)
    def search_products(self):
        self.client.get("/api/v1/search?q=t-shirt")
Run this from a separate instance (to avoid skewing results) and watch the RPS (Requests Per Second) climb. On a properly tuned CoolVDS instance, you should see a flat latency curve even as concurrency increases.
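For an unattended benchmark, Locust's headless mode takes the user count, spawn rate, and duration on the command line; the host URL is the example domain from earlier:
# 500 simulated users, ramped at 50/s, running for 5 minutes, no web UI
locust -f locustfile.py --host https://api.example.no --headless -u 500 -r 50 --run-time 5m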
Conclusion
Performance isn't an accident. It is the result of deliberate architectural choices: selecting the right virtualization (KVM), tuning the Linux kernel for high concurrency, and optimizing the application layer to reduce overhead.
Your API gateway is the front door to your business. Don't let a creaky hinge cost you customers. If you are ready to deploy on hardware that respects your engineering efforts, it is time to switch.
Next Step: Spin up a high-performance, NVMe-backed instance in our Norwegian datacenter. Deploy your test environment on CoolVDS in 55 seconds.