API Gateway Performance Tuning: Squeezing Microseconds out of NGINX and Envoy in 2025
It is 3:00 AM on a Tuesday. Your monitoring dashboard, probably Prometheus backed by Grafana, is lighting up with red alerts. The 99th percentile latency just spiked from 45ms to 1.2s, and your users in Oslo are seeing 502 Bad Gateway errors. You check the backend services; they are idling. You check the database; CPU is at 10%. The bottleneck, as so often happens in modern microservices architectures, is the front door: your API Gateway. Most VPS providers will tell you to just "scale up" or add more nodes, but throwing hardware at a misconfigured Linux kernel is like trying to fix a flat tire by changing the engine. Efficiency is not just about saving money; it is about engineering discipline. In the high-stakes environment of Nordic fintech and e-commerce, where customers demand instant interactions and Datatilsynet demands strict compliance, you cannot afford a gateway that chokes on TCP handshakes.
The Kernel: Where Performance Actually Lives
Before you even touch your nginx.conf or Envoy YAML, you must address the underlying operating system. By default, most Linux distributions (even in 2025) ship with conservative network settings designed for general-purpose compatibility, not high-throughput packet switching. When your API gateway hits 10,000 concurrent connections, the default file descriptor limits and TCP backlog queues become immediate hard stops. I recall a specific incident involving a payment processor in Bergen; their load balancer was dropping SYN packets simply because the listen backlog kept overflowing and the kernel silently discarded the excess. The fix wasn't upgrading to a larger instance type; it was tuning `sysctl.conf` to handle the connection churn. You need to expand the ephemeral port range and enable TCP Fast Open if your clients support it. Furthermore, the congestion control algorithm matters. While BBR (Bottleneck Bandwidth and RTT) has been the gold standard since the late 2010s, ensuring it is actually active and configured for your specific bandwidth-delay product is often overlooked. If you are running on CoolVDS, our images come with optimized kernels, but you should still verify these settings yourself.
Pro Tip: Never blindly copy-paste sysctl settings from a StackOverflow post from 2018. Validate every flag against your current kernel version (likely 6.8+ in 2025). Using deprecated flags can silently fail or cause erratic network behavior.
Essential Kernel Tuning
Apply these settings to /etc/sysctl.conf to prepare your gateway for high concurrency. These adjustments increase the backlog queues, enable BBR, and let outgoing connections reuse sockets stuck in TIME_WAIT; `tcp_tw_reuse` is the safe, modern alternative to `tcp_tw_recycle`, which was removed from the kernel years ago.
# Increase the maximum number of open file descriptors
fs.file-max = 2097152
# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
# TCP Buffer tuning for 10G/40G networks
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
# Enable BBR Congestion Control
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
# Allow reuse of sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1
# Increase ephemeral port range
net.ipv4.ip_local_port_range = 1024 65535
To apply these changes without a reboot, run:
sysctl -p
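After the reload, it is worth confirming that the kernel actually accepted the BBR settings; a quick check, assuming your primary interface is eth0 (adjust the name to match your system):
# List the congestion control algorithms this kernel can actually use
sysctl net.ipv4.tcp_available_congestion_control
# Confirm bbr is active and fq is the default qdisc
sysctl net.ipv4.tcp_congestion_control net.core.default_qdisc
# Check the qdisc actually attached to your interface (interface name may differ)
tc qdisc show dev eth0
If bbr does not appear in the available list, the tcp_bbr module is not loaded on your kernel and the congestion control setting will have silently fallen back to the default.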
The Silent Killer: Upstream Keepalives
If there is one configuration error I see repeated across almost every botched deployment, it is the lack of HTTP keepalives between the Gateway and the Upstream services. By default, NGINX acts as a polite HTTP/1.0 client to upstreams: it opens a connection, sends the request, receives the response, and closes the connection. In a microservices mesh where your gateway talks to an Auth service, then an Inventory service, and then a Pricing service, you are performing a full TCP handshake (and potentially a TLS handshake) for every single internal request. This adds massive overhead and latency. The CPU cycles wasted on SSL termination alone can cripple a standard virtual machine. On a platform like CoolVDS, where you have dedicated NVMe and KVM isolation, the hardware helps, but software inefficiency will still eat your margins. You must explicitly configure the upstream block to keep connections open.
Correct NGINX Upstream Configuration
Here is how you properly configure an upstream block to maintain persistent connections to your backend application.
upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;

    # The number of idle keepalive connections to an upstream server
    # that remain open for each worker process.
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;

        # Required for HTTP/1.1
        proxy_http_version 1.1;

        # Clear the Connection header to enable Keep-Alive
        proxy_set_header Connection "";

        # Pass essential headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
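To confirm the keepalives are actually doing their job, watch the established connection count from the gateway to an upstream while you apply load; a rough check, assuming the backends above listen on port 8080 and your iproute2 build supports the -H (no header) flag:
# Count established connections to the upstream port. With keepalive working,
# this number should stay roughly flat under load instead of climbing with
# every request.
ss -Htan state established '( dport = :8080 )' | wc -l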
The Hardware Reality: NVMe and CPU Steal
Software tuning can only take you so far. Eventually, physics gets in the way. In the context of API Gateways, two hardware metrics matter: I/O Wait and CPU Steal. If your gateway is logging extensively (access logs, error logs, audit trails for GDPR compliance), slow disk I/O will block the worker threads. I have seen sophisticated setups crumble because the hosting provider was using standard SSDs over a saturated SATA bus. In 2025, there is absolutely no excuse for not using NVMe storage for high-performance workloads. NVMe queues are designed for parallelism, matching the multi-core nature of modern CPUs.
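If you cannot ship logs off the box entirely, NGINX's buffered logging at least keeps every request from turning into a synchronous disk write; a minimal sketch, with illustrative buffer and flush values you should tune to your own traffic:
# Buffer access log entries in memory and flush them every few seconds
# instead of writing to disk on every single request.
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;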
Furthermore, "CPU Steal" is the silent performance killer in shared hosting environments. If your neighbor on the physical host decides to mine crypto or compile the Linux kernel, your API gateway latency fluctuates wildly. This is why serious architects choose KVM virtualization over containers or oversold VPS solutions. KVM provides a higher degree of isolation. When we architected the CoolVDS platform, we specifically chose to limit density to ensure that a "dedicated vCPU" actually behaves like one. For a gateway handling sensitive Norwegian user dataâwhich must stay within the EEAâpredictability is synonymous with compliance. You cannot guarantee SLA adherence if you cannot guarantee CPU cycles.
Traefik: The Modern Alternative
While NGINX remains the undisputed king of raw throughput, Traefik has gained massive traction for its dynamic configuration capabilities, especially in Kubernetes environments. However, out of the box, Traefik v3 can be resource-intensive if not tamed. Specifically, the buffer settings and transport configurations need to be aligned with your infrastructure. If you are deploying Traefik on a CoolVDS instance to manage Docker containers, you should look at defining transport limitations to prevent connection exhaustion.
# static configuration (traefik.yaml)
entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"
    http3: {}

serversTransport:
  maxIdleConnsPerHost: 100
  forwardingTimeouts:
    dialTimeout: "30s"
    responseHeaderTimeout: "30s"

api:
  dashboard: true
  insecure: false
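Note that maxIdleConnsPerHost plays the same role here as NGINX's keepalive directive earlier: it caps the pool of idle connections Traefik keeps open to each backend, so internal requests skip repeated TCP and TLS handshakes instead of paying for them on every hop.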
Local Latency and Compliance
Finally, we must address the geography. Light travels at a finite speed. Round trip time (RTT) from Oslo to a data center in Virginia (US) is roughly 90-100ms. From Oslo to Frankfurt, it is roughly 20-30ms. From a user in Oslo to a server in Oslo (like those provided by CoolVDS), it is often sub-3ms. In the world of high-frequency trading or real-time bidding, that difference is the entire business model. Furthermore, storing data outside of the EEA creates a legal minefield regarding the GDPR and the Schrems II ruling, which remains a critical consideration in 2025. By hosting your API Gateway and persistence layers physically within Norway or Northern Europe, you simplify your legal posture significantly.
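You do not have to take those RTT figures on faith; curl can break a single request down into its component delays (the hostname below is a placeholder for your own endpoint):
# Break one HTTPS request into DNS, TCP connect, TLS handshake and
# time-to-first-byte; run it from the same region as your users.
curl -o /dev/null -s -w "dns: %{time_namelookup}s  connect: %{time_connect}s  tls: %{time_appconnect}s  ttfb: %{time_starttransfer}s\n" https://api.example.com/health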
Performance tuning is an iterative process. Start with the kernel, secure your hardware foundation with NVMe-backed instances, and ensure your application layer talks efficiently to your backends. Do not let default configurations dictate your uptime.
Ready to test your tuned configuration? Deploy a high-frequency NVMe KVM instance on CoolVDS today and see the difference raw power makes to your 99th percentile latency.