API Gateway Performance Tuning: Surviving the Microservices Storm
Let’s be honest: Microservices are great for developer velocity, but they are a nightmare for operations. You break a monolith into fifty services, and suddenly your frontend has to make fifty round-trips to render a single dashboard. You throw an API Gateway in front—maybe Nginx, Kong, or the newly released HAProxy 2.0—and breathe a sigh of relief. Until Black Friday hits.
Suddenly, that gateway isn't a traffic cop; it's a bottleneck. I’ve spent too many nights debugging 502 Bad Gateways on overloaded clusters where the logs just scream "Connection Refused." The culprit is rarely the application code. It's usually a default Linux kernel config from 2016 or a virtual machine that's being suffocated by noisy neighbors.
If you are serving customers in Norway or the broader Nordic region, latency isn't just a number; it's a competitive disadvantage. Here is how we tune API Gateways for raw throughput and stability, using the stack available to us in 2019.
1. The Hardware Lie: Why CPU Steal is Killing Your API
Before touching a single config file, look at your infrastructure. API Gateways are CPU and I/O intensive. They handle SSL termination, request routing, and often logging to disk. In a shared hosting environment, you are fighting for CPU cycles.
Run top on your current gateway during peak load and look at the CPU summary line:
top - 14:31:02 up 10 days, 3:14, 1 user, load average: 2.15, 2.05, 1.98
%Cpu(s): 15.2 us, 4.1 sy, 0.0 ni, 50.5 id, 0.1 wa, 0.0 hi, 0.2 si, 29.9 st

See that 29.9 st at the end? That is Steal Time. It means the hypervisor is stealing CPU cycles from your VM to serve another client on the same physical host. Your Nginx worker processes are effectively frozen, waiting for the processor to wake up. For an API Gateway requiring sub-millisecond processing, this is fatal.
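A single top snapshot can mislead, because steal tends to arrive in bursts. To watch it over an interval, vmstat and mpstat (from the sysstat package) both report a steal column. A minimal sketch, run during peak load:

# Sample CPU counters every second for 30 seconds; the last column (st) is steal time
vmstat 1 30

# Per-core view, useful when only some cores are being starved (requires sysstat)
mpstat -P ALL 1 30

If steal sits above a few percent during peak, the problem is the host, not your configuration.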
This is why we built CoolVDS on strict KVM virtualization with resource guarantees. We don't overcommit CPU cores on our high-performance nodes. When you deploy an API Gateway, you need to know that a CPU core is actually yours. If you see high steal time, no amount of software tuning will save you. Move to a provider that respects isolation.
2. Kernel Tuning: Opening the Floodgates
Most Linux distros (Ubuntu 18.04, CentOS 7) ship with conservative defaults intended for desktop usage or light web serving. They are not tuned for 50,000 concurrent connections. You need to modify /etc/sysctl.conf.
Here is the battle-tested configuration I use for high-throughput gateways:
# /etc/sysctl.conf
# Increase system-wide file descriptor limit
fs.file-max = 2097152
# Increase the backlog for incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
# Reuse sockets in TIME_WAIT state for new outgoing connections
net.ipv4.tcp_tw_reuse = 1
# Increase available ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# TCP Window Scaling (Critical for high bandwidth)
net.ipv4.tcp_window_scaling = 1
# Protect against SYN flood attacks while allowing legitimate spikes
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_syncookies = 1

Pro Tip: After saving this file, run sysctl -p to apply the changes without a reboot. But be careful: tcp_tw_recycle was removed in Linux 4.12+ kernels because it breaks clients behind NAT. Stick to tcp_tw_reuse.
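After applying, read the values back rather than trusting the file; a typo or a conflicting drop-in can leave a key at its default. A quick check, assuming the file above:

# Apply the settings, then read back a couple of the critical keys
sysctl -p /etc/sysctl.conf
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse
# Expected: net.core.somaxconn = 65535 and net.ipv4.tcp_tw_reuse = 1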
3. Nginx Configuration: The Engine Room
Whether you are using raw Nginx or a derivative like Kong (which is Lua running on OpenResty), the underlying mechanics are identical. The most common mistake I see is neglecting keepalive connections to the upstream services.
By default, Nginx acts as a reverse proxy that opens a new connection to your backend microservice for every single request. This adds the overhead of a TCP handshake (and potentially SSL handshake) to every API call. It effectively DDoS-es your own internal network.
Fix it by defining an upstream block with keepalive:
http {
    upstream backend_service {
        server 10.0.0.5:8080;
        server 10.0.0.6:8080;
        # Keep up to 100 idle connections open to the backend (per worker)
        keepalive 100;
    }

    server {
        listen 443 ssl;
        server_name api.coolvds-client.no;

        location / {
            proxy_pass http://backend_service;
            # HTTP/1.1 is required for upstream keepalive
            proxy_http_version 1.1;
            # Clear the Connection header so the link persists
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}

This simple change can reduce internal latency by 30-50ms per request. When your servers are located in our Oslo data center, utilizing the local loop, the response feels instantaneous.
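To confirm the upstream connections are actually being reused rather than re-opened for every request, watch the gateway's established sockets toward the backend. A rough check, using the backend port from the upstream block above:

# Count established connections from the gateway to the backends; with keepalive working,
# this number should stay roughly stable under load instead of constantly churning
ss -tn state established '( dport = :8080 )' | wc -l

Alternatively, add $upstream_connect_time to your log_format: once keepalive kicks in, the connect time for most requests should sit at or near zero.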
4. Data Residency & The Norwegian Context
Performance isn't just about speed; it's about legality. With the GDPR in full force for over a year now, and the Norwegian Datatilsynet being notoriously strict, where your logs sit matters.
Many US-based cloud providers route traffic through Frankfurt or London. While legally acceptable under current frameworks, it adds network latency. A packet going from a user in Bergen to a server in Frankfurt and back takes roughly 30-40ms. A packet from Bergen to Oslo takes 8ms.
| Route | Latency (Avg) | Jurisdiction |
|---|---|---|
| Oslo -> AWS Frankfurt | ~35ms | Germany/USA |
| Oslo -> DigitalOcean Amsterdam | ~28ms | Netherlands/USA |
| Oslo -> CoolVDS Oslo | ~2ms | Norway |
Hosting your API Gateway domestically on CoolVDS utilizes the Norwegian Internet Exchange (NIX). It keeps data strictly within Norwegian borders, simplifying your compliance posture regarding log retention and PII data processing. It’s cleaner, safer, and significantly faster for your local user base.
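The numbers in the table are rounded averages; peering and routes change, so measure from where your users actually sit. A quick sketch with mtr (the hostnames are placeholders for your own endpoints in each region):

# Round-trip latency and per-hop loss over 20 probes
mtr --report --report-cycles 20 api-frankfurt.example.com
mtr --report --report-cycles 20 api-oslo.example.com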
5. Disk I/O: The Silent Killer of Throughput
If you have robust logging enabled (access logs, error logs, audit trails), your disk writes can block your request processing. This is especially true if you are using standard spinning rust (HDD) or low-tier SSDs with shared IOPS.
We recently migrated a client running a Magento API backend. They were capping out at 200 requests per second. The bottleneck? Access logs. The disk couldn't write fast enough, causing Nginx workers to block.
We moved them to a CoolVDS instance with local NVMe storage. We didn't change a line of code. Throughput jumped to 1,200 requests per second. If you are logging API traffic, NVMe isn't a luxury in 2019; it's a requirement.
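Even on NVMe, per-request disk writes do not belong on the hot path. Nginx can buffer access-log writes in memory and flush them periodically; a small sketch (the buffer and flush values are illustrative, not tuned for any particular workload):

# Flush access-log entries when the 64k buffer fills or every 5 seconds, whichever comes first
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;

# error_log has no buffer parameter, so keep its level reasonable in production
error_log /var/log/nginx/error.log warn;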
Final Check: File Descriptors
Before you deploy, verify your user limits. Nginx runs as the `www-data` or `nginx` user. If that user is capped at 1024 open files, your kernel tuning is useless.
# Check limits for the running process
cat /proc/$(pgrep nginx | head -n 1)/limits | grep "Max open files"
# Expected Output:
# Max open files            65535                65535                files

If you see 1024, edit /etc/security/limits.conf immediately.
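On Ubuntu 18.04 and CentOS 7, Nginx is started by systemd, which does not read /etc/security/limits.conf for services, so the unit limit and Nginx's own directive usually need to be raised as well. A sketch of the pieces that typically have to agree, assuming Nginx runs as www-data:

# /etc/security/limits.conf (covers PAM/login sessions, e.g. running nginx by hand)
www-data soft nofile 65535
www-data hard nofile 65535

# systemd drop-in, created with: systemctl edit nginx
[Service]
LimitNOFILE=65535

# nginx.conf: lift the per-worker descriptor limit to match
worker_rlimit_nofile 65535;

Run systemctl daemon-reload, restart Nginx, and re-check /proc/<pid>/limits to confirm the new ceiling.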
Conclusion
Optimizing an API Gateway is an exercise in removing constraints. You remove the network constraint with keepalives. You remove the kernel constraint with `sysctl`. You remove the I/O constraint with NVMe.
But you cannot remove the constraint of bad hardware. You need a foundation that supports the load you are planning for. Don't let your infrastructure be the reason your microservices fail.
Ready to test real performance? Deploy a CoolVDS NVMe instance in Oslo today. Spinning up a test environment takes 55 seconds.