API Gateway Performance Tuning: Surviving the Microservices Storm
Let’s be honest: Microservices are great for developer velocity, but they are a nightmare for operations. You break a monolith into fifty services, and suddenly your frontend has to make fifty round-trips to render a single dashboard. You throw an API Gateway in front—maybe Nginx, Kong, or the newly released HAProxy 2.0—and breathe a sigh of relief. Until Black Friday hits.
Suddenly, that gateway isn't a traffic cop; it's a bottleneck. I’ve spent too many nights debugging 502 Bad Gateways on overloaded clusters where the logs just scream "Connection Refused." The culprit is rarely the application code. It's usually a default Linux kernel config from 2016 or a virtual machine that's being suffocated by noisy neighbors.
If you are serving customers in Norway or the broader Nordic region, latency isn't just a number; it's a competitive disadvantage. Here is how we tune API Gateways for raw throughput and stability, using the stack available to us in 2019.
1. The Hardware Lie: Why CPU Steal is Killing Your API
Before touching a single config file, look at your infrastructure. API Gateways are CPU and I/O intensive. They handle SSL termination, request routing, and often logging to disk. In a shared hosting environment, you are fighting for CPU cycles.
Run top on your current gateway during peak load and look at the CPU summary line:
top - 14:31:02 up 10 days, 3:14, 1 user, load average: 2.15, 2.05, 1.98
%Cpu(s): 15.2 us, 4.1 sy, 0.0 ni, 50.5 id, 0.1 wa, 0.0 hi, 0.2 si, 29.9 st

See that 29.9 st at the end? That is Steal Time. It means the hypervisor is stealing CPU cycles from your VM to serve another client on the same physical host. Your Nginx worker processes are effectively frozen, waiting for the processor to wake up. For an API Gateway requiring sub-millisecond processing, this is fatal.
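A single top snapshot can mislead, because steal tends to arrive in bursts. To watch it over an interval, vmstat and mpstat (from the sysstat package) both report a steal column. A minimal sketch, run during peak load:

# Sample CPU counters every second for 30 seconds; the last column (st) is steal time
vmstat 1 30

# Per-core view, useful when only some cores are being starved (requires sysstat)
mpstat -P ALL 1 30

If steal sits above a few percent during peak, the problem is the host, not your configuration.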
This is why we built CoolVDS on strict KVM virtualization with resource guarantees. We don't overcommit CPU cores on our high-performance nodes. When you deploy an API Gateway, you need to know that a CPU core is actually yours. If you see high steal time, no amount of software tuning will save you. Move to a provider that respects isolation.
2. Kernel Tuning: Opening the Floodgates
Most Linux distros (Ubuntu 18.04, CentOS 7) ship with conservative defaults intended for desktop usage or light web serving. They are not tuned for 50,000 concurrent connections. You need to modify /etc/sysctl.conf.
Here is the battle-tested configuration I use for high-throughput gateways:
# /etc/sysctl.conf
# Increase system-wide file descriptor limit
fs.file-max = 2097152
# Increase the backlog for incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
# Reuse sockets in TIME_WAIT state for new outgoing connections
net.ipv4.tcp_tw_reuse = 1
# Increase available ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# TCP Window Scaling (Critical for high bandwidth)
net.ipv4.tcp_window_scaling = 1
# Protect against SYN flood attacks while allowing legitimate spikes
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_syncookies = 1

Pro Tip: After saving this file, run sysctl -p to apply the changes without a reboot. But be careful: tcp_tw_recycle was removed in Linux 4.12+ kernels because it breaks clients behind NAT. Stick to tcp_tw_reuse.
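After applying, read the values back rather than trusting the file; a typo or a conflicting drop-in can leave a key at its default. A quick check, assuming the file above:

# Apply the settings, then read back a couple of the critical keys
sysctl -p /etc/sysctl.conf
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse
# Expected: net.core.somaxconn = 65535 and net.ipv4.tcp_tw_reuse = 1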
3. Nginx Configuration: The Engine Room
Whether you are using raw Nginx or a derivative like Kong (which is Lua running on OpenResty), the underlying mechanics are identical. The most common mistake I see is neglecting keepalive connections to the upstream services.
By default, Nginx acts as a reverse proxy that opens a new connection to your backend microservice for every single request. This adds the overhead of a TCP handshake (and potentially SSL handshake) to every API call. It effectively DDoS-es your own internal network.
Fix it by defining an upstream block with keepalive:
http {
    upstream backend_service {
        server 10.0.0.5:8080;
        server 10.0.0.6:8080;
        # Keep up to 100 idle connections open to the backend (per worker)
        keepalive 100;
    }

    server {
        listen 443 ssl;
        server_name api.coolvds-client.no;

        location / {
            proxy_pass http://backend_service;
            # HTTP/1.1 is required for upstream keepalive
            proxy_http_version 1.1;
            # Clear the Connection header so the link persists
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}

This simple change can reduce internal latency by 30-50ms per request. When your servers are located in our Oslo data center, utilizing the local loop, the response feels instantaneous.
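To confirm the upstream connections are actually being reused rather than re-opened for every request, watch the gateway's established sockets toward the backend. A rough check, using the backend port from the upstream block above:

# Count established connections from the gateway to the backends; with keepalive working,
# this number should stay roughly stable under load instead of constantly churning
ss -tn state established '( dport = :8080 )' | wc -l

Alternatively, add $upstream_connect_time to your log_format: once keepalive kicks in, the connect time for most requests should sit at or near zero.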
4. Data Residency & The Norwegian Context
Performance isn't just about speed; it's about legality. With the GDPR in full force for over a year now, and the Norwegian Datatilsynet being notoriously strict, where your logs sit matters.
Many US-based cloud providers route traffic through Frankfurt or London. While legally acceptable under current frameworks, it adds network latency. A packet going from a user in Bergen to a server in Frankfurt and back takes roughly 30-40ms. A packet from Bergen to Oslo takes 8ms.
| Route | Latency (Avg) | Jurisdiction |
|---|---|---|
| Oslo -> AWS Frankfurt | ~35ms | Germany/USA |
| Oslo -> DigitalOcean Amsterdam | ~28ms | Netherlands/USA |
| Oslo -> CoolVDS Oslo | ~2ms | Norway |
Hosting your API Gateway domestically on CoolVDS utilizes the Norwegian Internet Exchange (NIX). It keeps data strictly within Norwegian borders, simplifying your compliance posture regarding log retention and PII data processing. It’s cleaner, safer, and significantly faster for your local user base.
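The numbers in the table are rounded averages; peering and routes change, so measure from where your users actually sit. A quick sketch with mtr (the hostnames are placeholders for your own endpoints in each region):

# Round-trip latency and per-hop loss over 20 probes
mtr --report --report-cycles 20 api-frankfurt.example.com
mtr --report --report-cycles 20 api-oslo.example.com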
5. Disk I/O: The Silent Killer of Throughput
If you have robust logging enabled (access logs, error logs, audit trails), your disk writes can block your request processing. This is especially true if you are using standard spinning rust (HDD) or low-tier SSDs with shared IOPS.
We recently migrated a client running a Magento API backend. They were capping out at 200 requests per second. The bottleneck? Access logs. The disk couldn't write fast enough, causing Nginx workers to block.
We moved them to a CoolVDS instance with local NVMe storage. We didn't change a line of code. Throughput jumped to 1,200 requests per second. If you are logging API traffic, NVMe isn't a luxury in 2019; it's a requirement.
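Even on NVMe, per-request disk writes do not belong on the hot path. Nginx can buffer access-log writes in memory and flush them periodically; a small sketch (the buffer and flush values are illustrative, not tuned for any particular workload):

# Flush access-log entries when the 64k buffer fills or every 5 seconds, whichever comes first
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;

# error_log has no buffer parameter, so keep its level reasonable in production
error_log /var/log/nginx/error.log warn;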
Final Check: File Descriptors
Before you deploy, verify your user limits. Nginx runs as the `www-data` or `nginx` user. If that user is capped at 1024 open files, your kernel tuning is useless.
# Check limits for the running process
cat /proc/$(pgrep nginx | head -n 1)/limits | grep "Max open files"
# Expected Output:
# Max open files            65535                65535                files

If you see 1024, edit /etc/security/limits.conf immediately.
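On Ubuntu 18.04 and CentOS 7, Nginx is started by systemd, which does not read /etc/security/limits.conf for services, so the unit limit and Nginx's own directive usually need to be raised as well. A sketch of the pieces that typically have to agree, assuming Nginx runs as www-data:

# /etc/security/limits.conf (covers PAM/login sessions, e.g. running nginx by hand)
www-data soft nofile 65535
www-data hard nofile 65535

# systemd drop-in, created with: systemctl edit nginx
[Service]
LimitNOFILE=65535

# nginx.conf: lift the per-worker descriptor limit to match
worker_rlimit_nofile 65535;

Run systemctl daemon-reload, restart Nginx, and re-check /proc/<pid>/limits to confirm the new ceiling.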
Conclusion
Optimizing an API Gateway is an exercise in removing constraints. You remove the network constraint with keepalives. You remove the kernel constraint with `sysctl`. You remove the I/O constraint with NVMe.
But you cannot remove the constraint of bad hardware. You need a foundation that supports the load you are planning for. Don't let your infrastructure be the reason your microservices fail.
Ready to test real performance? Deploy a CoolVDS NVMe instance in Oslo today. Spinning up a test environment takes 55 seconds.