API Gateway Performance Tuning: Squeezing Milliseconds out of Nginx and Kong in 2016

Everyone is rushing to microservices this year. It’s the architectural trend of 2016. But nobody talks about the hangover: Latency. When you smash a monolithic application into twenty microservices, you introduce twenty network hops where there used to be simple function calls. If your API Gateway—the traffic cop of your infrastructure—adds even 30ms of overhead per request, your application feels broken.

I’ve spent the last month debugging a high-traffic e-commerce platform migrating from Magento 1 to a service-oriented architecture. Their "scalable" cloud setup was choking on 2,000 requests per second. The culprit wasn't the code; it was a default Nginx configuration and a hypervisor stealing CPU cycles during SSL termination.

Here is how you fix the API bottleneck, tuned specifically for the realities of 2016 infrastructure.

1. The TCP Handshake Tax

The most expensive operation in your gateway isn't routing the packet; it's establishing the connection. If your gateway opens a new connection to the upstream backend for every single incoming request, you are burning CPU on TCP handshakes and ephemeral port exhaustion.

In Nginx, the default behavior is to proxy to the backend over HTTP/1.0 and close the connection after every request. You need to enable keepalive connections to your upstreams. This allows Nginx to reuse existing TCP connections, bypassing the handshake entirely for subsequent requests.

upstream backend_api {
    server 10.0.0.4:8080;
    server 10.0.0.5:8080;

    # The magic number. Keep 64 idle connections open per worker.
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;
        
        # Required for HTTP 1.1 keepalive to upstreams
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

Pro Tip: Do not set the keepalive value arbitrarily high. A reasonable starting point is your peak concurrent request count divided by the number of backend nodes; otherwise you risk hoarding idle connections that your backend services would rather close to free up memory.
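To verify that reuse is actually happening, watch the sockets from the gateway to your backends. A minimal check, assuming the upstreams listen on port 8080 as in the block above:

# List established connections from the gateway to the upstream port.
# With keepalive working, this set stays small and stable instead of
# churning a brand-new socket for every proxied request.
ss -tn state established '( dport = :8080 )'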

2. SSL Termination: The CPU Killer

With Google pushing HTTPS as a ranking signal and Let's Encrypt finally leaving beta earlier this year, encryption is non-negotiable. However, the RSA handshake is heavy. If you are terminating SSL at the gateway (which you should), your CPU usage will spike linearly with traffic.

First, ensure you are using session resumption. It lets returning clients skip the expensive key exchange on subsequent connections instead of performing a full handshake every time.

# Shared across all workers; roughly 4,000 sessions fit per megabyte of cache
ssl_session_cache shared:SSL:20m;
ssl_session_timeout 180m;

Second, if you are running on modern hardware (like the Haswell or Broadwell architectures available on CoolVDS NVMe plans), ensure your OpenSSL library can utilize AES-NI instructions for the bulk cipher. More importantly, move to Elliptic Curve Cryptography (ECC): signing a handshake with an ECDSA P-256 certificate is significantly cheaper for the server than doing it with a 2048-bit RSA key.
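A quick sanity check on any box (the key path below is just a placeholder): confirm the CPU actually exposes AES-NI to your guest, benchmark the two signature types, and generate a P-256 key if ECDSA wins on your hardware.

# Prints "aes" if the hypervisor passes AES-NI through to the guest
grep -m1 -o aes /proc/cpuinfo

# Compare signing cost: ECDSA P-256 vs 2048-bit RSA
openssl speed ecdsap256 rsa2048

# Generate a P-256 private key for an ECDSA certificate request
openssl ecparam -genkey -name prime256v1 -out /etc/nginx/ssl/api.key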

3. Linux Kernel Tuning for High Concurrency

Your operating system defaults are designed for general-purpose computing, not for handling 10,000 concurrent connections. On a CentOS 7 or Ubuntu 16.04 server, you will hit the `SOMAXCONN` limit immediately under load.

I recently audited a server in Oslo where the logs were full of "syn flood" warnings. It wasn't a DDoS; it was just legitimate traffic hitting a kernel configured for 2005. Add these lines to your /etc/sysctl.conf to widen the pipe:

# Increase the maximum number of backlog connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 5000

# Allow reuse of sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1

# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535

# Protect against SYN flood attacks while allowing legitimate spikes
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 4096

After applying these, run sysctl -p. The difference in throughput stability is night and day.
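One caveat: raising somaxconn only lifts the kernel ceiling. Nginx defaults its own listen backlog to 511 on Linux, so ask for the deeper accept queue explicitly (example value below; the backlog parameter may only appear on one listen directive per address:port).

# In the server block of your gateway vhost
listen 443 ssl backlog=65535;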

4. The "Noisy Neighbor" Problem in Virtualization

You can tune Nginx until your fingers bleed, but if the underlying hypervisor is stealing your CPU cycles, it doesn't matter. In standard public clouds, your "2 vCPU" instance is often sharing a physical core with three other clients. When a neighbor spins up a heavy Jenkins build, your API Gateway latency jitters.

For an API Gateway, steal time (st) is the enemy. Check it with `top`. If `%st` is consistently above 0.5%, you need to move.
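For a view over time rather than a snapshot, vmstat (part of the standard procps tooling on CentOS 7 and Ubuntu 16.04) works just as well:

# Sample CPU counters every second for a minute; the last column, st,
# is the percentage of time the hypervisor kept this vCPU off the core
vmstat 1 60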

| Metric | Budget VPS (OpenVZ) | CoolVDS (KVM + NVMe) |
| --- | --- | --- |
| Virtualization | Shared kernel (no syscall isolation) | Hardware virtualization (dedicated kernel) |
| Disk I/O | SATA/SAS (rotational) | NVMe (direct PCI-E access) |
| Latency consistency | High jitter | Flat / predictable |

We built CoolVDS on KVM specifically to solve this. We don't oversubscribe CPU cores aggressively because we know that for a gateway, processing the request queue requires instant CPU availability. Furthermore, with the looming GDPR requirements and the mess of the Privacy Shield agreement this summer, hosting your gateway physically in Norway (or the EEA) is becoming a compliance necessity, not just a performance one.

5. Benchmarking: Don't Guess, Measure

Stop using ab (Apache Bench). It’s single-threaded and becomes the bottleneck before your server does. In 2016, the standard is wrk.

Here is how I stress test a gateway configuration to ensure it handles the traffic spikes we see during Black Friday:

# Run for 30 seconds, using 12 threads, keeping 400 open connections
wrk -t12 -c400 -d30s https://api.yourdomain.no/v1/status

If you see a high standard deviation in the latency results, your current hosting environment likely has I/O wait issues or CPU contention.
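Averages hide that jitter. wrk's --latency flag prints the percentile distribution, and the 99th percentile, not the mean, is what your unluckiest customers feel:

# Same test, but print the latency distribution (50/75/90/99th percentiles)
wrk -t12 -c400 -d30s --latency https://api.yourdomain.no/v1/status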

Final Thoughts

An API Gateway is the front door to your digital business. If the door is stuck, it doesn't matter how nice the furniture inside is. By enabling upstream keepalives, tuning your kernel TCP stack, and running on infrastructure that guarantees dedicated resources, you can drop your overhead from 50ms to 5ms.

Don't let latency kill your conversion rates. Deploy a high-performance KVM instance on CoolVDS today and test your API response times against the big clouds. You might be surprised what real hardware isolation does for your throughput.