Scaling NGINX & Kong: API Gateway Tuning for the Post-GDPR Era
Let's be honest: default configurations are for hobbyists. If you are running an API Gateway—whether it's raw NGINX, Kong, or Tyk—on a standard Linux install in 2018, you are leaving 40% of your performance on the table. I recently audited a fintech setup in Oslo where the developers were blaming their Java microservices for latency spikes. They were ready to rewrite the entire stack.
The culprit? It wasn't Java. It was a default sysctl.conf file and a noisy neighbor on a cheap VPS provider causing CPU steal time to hit 15%.
In a post-GDPR world (we're two months in, and the dust still hasn't settled), latency isn't just an annoyance; it's a compliance risk when data needs to stay within Norwegian or EEA borders. If you are routing traffic through Frankfurt just to save a few kroner, you are doing it wrong. Here is how we tune the kernel and the gateway to handle thousands of requests per second without melting the server.
1. The OS Layer: Open File Descriptors
Most Linux distributions, including Ubuntu 18.04 LTS and CentOS 7, ship with conservative limits. When your API Gateway acts as a reverse proxy, every incoming connection and every upstream connection consumes a file descriptor. The default limit of 1024 is a joke for production.
Check your current limits:
ulimit -n
If it says 1024, you are capping your concurrency. Here is how to fix it permanently in /etc/security/limits.conf:
* soft nofile 65535
* hard nofile 65535
root soft nofile 65535
root hard nofile 65535
Then, ensure NGINX knows about this capability. In your main nginx.conf, place this at the top level, outside the http block:
worker_rlimit_nofile 65535;
Pro Tip: Don't just set this blindly. Verify your provider allows these limits. On CoolVDS KVM instances, we expose the full kernel capabilities to the guest OS, unlike some OpenVZ providers that share kernel limits across containers.
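Before moving on, verify that the running master process actually picked up the new limit. Keep in mind that if NGINX runs under systemd (the default on Ubuntu 18.04 and CentOS 7), limits.conf does not apply to the service; you also need LimitNOFILE=65535 in the unit file or a drop-in. A quick check, assuming the PID file lives at /run/nginx.pid (adjust for your distro):
# Confirm the running NGINX master inherited the new descriptor limit
cat /proc/$(cat /run/nginx.pid)/limits | grep "Max open files"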
2. TCP Stack Tuning: Ephemeral Port Exhaustion
A high-throughput API gateway creates and destroys TCP connections rapidly. If you see a wall of TIME_WAIT states in netstat (or ss), you are running out of ephemeral ports. The kernel holds each closed connection in TIME_WAIT for 60 seconds by default before that port can be reused.
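A quick way to see whether this is hurting you (ss has largely replaced netstat on 2018 distros):
# Rough count of sockets lingering in TIME_WAIT (subtract one for the header line)
ss -tan state time-wait | wc -l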
Edit /etc/sysctl.conf to modernize your TCP stack for 2018 standards:
# Allow reuse of sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65000
# Increase the max number of backlog connections
net.core.somaxconn = 4096
net.core.netdev_max_backlog = 4096
# Protect against SYN flood attacks (basic mitigation)
net.ipv4.tcp_syncookies = 1
Apply these changes immediately:
sysctl -p
3. NGINX & Kong Optimization
Whether you use raw NGINX or Kong (which sits on top of OpenResty/NGINX), the upstream keepalive logic is critical. By default, NGINX proxies to your backend services as a polite HTTP/1.0 client, closing the connection after every request. That adds the overhead of a full TCP handshake (plus a TLS handshake if your internal traffic is encrypted) to every single API call.
Enable Upstream Keepalive
You must define an upstream block and explicitly activate keepalive connections.
upstream backend_microservice {
    server 10.0.0.5:8080;
    # Keep 64 idle connections open to the backend
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_microservice;
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
Without clearing the Connection header (the empty "" value), NGINX sends "Connection: close" to the upstream on every request, defeating the purpose of the keepalive pool.
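To confirm the pool is actually being reused, watch the gateway's established connections to the upstream while you generate load. Assuming the backend from the example above listens on port 8080, something like this should show a small, stable set of connections rather than a constant churn of new source ports:
# Established connections from the gateway to upstreams listening on 8080
ss -tn state established '( dport = :8080 )'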
Worker Processes & CPU Affinity
In 2018, servers with 32 or 64 cores are becoming common, but context switching can still kill performance. Set worker_processes auto; in most cases; if you are squeezing out every last millisecond, pin workers to cores with worker_cpu_affinity, as sketched below.
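A minimal sketch for the main context of nginx.conf (worker_cpu_affinity auto has been available since NGINX 1.9.10; the worker_connections figure is illustrative, size it against your nofile budget):
worker_processes auto;
worker_cpu_affinity auto;

events {
    # Per-worker connection ceiling; a proxied request consumes both a client and an upstream connection
    worker_connections 16384;
}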
On a 4-core CoolVDS NVMe instance, auto works perfectly because our KVM allocation ensures those 4 vCPUs are actually available to you, not time-sliced to death by 20 other tenants.
4. SSL/TLS: Performance vs. Security
With Chrome marking plain HTTP sites as "Not Secure" starting this July (Chrome 68), SSL is mandatory. But SSL handshakes are expensive, so cache and reuse sessions to amortize the cost.
| Directive | Recommended Setting (2018) | Impact |
|---|---|---|
| `ssl_session_cache` | `shared:SSL:10m` | Reduces handshake CPU usage by caching parameters. |
| `ssl_session_timeout` | `10m` | Allows clients to reconnect faster within 10 mins. |
| `ssl_protocols` | `TLSv1.2` | Disable TLS 1.0/1.1 immediately (PCI DSS requirement). |
Here is the config block to drop into your `http` context:
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
ssl_protocols TLSv1.2;
ssl_prefer_server_ciphers on;
ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384';
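A rough way to confirm session reuse is working is openssl s_client with -reconnect, which reopens the same session several times; the hostname here is a placeholder for your own endpoint:
# "Reused" lines mean the session cache is doing its job
echo | openssl s_client -connect api.example.no:443 -reconnect 2>/dev/null | grep -E "^(New|Reused)"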
The Hardware Reality: Why Config Only Goes So Far
You can have the most optimized sysctl.conf in Norway, but if your disk I/O is choking on standard SATA SSDs (or heaven forbid, HDDs), your API Gateway's log writes will stall, blocking the worker processes.
API Gateways are logging-heavy. Every request generates an access log and an error log entry. On a high-traffic site, this is a constant stream of writes.
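You cannot buy your way out of bad directives, though. One cheap win is buffering access log writes so workers are not hitting the disk on every request; a sketch (tune the buffer size and flush interval to your own traffic and audit requirements):
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;
The error_log has no equivalent buffering knob, which is one more reason the underlying disk matters.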
This is where the infrastructure choice dictates the ceiling of your performance:
- Legacy VPS: Shared SATA SSDs. IOPS fluctuate based on other users. Latency spikes are unpredictable.
- CoolVDS Architecture: We utilize local NVMe storage passed through via KVM. The I/O latency is virtually nonexistent compared to network-attached block storage used by the "big clouds."
Local Latency & GDPR
Latency is physics. If your users are in Oslo, Bergen, or Trondheim, serving them from a datacenter in Ireland or Frankfurt adds 20-40 ms to the round trip. For a client flow that hits the API Gateway 3-4 times in sequence, that delay stacks up.
Furthermore, Datatilsynet (The Norwegian Data Protection Authority) is taking a hard look at data sovereignty following the GDPR rollout in May. Hosting within Norway isn't just a performance optimization anymore; for many sectors (health, finance), it's becoming a legal safeguard.
Next Steps
Don't just take my word for it. Run wrk or ab against your current setup. If you aren't hitting the numbers you expect, check your kernel logs.
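For reference, a typical wrk smoke test looks something like this; the URL is a placeholder for one of your own read-only endpoints:
# 4 threads, 200 open connections, 30 seconds, with the full latency distribution
wrk -t4 -c200 -d30s --latency https://api.example.no/v1/health
Judge the result on the 99th percentile, not the average.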
If the kernel is fine but the latency persists, it's time to move to dedicated resources. Deploy a CoolVDS NVMe instance in Oslo today—spin up takes about 55 seconds—and compare the Time-To-First-Byte. Performance is the only metric that doesn't lie.