Scaling NGINX & Kong: API Gateway Tuning for the Post-GDPR Era
Let's be honest: default configurations are for hobbyists. If you are running an API Gateway—whether it's raw NGINX, Kong, or Tyk—on a standard Linux install in 2018, you are leaving 40% of your performance on the table. I recently audited a fintech setup in Oslo where the developers were blaming their Java microservices for latency spikes. They were ready to rewrite the entire stack.
The culprit? It wasn't Java. It was a default sysctl.conf file and a noisy neighbor on a cheap VPS provider causing CPU steal time to hit 15%.
In a post-GDPR world (we're two months in, and the dust still hasn't settled), latency isn't just an annoyance; it's a compliance risk when data needs to stay within Norwegian or EEA borders. If you are routing traffic through Frankfurt just to save a few kroner, you are doing it wrong. Here is how we tune the kernel and the gateway to handle thousands of requests per second without melting the server.
1. The OS Layer: Open File Descriptors
Most Linux distributions, including Ubuntu 18.04 LTS and CentOS 7, ship with conservative limits. When your API Gateway acts as a reverse proxy, every incoming connection and every upstream connection consumes a file descriptor. The default limit of 1024 is a joke for production.
Check your current limits:
ulimit -n
If it says 1024, you are capping your concurrency. Here is how to fix it permanently in /etc/security/limits.conf:
* soft nofile 65535
* hard nofile 65535
root soft nofile 65535
root hard nofile 65535
Then, ensure NGINX knows about this capability. In your main nginx.conf, place this at the top level, outside the http block:
worker_rlimit_nofile 65535;
Pro Tip: Don't just set this blindly. Verify your provider allows these limits. On CoolVDS KVM instances, we expose the full kernel capabilities to the guest OS, unlike some OpenVZ providers that share kernel limits across containers.
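Before moving on, verify that the running master process actually picked up the new limit. Keep in mind that if NGINX runs under systemd (the default on Ubuntu 18.04 and CentOS 7), limits.conf does not apply to the service; you also need LimitNOFILE=65535 in the unit file or a drop-in. A quick check, assuming the PID file lives at /run/nginx.pid (adjust for your distro):
# Confirm the running NGINX master inherited the new descriptor limit
cat /proc/$(cat /run/nginx.pid)/limits | grep "Max open files"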
2. TCP Stack Tuning: Ephemeral Port Exhaustion
A high-throughput API gateway creates and destroys TCP connections rapidly. If you see a wall of TIME_WAIT states in netstat (or ss), you are running out of ephemeral ports. The kernel holds each closed connection in TIME_WAIT for 60 seconds by default before that port can be reused.
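A quick way to see whether this is hurting you (ss has largely replaced netstat on 2018 distros):
# Rough count of sockets lingering in TIME_WAIT (subtract one for the header line)
ss -tan state time-wait | wc -l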
Edit /etc/sysctl.conf to modernize your TCP stack for 2018 standards:
# Allow reuse of sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65000
# Increase the max number of backlog connections
net.core.somaxconn = 4096
net.core.netdev_max_backlog = 4096
# Protect against SYN flood attacks (basic mitigation)
net.ipv4.tcp_syncookies = 1
Apply these changes immediately:
sysctl -p
3. NGINX & Kong Optimization
Whether you use raw NGINX or Kong (which sits on top of OpenResty/NGINX), the upstream keepalive logic is critical. By default, NGINX proxies to your backend services as a polite HTTP/1.0 client, closing the connection after every request. That adds the overhead of a full TCP handshake (plus a TLS handshake if your internal traffic is encrypted) to every single API call.
Enable Upstream Keepalive
You must define an upstream block and explicitly activate keepalive connections.
upstream backend_microservice {
    server 10.0.0.5:8080;
    # Keep 64 idle connections open to the backend
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_microservice;
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
Without clearing the Connection header (the empty "" value), NGINX sends "Connection: close" to the upstream on every request, defeating the purpose of the keepalive pool.
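To confirm the pool is actually being reused, watch the gateway's established connections to the upstream while you generate load. Assuming the backend from the example above listens on port 8080, something like this should show a small, stable set of connections rather than a constant churn of new source ports:
# Established connections from the gateway to upstreams listening on 8080
ss -tn state established '( dport = :8080 )'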
Worker Processes & CPU Affinity
In 2018, servers with 32 or 64 cores are becoming common, but context switching can still kill performance. Set worker_processes auto; in most cases; if you are squeezing out every last millisecond, pin workers to cores with worker_cpu_affinity, as sketched below.
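A minimal sketch for the main context of nginx.conf (worker_cpu_affinity auto has been available since NGINX 1.9.10; the worker_connections figure is illustrative, size it against your nofile budget):
worker_processes auto;
worker_cpu_affinity auto;

events {
    # Per-worker connection ceiling; a proxied request consumes both a client and an upstream connection
    worker_connections 16384;
}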
On a 4-core CoolVDS NVMe instance, auto works perfectly because our KVM allocation ensures those 4 vCPUs are actually available to you, not time-sliced to death by 20 other tenants.
4. SSL/TLS: Performance vs. Security
With Chrome marking plain HTTP sites as "Not Secure" starting this July (Chrome 68), SSL is mandatory. But SSL handshakes are expensive, so cache and reuse sessions to amortize the cost.
| Directive | Recommended Setting (2018) | Impact |
|---|---|---|
| `ssl_session_cache` | `shared:SSL:10m` | Reduces handshake CPU usage by caching parameters. |
| `ssl_session_timeout` | `10m` | Allows clients to reconnect faster within 10 mins. |
| `ssl_protocols` | `TLSv1.2` | Disable TLS 1.0/1.1 immediately (PCI DSS requirement). |
Here is the config block to drop into your `http` context:
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
ssl_protocols TLSv1.2;
ssl_prefer_server_ciphers on;
ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384';
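A rough way to confirm session reuse is working is openssl s_client with -reconnect, which reopens the same session several times; the hostname here is a placeholder for your own endpoint:
# "Reused" lines mean the session cache is doing its job
echo | openssl s_client -connect api.example.no:443 -reconnect 2>/dev/null | grep -E "^(New|Reused)"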
The Hardware Reality: Why Config Only Goes So Far
You can have the most optimized sysctl.conf in Norway, but if your disk I/O is choking on standard SATA SSDs (or heaven forbid, HDDs), your API Gateway's log writes will stall, blocking the worker processes.
API Gateways are logging-heavy. Every request generates an access log and an error log entry. On a high-traffic site, this is a constant stream of writes.
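You cannot buy your way out of bad directives, though. One cheap win is buffering access log writes so workers are not hitting the disk on every request; a sketch (tune the buffer size and flush interval to your own traffic and audit requirements):
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;
The error_log has no equivalent buffering knob, which is one more reason the underlying disk matters.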
This is where the infrastructure choice dictates the ceiling of your performance:
- Legacy VPS: Shared SATA SSDs. IOPS fluctuate based on other users. Latency spikes are unpredictable.
- CoolVDS Architecture: We utilize local NVMe storage passed through via KVM. The I/O latency is virtually nonexistent compared to network-attached block storage used by the "big clouds."
Local Latency & GDPR
Latency is physics. If your users are in Oslo, Bergen, or Trondheim, serving them from a datacenter in Ireland or Frankfurt adds 20-40 ms to the round trip. For a client flow that hits the API Gateway 3-4 times in sequence, that delay stacks up.
Furthermore, Datatilsynet (The Norwegian Data Protection Authority) is taking a hard look at data sovereignty following the GDPR rollout in May. Hosting within Norway isn't just a performance optimization anymore; for many sectors (health, finance), it's becoming a legal safeguard.
Next Steps
Don't just take my word for it. Run wrk or ab against your current setup. If you aren't hitting the numbers you expect, check your kernel logs.
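For reference, a typical wrk smoke test looks something like this; the URL is a placeholder for one of your own read-only endpoints:
# 4 threads, 200 open connections, 30 seconds, with the full latency distribution
wrk -t4 -c200 -d30s --latency https://api.example.no/v1/health
Judge the result on the 99th percentile, not the average.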
If the kernel is fine but the latency persists, it's time to move to dedicated resources. Deploy a CoolVDS NVMe instance in Oslo today—spin up takes about 55 seconds—and compare the Time-To-First-Byte. Performance is the only metric that doesn't lie.