
Tuning Nginx as an API Gateway: The 2017 Guide to Sub-Millisecond Latency

Stop Letting Default Configs Kill Your Throughput

If you are running an API gateway on a VPS with stock sysctl.conf and default Nginx settings in 2017, you are essentially driving a Ferrari in first gear. I recently audited a setup for a client in Oslo—a fintech startup trying to push real-time transaction data. They were hosting on a generic European cloud provider, blaming their Java backend for latency issues. The backend was fine. The bottleneck was the gateway.

When your users are sitting in Stavanger or Trondheim, and your handshake takes 40ms because of poor routing or an exhausted file descriptor table, you aren't just losing time; you're losing trust. In the Nordic market, where fiber penetration is high and user expectations are higher, a sluggish gateway gets noticed immediately.

Let's fix your stack. We are going to look at the Linux kernel limits, Nginx upstream configurations, and why hardware choice (specifically NVMe) is no longer optional.

1. The OS Is Your First Bottleneck

Before Nginx even accepts a packet, the Linux kernel has to allow it. On a standard CentOS 7 or Ubuntu 16.04 install, the defaults are conservative. They are designed for general-purpose computing, not high-throughput edge routing.

I see this error in logs constantly:

socket() failed (24: Too many open files) while connecting to upstream

This happens because the default open file limit is often set to 1024. For an API gateway handling thousands of concurrent connections, this is laughable. You need to increase the file descriptor limits for the system and the Nginx user.

The Fix:

Edit /etc/security/limits.conf and add:

* soft nofile 65535
* hard nofile 65535
root soft nofile 65535
root hard nofile 65535
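
Note that limits.conf alone does not always reach the Nginx workers (systemd service units, for example, ignore it), so I also raise the ceiling inside nginx.conf itself. A minimal sketch, assuming the stock nginx.conf layout; the numbers are starting points, not gospel:

# /etc/nginx/nginx.conf
worker_processes auto;
worker_rlimit_nofile 65535;    # per-worker file descriptor ceiling

events {
    worker_connections 16384;  # keep this comfortably below worker_rlimit_nofile
    multi_accept on;           # accept as many pending connections as possible per wakeup
}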

But those are just per-process limits. You also need to tune the kernel's network stack: the ephemeral port range is often too small, and the SYN backlog fills up during traffic bursts.

Here is the /etc/sysctl.conf configuration I deploy on every CoolVDS instance intended for gateway usage:

# Maximize the number of open file descriptors
fs.file-max = 2097152

# Increase the ephemeral port range to allow more connections
net.ipv4.ip_local_port_range = 1024 65535

# Allow reuse of sockets in TIME_WAIT state for new connections
# Critical for API gateways making frequent backend calls
net.ipv4.tcp_tw_reuse = 1

# Increase the maximum number of connections in the backlog
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# Prefer low latency over raw throughput in the TCP stack
# (Nagle's algorithm is a per-socket setting; Nginx disables it with tcp_nodelay on)
net.ipv4.tcp_low_latency = 1

Apply this with sysctl -p. If you skip tcp_tw_reuse, your gateway will run out of sockets talking to your upstream API because they will all be stuck in TIME_WAIT.
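
To confirm the kernel actually picked up the new values, and to see whether TIME_WAIT is already hurting you, a quick check from the shell is enough (a steadily climbing TIME_WAIT count means the gateway is churning through upstream connections):

# Verify the tuned values are live
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse fs.file-max

# Socket summary, then a rough count of sockets stuck in TIME_WAIT
ss -s
ss -tan state time-wait | wc -l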

2. Nginx Upstream Keepalive: The Hidden Killer

Most people configure Nginx as a reverse proxy like this:

location /api/ {
    proxy_pass http://backend;
}

This is functionally correct but performance suicide. By default, Nginx speaks HTTP/1.0 to upstreams and closes the connection after every request. That means every single API call forces your gateway to open a new TCP connection to the backend, perform the three-way handshake, send the data, and tear the connection down again. The overhead shows up as extra latency on every call and as wasted CPU on the backend.

You must enable HTTP/1.1 and keepalive connections to the upstream.

The Optimized Config:

upstream backend_cluster {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;
    
    # Keep 32 idle connections open to the backend per worker
    keepalive 32;
}

server {
    listen 80;
    listen 443 ssl http2; # HTTP/2 is mandatory in 2017

    location /api/ {
        proxy_pass http://backend_cluster;
        
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        
        # Pass real IP to backend (vital for logs)
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Pro Tip: If you also encrypt traffic to the upstream (proxy_pass https://...), which you should be doing as part of your GDPR compliance prep, upstream keepalive spares you the expensive TLS handshake on the internal network as well. On handshake-heavy workloads we have seen that cut backend CPU usage by as much as 40%.
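
Here is a minimal sketch of what that looks like, assuming the servers in backend_cluster are listening with TLS (ports and hostnames are illustrative):

location /api/ {
    proxy_pass https://backend_cluster;   # upstream connections now use TLS

    # Same requirements as before for keepalive to work
    proxy_http_version 1.1;
    proxy_set_header Connection "";

    # Reuse TLS sessions to the backend instead of paying for a full handshake every time
    proxy_ssl_session_reuse on;
    proxy_ssl_protocols TLSv1.2;
}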

3. Hardware Matters: The IOPS Trap

You can tune software all day, but if your disk I/O is choking on access logs or cache writes, latency spikes will occur. In 2017, many hosting providers still push SATA SSDs as "high performance." They aren't. Not for heavy API workloads.

When an API gateway buffers a large request body or writes access logs to disk, you enter the I/O wait queue. On SATA, this queue fills up fast.
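
One cheap win before you touch hardware at all: stop hitting the disk on every request. A sketch using Nginx's built-in log buffering and in-memory proxy buffers; the sizes are starting points you should tune to your traffic:

# Buffer access log writes in memory; flush every 64 KB or every 5 seconds
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;

# Keep small request bodies and upstream responses in RAM instead of temp files
client_body_buffer_size 128k;
proxy_buffer_size 16k;
proxy_buffers 16 16k;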

Storage Type         Random Read IOPS    Write Latency
Standard SATA SSD    ~5,000 - 10,000     0.2ms - 1.0ms
CoolVDS NVMe         ~300,000+           < 0.05ms

We built the CoolVDS platform strictly on NVMe because we saw too many "noisy neighbor" issues on shared SATA arrays. When a neighbor on a shared host starts a backup, your API latency shouldn't jump from 10ms to 200ms. NVMe provides the queue depth necessary to absorb those hits.
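
Don't take anyone's IOPS numbers on faith, ours included. A quick 4K random-read test with fio tells you what the disk under your VM actually delivers (this writes a 1 GB test file in the current directory):

fio --name=randread --rw=randread --bs=4k --size=1G \
    --ioengine=libaio --iodepth=32 --numjobs=4 --group_reporting \
    --direct=1 --runtime=30 --time_based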

4. Local Context: Norway and Data Sovereignty

With the EU General Data Protection Regulation (GDPR) looming on the horizon for 2018, data placement is becoming a legal discussion, not just a technical one. Datatilsynet (The Norwegian Data Protection Authority) is becoming increasingly strict about where data is processed.

Running your API Gateway in a Norwegian data center reduces the legal complexity of cross-border data transfers. Furthermore, peering matters. CoolVDS instances peer directly at NIX (Norwegian Internet Exchange). If your customers are on Telenor or Altibox, their traffic hits your gateway in Oslo without detouring through Sweden or Germany.
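
If you want to see the routing for yourself, run mtr from a machine on a Norwegian consumer line against your gateway and watch where the hops go (the hostname below is a placeholder):

# 50 probes in report mode; look for detours outside .no address space
mtr --report --report-cycles 50 gateway.example.no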

5. SSL Optimization

Finally, terminate SSL efficiently. In April 2017, we still see servers vulnerable to POODLE or configured with weak ciphers. Use this snippet to ensure security without sacrificing speed (assuming Nginx 1.10+):

ssl_protocols TLSv1.2;
ssl_ciphers 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256';
ssl_prefer_server_ciphers on;

# Enable OCSP Stapling to speed up handshakes
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4 valid=300s;
resolver_timeout 5s;

# Session Cache
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;

This configuration prefers ChaCha20, which is faster on mobile devices that lack AES hardware acceleration.
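
To verify that the handshake actually looks the way you configured it, openssl's built-in client is enough (replace the hostname with your own):

# Check the negotiated protocol and cipher, and whether an OCSP response is stapled
openssl s_client -connect api.example.no:443 -status < /dev/null 2>/dev/null \
    | grep -E 'Protocol|Cipher|OCSP'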

Conclusion

Performance isn't magic; it is the sum of a thousand small configuration choices. By tuning your file descriptors, enabling upstream keepalives, and ensuring your underlying hardware uses NVMe storage, you can handle significantly higher loads on smaller infrastructure.

Don't let legacy hardware limit your code's potential. Spin up a CoolVDS NVMe instance today and benchmark the difference yourself. Your ab (ApacheBench) results will thank you.
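
If you want numbers rather than promises, a simple baseline run is a good place to start (the URL is illustrative, and your ab build needs TLS support for https):

# 10,000 keep-alive requests at 100 concurrent connections
ab -n 10000 -c 100 -k https://api.example.no/api/health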