API Gateway Performance Tuning: 99th Percentile Latency is the Only Metric That Matters

If you are looking at average response times, you are deluding yourself. Averages hide the spikes that frustrate users and kill conversion rates. In the world of high-frequency trading or real-time bidding, a 50ms variance is an eternity. Even for a standard e-commerce API serving the Nordic market, consistent latency is the difference between a sale and a bounce.

Most DevOps engineers deploy an API Gateway—be it Kong, Nginx, or Envoy—using default Helm charts or `apt-get install` configurations. This is negligence. Default settings are designed for compatibility on low-resource hardware, not for handling 10,000 requests per second (RPS) on a production-grade node.

In this guide, we are going to rip apart the Linux network stack and Nginx configurations to squeeze every millisecond of performance out of your infrastructure. We will assume you are running on Ubuntu 22.04 LTS with at least 4 vCPUs.

1. The OS Layer: Tuning the Linux Kernel

Before your request even hits the application, it has to traverse the Linux kernel's TCP stack. By default, Linux is polite. It waits for connections to close gracefully. It limits file descriptors to prevent memory exhaustion. For a high-performance gateway, we need Linux to be aggressive.

We need to modify /etc/sysctl.conf. The default for somaxconn (the socket listen backlog) is just 128 on older kernels and 4096 on newer ones. If you get a burst of traffic, connections are dropped before Nginx even sees them. We also need to enable tcp_tw_reuse so the kernel can recycle sockets stuck in the TIME_WAIT state for new outbound connections.

# /etc/sysctl.conf optimizations for high throughput

# Increase the maximum number of open file descriptors
fs.file-max = 2097152

# Increase the read/write buffer sizes for TCP
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

# Increase the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 16384

# Reuse sockets in TIME_WAIT state for new outbound connections
# (safe on modern kernels such as the 5.15 series in Ubuntu 22.04)
net.ipv4.tcp_tw_reuse = 1

# Increase the port range for outgoing connections
net.ipv4.ip_local_port_range = 1024 65535

After applying these, run sysctl -p. Do not forget to increase the user-level file descriptor limits in /etc/security/limits.conf, or your gateway will crash with "Too many open files" during a DDoS attack or a marketing push.
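
Here is what those limits look like in practice. This is a minimal sketch: the www-data user and the 1048576 ceiling are example values, so substitute the account your gateway actually runs under.

# /etc/security/limits.conf: raise the per-user open file ceiling
# (user name and values are examples; match your gateway's service account)
www-data  soft  nofile  1048576
www-data  hard  nofile  1048576

One caveat worth knowing: processes started by systemd do not consult limits.conf. If Nginx or Kong runs as a systemd unit, set LimitNOFILE under its [Service] section (for example via systemctl edit nginx) to the same value.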

2. The Gateway Layer: Nginx & Kong 3.0

Whether you are using raw Nginx 1.23 or the recently released Kong Gateway 3.0 (which brought huge performance gains in September), the underlying principles remain the same. The most common bottleneck I see in audits is the lack of upstream keepalives.

By default, Nginx opens a new connection to your upstream service (your microservice, database, or backend) for every single request. That means a full TCP handshake each time, plus a TLS negotiation if the upstream speaks HTTPS. That is expensive CPU time. You must configure Nginx to keep these connections open.

upstream backend_api {
    server 10.0.0.5:8080;
    
    # The critical setting: Keep 64 idle connections open to the upstream
    keepalive 64;
}

server {
    location /api/ {
        proxy_pass http://backend_api;
        
        # Required to enforce HTTP/1.1 for keepalive
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

Pro Tip: If you are terminating SSL at the gateway (and you should be), make sure you are running OpenSSL 3.0.7 or newer to cover the recent vulnerabilities. More importantly, configure ssl_session_cache shared:SSL:10m, which lets worker processes share SSL session parameters and drastically reduces handshake overhead.
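
To make that concrete, here is a minimal sketch of the TLS side of the server block. The hostname, certificate paths, and session timeout are placeholders rather than values from this article; keep your proxy settings as shown earlier.

server {
    listen 443 ssl;
    server_name api.example.com;                    # placeholder hostname

    # Placeholder paths; point these at your real certificate and key
    ssl_certificate     /etc/nginx/ssl/fullchain.pem;
    ssl_certificate_key /etc/nginx/ssl/privkey.pem;

    # Shared cache visible to all worker processes;
    # roughly 4,000 sessions fit in each megabyte
    ssl_session_cache   shared:SSL:10m;
    ssl_session_timeout 1h;

    location /api/ {
        proxy_pass http://backend_api;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}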

3. The Hardware Reality: NVMe or Nothing

You can tune software all day, but you cannot tune away physics. If your API gateway logs requests to disk (access logs, error logs, or audit trails), your disk I/O becomes a blocking operation. On standard SATA SSDs, high concurrency causes I/O Wait to spike, stealing CPU cycles from your worker processes.
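
Regardless of the disk underneath, you can take pressure off the write path by batching log writes instead of issuing one write per request. Below is a minimal sketch using Nginx's buffered access log; the buffer and flush values are illustrative, not tuned recommendations.

# Buffer access-log writes in memory and flush them in batches,
# so a request no longer pays for its own write() to disk
# (buffer/flush values are examples, not tuned recommendations)
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;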

In 2022, there is no excuse for running databases or high-load gateways on anything less than NVMe Gen4 storage. The latency difference is an order of magnitude. A standard SSD might give you 500MB/s read, but NVMe drives can push 7000MB/s with microsecond latency.

This is where CoolVDS differentiates itself. Unlike budget VPS providers that throttle your IOPS or put you on noisy shared storage, CoolVDS instances provide NVMe performance that feels like bare metal. When your gateway is writing 5,000 log lines per second, that hardware difference prevents the "micro-stutters" that ruin your p99 metrics.

4. The Elephant in the Server Room: GDPR & Schrems II

Performance is not just about speed; it is about availability and compliance. Since the Schrems II ruling invalidated the Privacy Shield, transferring European user data to US-owned clouds (AWS, Google, Azure) has become a legal minefield. The Norwegian Data Protection Authority (Datatilsynet) has been increasingly vocal about this, and the "noyb" complaints are not going away.

If you host your API Gateway in Frankfurt on a US-owned provider, you are technically exposing IP addresses and user metadata to US surveillance jurisdictions. By hosting on CoolVDS in Oslo, you solve two problems:

  • Latency: Your Norwegian users get <5ms ping times, compared to 25ms+ round-trip to Germany.
  • Sovereignty: Your data stays in Norway, under Norwegian law, simplifying your GDPR compliance posture significantly.

Benchmarking the Difference

Do not take my word for it. Install wrk (a modern HTTP benchmarking tool) and test your current endpoint versus a tuned CoolVDS instance.

# Run a 30-second test with 12 threads and 400 open connections,
# printing the full latency distribution
wrk -t12 -c400 -d30s --latency https://your-api-endpoint.com/v1/status

Look at the Latency Distribution section of the output. If your 99th percentile is over 200ms, your current architecture is failing you. With the kernel tunings above and the raw NVMe power of CoolVDS, we regularly see p99 latencies under 40ms for complex API calls.

Speed is a feature. Compliance is a requirement. Don't compromise on either.

Ready to drop your latency? Deploy a CoolVDS NVMe instance in Oslo today and see the difference real hardware makes.