Kubernetes Networking Deep Dive: CNI, IPVS, and Why Latency Kills Clusters

Let’s be honest: Kubernetes networking is where most production clusters go to die. I’ve seen seasoned sysadmins weep over intermittent 502 errors that turned out to be a misconfigured overlay network throttling packet throughput. It’s not magic; it’s encapsulation, and encapsulation costs CPU cycles. If you are running K8s on a noisy, oversold VPS, you are essentially trying to run a Ferrari engine on a go-kart track.

With this summer's Schrems II ruling effectively killing the EU-US Privacy Shield agreement, the network isn't just a technical bottleneck anymore; it's a legal minefield. If your ingress traffic is routed through US-owned load balancers, you might be non-compliant. Today, we are going deep into the plumbing: CNI choices, IPVS, and optimizing the kernel for high-throughput container traffic.

The CNI Battlefield: Calico vs. Flannel

The Container Network Interface (CNI) is the first decision you make, and usually the one you regret six months later. In 2020, if you are running a production workload, the default kubenet networking won't cut it.

Flannel (The "Easy" Way)

Flannel is simple. It creates a VXLAN overlay. Every packet is encapsulated in a UDP packet. This adds overhead. On a standard VPS with poor single-thread performance, this encapsulation/decapsulation process (encap/decap) eats your CPU interrupt time. I've debugged clusters where the application was fine, but the node was at 100% CPU just processing VXLAN headers.
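
For reference, the backend choice lives in flannel's net-conf.json, shipped inside the kube-flannel-cfg ConfigMap in the stock manifest (the ConfigMap name and the 10.244.0.0/16 default may differ in your deployment, so treat this as a sketch). If all your nodes share a Layer 2 segment, switching the backend to host-gw skips the encapsulation entirely:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-flannel-cfg   # name as in the stock kube-flannel manifest
  namespace: kube-system
data:
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"    # the default; "host-gw" avoids encap but needs a shared L2 segment
      }
    }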

Calico (The "Right" Way)

Calico offers pure Layer 3 networking using BGP (Border Gateway Protocol). No encapsulation if your underlying network supports it. It routes packets like a router would. This is what we run when performance matters.

Pro Tip: If you are deploying on CoolVDS, our KVM infrastructure supports the raw packet processing required for Calico's IPIP mode to be highly efficient, though pure BGP peering is the gold standard if you control the metal.

Here is a snippet for configuring Calico's IP pool to disable IP-in-IP encapsulation if your network supports flat routing (resulting in lower latency):

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: Never  # Disable IPIP for raw performance
  natOutgoing: true
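
And if you do control the routing layer (or your provider exposes a route server), peering Calico over BGP is roughly a resource like this. The peer IP and ASN below are placeholders for your top-of-rack switch or route server:

apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack1-tor
spec:
  peerIP: 192.0.2.1   # placeholder: your ToR switch or route server
  asNumber: 64512     # placeholder: the peer's ASN

Apply it with calicoctl apply -f and check session state with calicoctl node status.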

Kube-Proxy: Stop Using Iptables

By default, Kubernetes uses iptables to handle Service routing. iptables evaluates its rules sequentially, so lookups are O(n) in the number of rules. If you have 5,000 services, every packet has to traverse a massive rule chain to find its destination. I've seen service resolution times jump from 0.5ms to 50ms just because a dev team deployed too many microservices.

The solution is IPVS (IP Virtual Server). It uses hash tables, so lookups are effectively O(1). It doesn't matter if you have 10 services or 10,000; the lookup time stays the same. It's built into the Linux kernel and has been battle-tested for nearly two decades.

To enable this in your cluster (assuming you are using kubeadm), you need to edit the kube-proxy config map:

kubectl -n kube-system edit configmap/kube-proxy

Look for the mode setting and change it:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"  # Change from "iptables" or ""
ipvs:
  strictARP: true
  scheduler: "rr" # Round Robin

Note: You must ensure the IPVS kernel modules (ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh) are loaded on the host node before restarting kube-proxy.
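
A minimal sketch for loading them now and persisting them across reboots (on kernels older than 4.19 the conntrack module is nf_conntrack_ipv4 rather than nf_conntrack), then bouncing kube-proxy:

# Persist the required modules and load them immediately
cat <<EOF | sudo tee /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF
sudo modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack

# Restart kube-proxy so it re-reads the ConfigMap, then confirm IPVS is in use
kubectl -n kube-system rollout restart daemonset kube-proxy
sudo ipvsadm -Ln | head    # needs the ipvsadm package installed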

Kernel Tuning for High-Load Networking

Kubernetes defaults are safe, not fast. When you expect thousands of connections per second, the Linux kernel's conntrack table will fill up, and packets will drop silently. This is often misdiagnosed as "network connectivity issues" or "VPS provider packet loss." It’s not. It’s your kernel saying "I'm full."

On a CoolVDS instance, you have full root access to tune these parameters. Add this to /etc/sysctl.d/k8s.conf:

# Increase the connection tracking table size
net.netfilter.nf_conntrack_max = 1000000

# Reduce the time connections stay in TIME_WAIT
net.ipv4.tcp_fin_timeout = 15

# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65000

# Enable TCP BBR congestion control (Available in Kernel 4.9+)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

Run sysctl --system to apply (a plain sysctl -p only reads /etc/sysctl.conf, not files under /etc/sysctl.d/). BBR is particularly effective for users connecting from mobile networks across Europe.
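
To confirm the settings took, and to see how close you are running to the conntrack ceiling under load:

# Tracked connections: current count vs. the new ceiling
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# Verify BBR and fq are actually active
sysctl net.ipv4.tcp_congestion_control net.core.default_qdisc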

Ingress and The Schrems II Reality

July 2020 changed everything with the CJEU's Schrems II ruling. If you are a Norwegian company processing EU citizen data, using a US-controlled Ingress controller (like an AWS ALB) is now a compliance risk. The US CLOUD Act allows US authorities to subpoena that data.

The pragmatic architecture for 2020 is hosting directly on Norwegian soil using a standard NGINX Ingress Controller. You keep the SSL termination and the data within the jurisdiction of Datatilsynet.

Here is a hardened nginx-configuration ConfigMap to mitigate slow-loris attacks and handle large payloads:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
data:
  proxy-body-size: "50m"
  proxy-connect-timeout: "10"
  proxy-read-timeout: "120"
  proxy-send-timeout: "120"
  client-body-buffer-size: "16k"
  ssl-protocols: "TLSv1.2 TLSv1.3"
  use-forwarded-headers: "true"
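
To tie it together, here is a minimal Ingress that keeps TLS termination inside the cluster. The hostname, Secret, and Service names are placeholders, and clusters older than 1.19 will need the networking.k8s.io/v1beta1 API instead:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  tls:
  - hosts:
    - api.example.no                 # placeholder hostname
    secretName: api-example-no-tls   # TLS cert stored as an in-cluster Secret
  rules:
  - host: api.example.no
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-backend        # placeholder Service
            port:
              number: 80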

The Hardware Reality: Latency to Oslo

You can optimize software all day, but you cannot beat the speed of light. If your users are in Oslo or Bergen, hosting in Frankfurt adds ~20-30ms of round-trip latency. That sounds small until you realize a modern web app fires off 50 requests before the page finishes rendering. That latency compounds.

Destination                 Source: Oslo    Source: Trondheim
CoolVDS (Oslo DC)           < 2 ms          ~ 8 ms
AWS (Frankfurt)             ~ 25 ms         ~ 32 ms
DigitalOcean (Amsterdam)    ~ 18 ms         ~ 26 ms
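
These numbers are easy to reproduce yourself; an mtr report from a box in each location shows both latency and per-hop loss (the target hostname below is a placeholder):

# 20-cycle latency/loss report against your own endpoint
mtr --report --report-cycles 20 api.example.no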

Low latency isn't just about user experience; it's about database replication consistency and etcd stability. etcd (the brain of Kubernetes) is extremely sensitive to disk fsync latency. This is why we use high-performance NVMe storage on CoolVDS. Spinning disks or network-attached storage (like generic block storage) often introduce fsync latency spikes that trigger etcd leader elections, destabilizing your control plane.
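
If you want to measure it rather than take my word for it, the usual etcd disk check is an fio run that fsyncs every write; etcd's own guidance is to keep the 99th-percentile fsync under roughly 10ms. A sketch, assuming fio is installed and /var/lib/etcd sits on the disk in question:

# Write 22MB in 2300-byte chunks, fdatasync after each, and read the sync latency percentiles
fio --name=etcd-fsync-test --directory=/var/lib/etcd \
    --rw=write --ioengine=sync --fdatasync=1 \
    --size=22m --bs=2300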

Network Policies: The Firewall Inside

By default, K8s allows all pods to talk to all other pods. If an attacker breaches your frontend, they have a direct line to your database. Use Network Policies to lock this down.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {} # Selects all pods in the namespace
  policyTypes:
  - Ingress
  - Egress

Apply this, then specifically whitelist only the traffic you need. It’s painful to set up, but it stops lateral movement dead in its tracks.
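
As an example of that whitelisting step, here is a policy that only lets pods labelled app: frontend reach the backend on its service port; the labels and port are illustrative, so match them to your own deployments:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: backend          # the pods being protected
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend     # only frontend pods may connect
    ports:
    - protocol: TCP
      port: 8080            # illustrative backend port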

Conclusion

Kubernetes networking is complex, but it forces you to understand your traffic flow. In late 2020, relying on default settings is negligence. You need IPVS for scale, Calico for routing efficiency, and local hosting to keep the lawyers happy.

You need a platform that doesn’t treat CPU steal time as a feature. For your next cluster, check your `iowait` and your ping times. If they aren't near zero, you're building on sand.

Ready to build a compliant, low-latency cluster? Spin up a CoolVDS NVMe instance in Oslo today and see what single-digit latency does for your API response times.