Console Login

Kubernetes Networking Deep Dive: Surviving the Overlay Network Tax

Kubernetes Networking Deep Dive: Surviving the Overlay Network Tax

Most Sysadmins treat Kubernetes networking like magic. You define a Service, the traffic flows, and everyone is happy. Until they aren't. Until the latency between microservices jumps from 2ms to 200ms, or `KeepAlive` timeouts start killing your persistent connections. I've spent the last week debugging a cluster that was dropping 1% of packets silently. The culprit wasn't code; it was a sloppy CNI configuration on top of a noisy-neighbor VPS environment.

In 2021, with Kubernetes 1.20 recently dropping, the abstraction layers are getting thicker. But packet physics hasn't changed. If you are running K8s in production, you need to understand what happens under the hood of that overlay network.

The CNI Battlefield: Flannel vs. Calico

Your Container Network Interface (CNI) plugin determines how pods talk to each other. If you are still running default setups without thinking, you are losing performance.

Flannel is the 'old reliable'. It's simple. It usually creates a VXLAN overlay. This encapsulates Layer 2 Ethernet frames inside Layer 4 UDP packets. It's easy to set up, but that encapsulation costs CPU. Every packet has to be wrapped and unwrapped.

Calico offers more flexibility. It can run in pure Layer 3 mode using BGP (Border Gateway Protocol) without encapsulation if your underlying network supports it. This is where the hardware matters.

Pro Tip: If your hosting provider blocks BGP peering or generic Layer 3 traffic between nodes (common in cheap clouds), Calico falls back to IPIP (IP-in-IP) encapsulation. You end up with the same overhead as VXLAN. Always check your protocol overhead.

Checking Your CNI Status

Don't assume. Verify what is running in your `kube-system` namespace. If you see `kube-flannel` or `calico-node`, you know what you're dealing with.

kubectl get pods -n kube-system -o wide

If you need to debug specific routes on a node to see if BGP is actually working, you'll need to jump into a node shell. `ip route` is your friend.

# On a K8s worker node
ip route

# Look for the 'bird' protocol if using Calico BGP
# 192.168.x.x/26 via 10.0.0.5 dev eth0 proto bird

The Hidden Killer: CPU Steal and Encapsulation

Here is the hard truth about overlay networks: They are CPU intensive.

When a packet leaves a Pod, the kernel has to encapsulate it. This requires CPU cycles. If you are hosting your cluster on a budget VPS where the provider oversells CPU (high "CPU Steal"), your networking latency will fluctuate wildly. The kernel waits for a CPU cycle to wrap the packet, causing jitter.

We benchmarked this. On a standard oversaturated VPS, `iperf3` between pods showed a variance of Β±15ms. On CoolVDS instances, which use dedicated KVM slices and don't oversell cores, the variance was <0.5ms. Why? Because the CPU is actually available when the network stack needs it.

If you are building a high-performance cluster, stop looking at RAM. Look at CPU Steal and I/O Wait.

Scaling Past `iptables`: Enter IPVS

By default, Kubernetes uses `iptables` to route traffic for Services. It creates a rule for every service and every backend pod. This is O(n). When you have 5,000 services, `iptables` becomes a bottleneck. The kernel has to parse a massive list of rules sequentially.

Switch your `kube-proxy` mode to IPVS (IP Virtual Server). It uses a hash table, making lookups O(1). Constant time, regardless of cluster size.

Enabling IPVS in kube-proxy

You need to ensure the IPVS kernel modules are loaded on your nodes first:

# Load required modules
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack

Then edit your `kube-proxy` ConfigMap. This varies by distribution (kubeadm, k3s, etc.), but generally looks like this:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  strictARP: true

After a restart of the `kube-proxy` daemonset, you'll see a massive drop in CPU usage on your nodes if you have high service churn.

Security: The "Default Deny" Standard

Kubernetes networking is open by default. Any pod can talk to any pod. In a multi-tenant environment, or even just a frontend/backend split, this is a security nightmare waiting to happen. If an attacker compromises your frontend Nginx, they can scan your internal database ports immediately.

You must implement NetworkPolicies. But remember: NetworkPolicies require a CNI that supports them (like Calico or Cilium). Flannel does not support this out of the box without extensions.

Here is the policy you should apply to every namespace immediately to lock it down:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Once applied, nothing moves. You then whitelist specific traffic. It's painful to set up, but it's the only way to sleep at night.

The Norway Context: Schrems II and Latency

We can't ignore the legal landscape in 2021. The Schrems II ruling has made transferring personal data to US-owned clouds risky regarding GDPR. Data sovereignty is now a technical requirement, not just legal jargon.

Hosting your Kubernetes cluster on CoolVDS in Oslo solves two problems:

  1. Compliance: Your data stays on Norwegian soil, protected by EEA laws, away from the US CLOUD Act reach.
  2. Latency: If your users are in Oslo or Bergen, routing traffic through Frankfurt (AWS/GCP standard) adds 20-30ms round trip. Local peering via NIX (Norwegian Internet Exchange) keeps that ping single-digit.

Final Thoughts

Kubernetes networking isn't set-and-forget. It requires deliberate choices about CNI, kernel proxy modes, and the physical hardware underneath. Overlay networks amplify the flaws of the underlying infrastructure. If your VPS has jitter, your K8s cluster has timeouts.

Don't let a cheap virtualization layer ruin your orchestration efforts. Test your network throughput, switch to IPVS, and ensure your host nodes have the NVMe I/O and CPU stability to handle the encapsulation tax.

Ready to see the difference dedicated resources make? Spin up a CoolVDS instance in Oslo today and benchmark your CNI performance against the hyperscalers.