Kubernetes Networking Deep Dive: Escaping the Overlay Tax
I recently spent 48 hours debugging a Kafka cluster on Kubernetes that was dropping packets faster than I drop poorly written pull requests. The culprit wasn't the JVM, and it wasn't the disk. It was the network overlay. When you stack a virtual network (the CNI overlay) on top of a virtual machine (your VPS), which itself sits on a physical network, you pay a "packet tax" on every single byte.
If you are running Kubernetes in production in 2019, you cannot treat the network as a black box. Understanding the interaction between your CNI (Container Network Interface) and the underlying Linux kernel is what separates a stable cluster from one that times out every time traffic spikes.
The CNI Battlefield: Flannel vs. Calico
When you initialize a cluster with kubeadm init, you have to pick a CNI. For years, Flannel was the default choice because it's simple. It creates a VXLAN overlay, wrapping Layer 2 Ethernet frames inside UDP packets so they can cross the Layer 3 network underneath. It works everywhere, but it eats CPU cycles for breakfast due to the encapsulation/decapsulation overhead.
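You can see that tax directly on a Flannel node by inspecting the VXLAN device it creates (flannel.1 is the default interface name for the VXLAN backend; adjust if your deployment renames it):

# Show the VXLAN device details on a worker node running Flannel
ip -d link show flannel.1
# Note the MTU: typically 1450, i.e. 1500 minus roughly 50 bytes of
# VXLAN + UDP + IP + Ethernet headers added to every packet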
For high-performance workloads, specifically here in the Nordic region where latency to the NIX (Norwegian Internet Exchange) matters, I almost exclusively recommend Calico. Calico can run in pure Layer 3 mode using BGP (Border Gateway Protocol), routing packets without the heavy encapsulation overhead, provided your underlying network supports it.
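If you go the BGP route, verify that peering is actually established rather than silently falling back to an overlay. Assuming calicoctl is installed on the node and pointed at your datastore, a quick status check looks like this:

# Run on the node itself; needs root to read the local BIRD state
sudo calicoctl node status
# A healthy mesh shows every peer with "Established" in the State column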
Here is how we typically deploy Calico 3.7 (the current stable release) on a fresh cluster:
kubectl apply -f https://docs.projectcalico.org/v3.7/manifests/calico.yaml
However, simply applying the YAML isn't enough. You need to verify that the IP pools are configured to match your Pod CIDR. If you are running on a cloud provider or a VPS where you don't control the physical routers, Calico defaults to IPIP (IP-in-IP) mode. This is still an overlay, but often lighter than VXLAN.
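You can inspect the pool with calicoctl to see exactly what you got. The pool name below (default-ipv4-ippool) is what the stock 3.7 manifest creates; yours may differ:

calicoctl get ippool default-ipv4-ippool -o yaml
# spec.ipipMode tells the story: "Always" encapsulates everything,
# "CrossSubnet" only encapsulates traffic leaving the local subnet,
# "Never" means pure BGP routing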
Pro Tip: If you are seeing high latency, check the MTU (Maximum Transmission Unit). The default Ethernet MTU is 1500. If you wrap a packet (overlay), the inner packet must be smaller. If your CNI tries to push 1500 bytes through a tunnel that adds headers, you trigger packet fragmentation. That kills performance.
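A quick way to test this is a ping with the don't-fragment bit set and a payload sized to fill the full 1500 bytes. The target address below is a placeholder; use the IP of a pod on another node:

# 1472 bytes of payload + 8 bytes ICMP + 20 bytes IP = 1500 on the wire
ping -M do -s 1472 10.244.1.15
# If this fails but a smaller payload (say -s 1422) gets through,
# something in the path is eating your MTU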
Tuning the MTU for Calico
Inside your calico-config ConfigMap, make sure the MTU accounts for the encapsulation header. IPIP adds a 20-byte header, so against a standard 1500-byte Ethernet MTU we usually set this to 1480.
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  veth_mtu: "1480"
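Note that veth_mtu is only read when calico-node starts, and existing pod interfaces keep whatever MTU they were created with. After changing the ConfigMap, bounce the DaemonSet pods (the k8s-app=calico-node label matches the stock manifest):

kubectl -n kube-system delete pod -l k8s-app=calico-node
# The DaemonSet recreates them; veths created from now on get the 1480 MTU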
Kernel Tuning: The Forgotten Optimization
Kubernetes relies heavily on iptables (or IPVS if you are living on the edge with K8s 1.11+). When you have thousands of Services and Pods, the iptables ruleset grows massive, and every packet that hits a Service has to walk those chains. That lookup is O(n) in the number of rules. It hurts.
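Two quick sanity checks are worth running before you blame anything else: how big the ruleset actually is on a node, and which mode kube-proxy is in. The kube-proxy ConfigMap name below matches a kubeadm install; other distributions may store the config elsewhere:

# Count the iptables rules on a worker node; a few thousand is normal,
# tens of thousands is when the O(n) walk starts to hurt
sudo iptables-save | wc -l

# Check the proxy mode on a kubeadm cluster; an empty mode means iptables
kubectl -n kube-system get configmap kube-proxy -o yaml | grep "mode:"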
Regardless of your CNI, you need to tune the underlying Linux kernel on your nodes. The stock settings on Ubuntu 18.04 or CentOS 7 are conservative, general-purpose defaults, not values chosen for high-throughput packet forwarding.
Here is the sysctl.conf baseline I apply to every Worker Node before it joins the cluster:
# /etc/sysctl.d/k8s-net.conf
# Increase the connection tracking table.
# If this fills up, packets get dropped silently.
net.netfilter.nf_conntrack_max = 131072
# Reduce the time we hold onto closed connections
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
# Allow IP forwarding (Required for K8s)
net.ipv4.ip_forward = 1
# Maximize the backlog for incoming packets
net.core.netdev_max_backlog = 5000
Apply these with sysctl -p /etc/sysctl.d/k8s-net.conf. If you skip this, your Ingress controller will choke under load, and you will blame the software when it is actually the OS limits.
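To know whether you are actually anywhere near the conntrack ceiling, compare the live count against the limit, and keep an eye on the kernel log for drops:

# Current tracked connections vs. the configured maximum
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
# "nf_conntrack: table full, dropping packet" in dmesg means you were too late
dmesg | grep conntrack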
Debugging in the Trenches
When a service isn't reachable, "it's DNS" is usually the answer. But when it's not DNS, you need to see the traffic. The challenge in 2019 is that many container images are stripped down (Alpine, Distroless) and don't have tools.
I recommend keeping a "netshoot"-style pod manifest handy: a throwaway pod with a full network toolbox that you can schedule onto the same node as the troubled pod to inspect traffic from inside the cluster network.
apiVersion: v1
kind: Pod
metadata:
  name: net-debug
  namespace: default
spec:
  containers:
  - name: net-debug
    image: nicolaka/netshoot
    command: ["/bin/bash"]
    stdin: true
    tty: true
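Apply it and drop into a shell. The filename is just whatever you saved the manifest as; attach works because the spec requests stdin and a TTY:

kubectl apply -f net-debug.yaml
kubectl attach -it net-debug
# or equivalently: kubectl exec -it net-debug -- bash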
Once inside, use tcpdump to verify if the SYN packets are actually arriving:
tcpdump -i eth0 port 80 -n -vv
The Hardware Reality: Why CoolVDS Matters
You can tune sysctl all day, but software-defined networking (SDN) is CPU intensive. Every time a packet is encapsulated, routed, and decapsulated, the CPU has to do work. In a shared hosting environment or on a cheap VPS, you are often fighting for that CPU time with "noisy neighbors."
If your CPU "steal" time (%st in top) is high, your network latency will jitter. This is why for production Kubernetes, we only use CoolVDS instances. They utilize KVM virtualization which provides strict resource isolation. Unlike container-based VPS solutions (like OpenVZ) where the kernel is shared, KVM gives your node its own kernel to tune.
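Measuring steal time takes ten seconds and tells you a lot about your provider. vmstat ships with practically every distro:

# Sample CPU stats once per second, five times; watch the "st" column
vmstat 1 5
# A steal value that sits above a few percent means the hypervisor is
# handing your CPU time to someone else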
Furthermore, CoolVDS infrastructure is built on NVMe storage. While we are talking about networking, remember that etcd (the K8s brain) is extremely sensitive to disk write latency. If etcd is slow because of cheap spinning disks, the API server lags, and network updates (like new Endpoints) get delayed.
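If you want to put a number on disk latency, a fio run that mimics etcd's write pattern (small sequential writes with an fdatasync after each one) is a reasonable probe. The directory and sizes below are illustrative; create the test directory first and point it at the disk that will hold /var/lib/etcd:

# Requires fio; run against the volume that will back etcd
fio --name=etcd-probe --directory=/var/lib/etcd-test \
    --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300
# Check the fdatasync percentiles in the output: etcd wants the 99th
# percentile comfortably under 10ms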
Data Sovereignty and Latency
For those of us operating in Norway, the legal landscape is tightening. With the GDPR now fully enforceable, keeping data within national borders is a significant compliance advantage. Hosting on CoolVDS ensures your data stays in local data centers, subject to Norwegian law and Datatilsynet regulations, not hidden away in a generic "EU-West" zone.
Plus, physics is physics. Pinging a server in Oslo from Oslo takes around 2 ms; pinging Frankfurt takes 25 ms or more. For chatty microservices and database calls, that round-trip time adds up.
Final Thoughts
Kubernetes networking is complex, but it is manageable if you respect the overhead. Choose the right CNI, tune your kernel connection tracking, and never underestimate the value of high-performance underlying infrastructure.
Don't let I/O wait times or CPU steal kill your cluster's performance. Spin up a KVM-based, NVMe-backed instance on CoolVDS and give your packets the lane they deserve.