Kubernetes Networking Deep Dive: Stop Packets from Dying in Your Overlay

It’s 3:00 AM. Your Prometheus alerts are firing. The latency on the checkout microservice just spiked to 500ms, but the CPU load is normal. You check the logs: Connection timed out.

Welcome to the hell that is Kubernetes networking. If you treat K8s networking as a black box, it will eventually break you. I’ve seen production clusters in Oslo brought to their knees not by code bugs, but by exhausted conntrack tables and poorly configured MTU settings.

In this deep dive, we are ignoring the marketing fluff. We are going to look at the plumbing: CNI choices, the shift from iptables to IPVS, and why your underlying infrastructure (the actual VPS) matters more than your YAML files.

The CNI Battlefield: Flannel vs. Calico

When you run kubeadm init, you make a choice that dictates your cluster's performance for years. The Container Network Interface (CNI) isn't just a plugin; it's the nervous system of your cluster. In 2019, if you are running on CoolVDS or any other provider offering raw KVM instances, you usually have two main contenders.

1. Flannel (The "It Just Works" Trap)

Flannel is great for a Raspberry Pi cluster in your basement. It creates a simple VXLAN overlay. But in production, simplicity costs you.

Flannel's default VXLAN backend wraps every packet in a UDP header. This encapsulation overhead burns CPU cycles on every send and receive. On a budget VPS with "shared" vCPUs, that overhead effectively throttles your network throughput.
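
If you are stuck with Flannel but your nodes share a Layer 2 segment, its host-gw backend skips encapsulation entirely. The backend choice lives in the kube-flannel-cfg ConfigMap's net-conf.json; here is a trimmed sketch based on the upstream manifest (10.244.0.0/16 is the usual kubeadm pod CIDR, and the namespace may differ depending on the manifest version):

kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
data:
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "host-gw"
      }
    }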

2. Calico (The Professional's Choice)

For serious workloads, we use Calico. It can run in IPIP mode (tunneling) or BGP mode (direct routing). If your nodes are on the same Layer 2 network (which CoolVDS provides within the same datacenter), you can turn off encapsulation entirely. This gives you near-metal network speeds.

Here is a snippet from a standard Calico 3.8 manifest configuration where we explicitly set the IP detection method. This is crucial when your VPS has multiple interfaces (e.g., private LAN and public WAN):

            - name: IP_AUTODETECTION_METHOD
              value: "interface=eth1"
            - name: CALICO_IPV4POOL_IPIP
              value: "Always" # Change to 'CrossSubnet' for better performance inside the LAN

Pro Tip: If your nodes are in the same CoolVDS zone, set CALICO_IPV4POOL_IPIP to CrossSubnet. Calico will route packets natively between nodes and only encapsulate when going across subnets. The latency difference is measurable—often cutting ping times by 30%.
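
You don't need to redeploy the whole manifest to flip this on a running cluster. A hedged sketch with calicoctl, assuming calicoctl is already wired up to your datastore and the pool still has the stock name and CIDR (adjust both to match what calicoctl get ippool shows for you):

# Inspect the current pool
calicoctl get ippool default-ipv4-ippool -o yaml

# Re-apply it with ipipMode switched to CrossSubnet
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: CrossSubnet
  natOutgoing: true
EOF

In our experience the switch is non-disruptive, but do it outside peak hours the first time.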

The Bottleneck: iptables vs. IPVS

Prior to Kubernetes 1.11, kube-proxy relied almost exclusively on iptables to route traffic to Services. This works fine for 50 services. It is a disaster for 5,000.

Why? Because iptables is a sequential list. Every packet has to traverse the chain until it finds a match. It’s O(n) complexity. I recently audited a client's cluster that had 15,000 iptables rules. The kernel spent 20% of its time just figuring out where to send packets.
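
You can get a feel for the damage on your own nodes. A quick sketch, run directly on a worker (the counts will obviously differ per cluster):

# Count the NAT rules kube-proxy manages; thousands here means real per-packet overhead
sudo iptables-save -t nat | grep -c '^-A KUBE'

# Even listing the full table gets sluggish once the chains are bloated
time sudo iptables -t nat -L -n | wc -l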

Enter IPVS (IP Virtual Server)

With Kubernetes v1.16 (the current stable standard), you should be using IPVS mode. IPVS is a kernel-space load balancer that uses hash tables (O(1) complexity). It doesn't care if you have 10 services or 10,000.

How to enable IPVS:

First, ensure the modules are loaded on your underlying Linux node:

sudo modprobe ip_vs
sudo modprobe ip_vs_rr
sudo modprobe ip_vs_wrr
sudo modprobe ip_vs_sh
sudo modprobe nf_conntrack_ipv4   # kube-proxy's IPVS mode also needs conntrack (nf_conntrack on 4.19+ kernels)
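
modprobe only lasts until the next reboot. On a systemd distro (Ubuntu 18.04, CentOS 7) you can persist the list with a modules-load.d file:

cat <<EOF | sudo tee /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
EOF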

Then, edit your kube-proxy config map:

kubectl edit configmap kube-proxy -n kube-system

Look for the mode field and change it:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"  # Change this from "iptables" or ""

Kill the kube-proxy pods so the DaemonSet recreates them with the new config:

kubectl -n kube-system delete pod -l k8s-app=kube-proxy

You can verify it works by installing ipvsadm and checking the table. You should see your Service IPs listed:

# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.96.0.1:443 rr
  -> 192.168.1.50:6443            Masq    1      0          0         
TCP  10.96.0.10:53 rr
  -> 10.244.1.3:53                Masq    1      0          0         

The Silent Killer: MTU Mismatches

If you see random timeouts on large POST requests but ICMP (ping) works fine, you have an MTU problem.

The standard Ethernet MTU is 1500 bytes. When you use an overlay network like VXLAN or IPIP, the CNI adds its own headers to every packet. If the encapsulated packet exceeds 1500 bytes, the kernel has to fragment it, or drop it outright when the DF (don't fragment) bit is set. Fragmentation destroys performance; silent drops are worse.

The Fix:

  • Physical Interface: 1500 MTU
  • VXLAN/IPIP Interface: Must be lower (usually 1450 or 1480).
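
You can confirm an MTU problem with a do-not-fragment ping. A quick sketch (10.0.0.12 stands in for another node's IP and 10.244.1.3 for a pod on that node; 1472 = 1500 minus 28 bytes of IP and ICMP headers):

# Node to node: verifies the physical path really carries full 1500-byte frames
ping -M do -s 1472 10.0.0.12

# Pod to pod (run from inside a pod): if small pings work but this one fails,
# the pod/overlay MTU is set higher than the encapsulated path can carry
ping -M do -s 1472 10.244.1.3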

We configure our CoolVDS KVM templates with standard 1500 MTU on the WAN interface, but we support Jumbo Frames (9000 MTU) on the private VLANs if you request it. This allows your overlay network to run a full 1500 MTU payload without fragmentation, provided your physical link supports it.
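
Where you actually set the overlay MTU depends on the CNI. For Calico 3.8 it lives in the calico-config ConfigMap; a trimmed sketch, assuming the stock manifest (1440 is the shipped default, sized to leave room for IPIP headers on a 1500-byte link):

kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # MTU handed to pod veth interfaces and the tunnel device.
  # On a jumbo-frame (9000) private VLAN you can raise this so pods keep a full-size payload.
  veth_mtu: "1440"

Note that only pods created after you restart the calico-node DaemonSet pick up the new value, so plan the change accordingly.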

Why Infrastructure Matters (The CoolVDS Factor)

You can tune sysctl.conf until you are blue in the face, but you cannot software-optimize a noisy neighbor.

Kubernetes networking is CPU-intensive. Encapsulation, iptables/IPVS routing, and SSL termination at the Ingress controller all require consistent CPU cycles. If you are hosting on a budget "container-optimized" platform where resources are heavily oversold, your network latency will fluctuate wildly.

At CoolVDS, we don't play those games. We offer:

  1. KVM Virtualization: Real kernel isolation. No "container-in-container" mess.
  2. NVMe Storage: Because etcd (the K8s brain) is extremely sensitive to disk write latency. Slow disk = API server timeout. (See the fio sketch after this list to measure it on your own nodes.)
  3. Norwegian Data Residency: With GDPR enforcement ramping up, keeping your data in Oslo isn't just about latency to the NIX (Norwegian Internet Exchange)—it's about legal compliance.
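
If you want to check whether a node's disk is actually fast enough for etcd, the usual yardstick is fdatasync latency; etcd's tuning guidance puts the 99th percentile at roughly 10ms or better. A hedged sketch with fio (directory and sizes are arbitrary, but run it on the filesystem etcd will actually use):

mkdir -p /var/lib/etcd-disk-test
fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd-disk-test \
    --size=22m --bs=2300 --name=etcd-disk-check
# Look at the fsync/fdatasync latency percentiles in the output, not just the bandwidth figure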

Benchmarking Network Throughput

Don't take my word for it. Run iperf3 between two pods on different nodes.

# Server Pod
iperf3 -s

# Client Pod (30-second run; watch the Retr column for retransmissions)
iperf3 -c [SERVER_IP] -t 30

On a standard CoolVDS instance with Calico (IPIP), we consistently see near line-rate throughput with sub-millisecond variance. On cheap shared hosting, you will watch the retransmission (Retr) counter climb instead.

Final Thoughts

Kubernetes is a beast. Taming the network requires understanding the layers below the YAML. By switching to IPVS, correctly sizing your MTU, and choosing a CNI that fits your topology, you can build a cluster that survives the Black Friday traffic spike.

But software is only half the battle. You need a foundation that respects your need for raw I/O and stable CPU time.

Stop debugging network ghosts. Spin up a 3-node K8s cluster on CoolVDS today and see what stable latency actually looks like.