Kubernetes Networking Deep Dive: Surviving the Packet Jungle in 2022

The network is always the problem. It’s the first rule of systems engineering, and it goes double for Kubernetes. When a pod enters a CrashLoopBackOff state, developers blame the code; we ops people check DNS resolution first. If you are running Kubernetes on bare metal or VPS instances—which you should be if you care about performance per krone—you don't have the luxury of an AWS VPC abstraction hiding the mess. You own the packets.

In this deep dive, we strip away the magic. We will look at CNI selection, the silent killer known as MTU mismatch, and how to actually debug traffic when kubectl logs tells you nothing. The goal is infrastructure that respects both physics and compliance, with a specific look at the nuances of hosting in Norway.

The CNI Decision: BGP vs. VXLAN in 2022

Choosing a Container Network Interface (CNI) is often treated as an afterthought. It shouldn't be. In 2022, we largely see three contenders for serious production workloads: Calico, Flannel, and the rising star, Cilium.

If you are deploying on CoolVDS instances, you have direct Layer 2 isolation. This means you can run BGP (Border Gateway Protocol) if you want to advertise pod IPs directly to your fabric, or you can stick to VXLAN for encapsulation. For most setups, Calico with VXLAN is the robust standard, while Cilium leverages eBPF for high-performance filtering.
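
If you do go the BGP route, peering is configured with a Calico BGPPeer resource. The sketch below is a minimal example; the peer IP, AS number, and rack label are placeholders you would swap for your own fabric.

apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack1-tor
spec:
  # Placeholder values: point these at your own top-of-rack router or route reflector
  peerIP: 10.0.0.1
  asNumber: 64512
  nodeSelector: rack == 'rack1'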

Here is a battle-tested Calico IPPool configuration. Note the vxlanMode and blockSize (a /26 block hands each node 64 addresses at a time). A common mistake is allocating blocks that are too small for high-density nodes.

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: Never
  vxlanMode: Always
  natOutgoing: true
  blockSize: 26
  nodeSelector: all()
Pro Tip: Avoid ipipMode: Always unless you absolutely must cross Layer 3 boundaries without VXLAN support. IPIP encapsulation adds overhead and can trigger packet fragmentation if your underlying network MTU isn't tuned. On CoolVDS, we support full VXLAN offloading on the NIC, so stick to VXLAN.
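
Once Calico is running, it is worth confirming on a node that the VXLAN device actually exists and carries the tunnel MTU you expect. The interface name below assumes a default Calico install:

ip -d link show vxlan.calico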

The Silent Killer: MTU Mismatches

I have spent entire weekends debugging APIs that would hang on large POST requests but work fine for health checks. The culprit is almost always the Maximum Transmission Unit (MTU). The standard internet MTU is 1500 bytes. If your CNI wraps packets in VXLAN headers (typically 50 bytes), your inner pod interface must have an MTU of 1450.

If your pod tries to send a 1500-byte packet inside a VXLAN tunnel, the outer packet becomes 1550 bytes. The physical host interface will drop it. No error message. Just a timeout.
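
A quick way to confirm a path MTU problem is to ping between two pods with the Don't Fragment bit set. This is a sketch assuming Linux ping is available inside the pod (or run it from the node via nsenter, as shown later); 1422 bytes of payload plus 28 bytes of ICMP/IP headers lands exactly on 1450:

# Should succeed if the 1450 MTU holds end to end
ping -M do -s 1422 -c 3 <other-pod-ip>
# Should fail ("message too long") or time out if something in the path expects 1500
ping -M do -s 1472 -c 3 <other-pod-ip>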

Check your current settings on a node:

ip -d link show eth0 | grep mtu

If you are managing your own cluster on our VPS infrastructure, explicitly set the MTU in your CNI configuration. For an operator-based Calico install, it lives in custom-resources.yaml:

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    mtu: 1450
    ipPools:
    - cidr: 192.168.0.0/16
      encapsulation: VXLAN
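
After applying the change (new pods pick it up when their veth is created), spot-check a real pod. This assumes the container image ships a shell and cat:

kubectl exec <pod-name> -- cat /sys/class/net/eth0/mtu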

Service Exposure: Life Without Cloud Load Balancers

On GKE or EKS, you create a Service of type LoadBalancer and a magical IP appears. On a VPS setup, you see <pending> forever. You need MetalLB. In 2022, MetalLB is the de-facto standard for bare-metal load balancing. It allows your Kubernetes nodes to announce the Service IP via ARP (Layer 2) or BGP.

Layer 2 mode is easiest for small clusters. One node attracts the traffic and kube-proxy spreads it. However, if that node dies, failover takes a few seconds. For mission-critical banking apps or high-availability shops in Oslo, we recommend BGP mode, but ARP suffices for 99% of web workloads.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.10.0/24
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: example
  namespace: metallb-system
spec:
  ipAddressPools:
  - first-pool
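
With the pool and advertisement in place, a plain Service of type LoadBalancer picks up an address from first-pool; no annotations are needed when there is only one pool. A minimal example (the name and app label are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080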

Debugging: When the Packet Disappears

kubectl exec is great, but sometimes you need to see the raw wire. Installing tcpdump inside every container image is a security risk and bloats your registry. Instead, use nsenter to debug from the node level. This allows you to step into the pod's network namespace using the tools available on the host (which CoolVDS images come pre-equipped with).

First, find the Process ID (PID) of the container:

crictl inspect --output go-template --template '{{.info.pid}}' <container-id>

Then, hop into its namespace and capture traffic:

nsenter -t <PID> -n tcpdump -i eth0 -nn port 80

This technique separates the juniors from the seniors. You see exactly what the application sees, without altering the container artifact.
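
Putting the two steps together, here is a small helper sketch for nodes running containerd with crictl installed; the container name filter and port are placeholders:

# Capture HTTP traffic for a container without touching its image
CONTAINER_ID=$(crictl ps --name my-api -q | head -n 1)
PID=$(crictl inspect --output go-template --template '{{.info.pid}}' "$CONTAINER_ID")
nsenter -t "$PID" -n tcpdump -i eth0 -nn port 80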

Security: The Default is Not Safe

By default, Kubernetes allows all pod-to-pod traffic. In a shared environment, this is terrifying. If one service gets compromised (say, a Log4j vulnerability we all remember from late 2021), the attacker can pivot to your database. You must use NetworkPolicy. Keep in mind that a policy is only as good as the CNI enforcing it: Calico and Cilium enforce NetworkPolicy, plain Flannel does not.

Here is a "Deny All" policy that should be the baseline for every namespace you deploy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

You then whitelist specific paths. Yes, it's tedious. But when the auditors from Datatilsynet come knocking about GDPR compliance and data segregation, you will be glad you did it.
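
As an example of that whitelisting, the policy below (names and labels are hypothetical) allows ingress to the API pods only from the frontend, plus DNS egress to kube-system so the default-deny doesn't break name resolution:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-and-dns
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53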

The Infrastructure Factor

You can have the perfect CNI config, but if the underlying hypervisor steals CPU cycles or bottlenecks I/O, your K8s networking stack will stutter. Network packet processing in K8s is CPU intensive (iptables rules, encapsulation/decapsulation).
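
If you want to see how heavy that gets, count the kube-proxy rules on a node (run as root); on a service-heavy cluster this number often runs into the thousands:

iptables-save | grep -c KUBE-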

This is where CoolVDS differs from the budget providers. We don't oversubscribe CPU on our KVM nodes. When your Ingress controller needs to handle a burst of 10,000 requests, the CPU cycles are there. Furthermore, our datacenter in Oslo connects directly to NIX (Norwegian Internet Exchange), ensuring that latency to your local users remains in the single digits (often <5ms). For distributed databases running on K8s, like CockroachDB or heavy MySQL clusters, that low latency is the difference between a commit and a timeout.

Final Thoughts

Kubernetes networking is complex, but it obeys logical rules. Control your MTU, secure your traffic with policies, and ensure your underlying compute—the VPS itself—is up to the task. Do not let a noisy neighbor on a cheap shared host wreck your cluster's stability.

Ready to build a cluster that doesn't flake? Deploy a high-performance NVMe instance on CoolVDS today and get root access in under 60 seconds.