
Kubernetes Networking Deep Dive: Solving the Latency & MTU Nightmares in 2024

Most developers treat Kubernetes networking as an abstraction layer they hope never to touch. You define a Service, maybe an Ingress, and pray the packets find their way. But when you are running high-throughput workloads or dealing with strict SLA requirements in the Nordic market, hoping is not a strategy. I have seen production clusters in Oslo grind to a halt not because of CPU limits, but because of poor CNI choices and overlooked MTU fragmentation.

If you are deploying on bare metal or high-performance VPS environments like CoolVDS, you don't have the luxury (or the hidden costs) of a managed cloud CNI like the AWS VPC CNI handling everything for you. You have to understand how the plumbing works. Today, we are dissecting the packet flow, debating Calico vs. Cilium in the era of eBPF, and fixing the configuration errors that are likely killing your latency right now.

The CNI Battlefield: IPTables vs. eBPF

In January 2024, if you are still relying on pure iptables-based Service routing (classic kube-proxy) in large clusters, you are voluntarily adding latency. As Services and Endpoints scale, the sequential list of iptables rules grows with them, turning every packet lookup into an O(n) traversal.
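A rough way to gauge the damage is to count the rules kube-proxy has programmed on a worker node. The commands below are a sanity check only; the KUBE-SVC chain names assume kube-proxy's default iptables mode:

sudo iptables-save -t nat | grep -c 'KUBE-SVC'
sudo iptables-save -t nat | wc -l

Counts in the tens of thousands are a strong hint that every new connection on that node is paying for the traversal.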

For modern deployments, specifically those targeting low-latency requirements in Scandinavia, Cilium (leveraging eBPF) has become the de facto choice over Flannel or older iptables-based Calico modes. It bypasses much of the host networking stack, injecting logic directly into the kernel.

Pro Tip: When migrating legacy clusters, do not just swap the CNI. You need to ensure your nodes are running at least Linux kernel 5.10 to get the full benefit of eBPF without quirks. CoolVDS images are standardized on modern kernels for this exact reason.
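You can confirm this for the whole cluster at a glance; the KERNEL-VERSION column of the wide node listing tells you what each node is actually booting:

kubectl get nodes -o wide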

Verifying Your Current CNI Mode

Before you optimize, you must know what is running. Execute this on your control plane:

kubectl get pods -n kube-system -l k8s-app=cilium

If you are running Cilium, check the status of the eBPF maps and tunnel modes:

kubectl -n kube-system exec -ti cilium-xxxxx -- cilium status --verbose

Configuration: Enabling Hubble for Visibility

Blindly trusting the network is foolish. We use Hubble (part of Cilium) to visualize packet drops. Here is a production-ready Helm configuration that enables Hubble Relay, the Hubble UI, and Prometheus metrics, suitable for tracking down the latency spikes you see on routes towards NIX (Norwegian Internet Exchange):

helm install cilium cilium/cilium --version 1.14.5 \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set prometheus.enabled=true \
  --set operator.replicas=1 \
  --set k8sServiceHost=API_SERVER_IP \
  --set k8sServicePort=6443
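Once the agents are up, a quick way to confirm Hubble is actually seeing traffic is to tail dropped flows from inside a Cilium agent pod; the hubble CLI ships in the agent image, though this view only covers flows observed by that node (use the Relay and UI for cluster-wide visibility):

kubectl -n kube-system exec -ti ds/cilium -- hubble observe --verdict DROPPED --last 50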

The Silent Killer: MTU and Encapsulation

This is where I see 90% of self-managed clusters fail. By default, many CNIs use VXLAN or Geneve encapsulation. These protocols add headers to your packets. If your physical interface (eth0) has an MTU of 1500 and your CNI tries to push 1500 bytes plus a 50-byte VXLAN header, the packet fragments. Fragmentation destroys performance.

To fix this, you must calculate the overhead. For VXLAN that means 1500 minus roughly 50 bytes of encapsulation headers, so you typically need an MTU of 1450 inside the pod.

Check your node's physical MTU first:

ip addr show eth0 | grep mtu
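To confirm the underlying path really carries full-size frames, ping another node with the Don't Fragment bit set. A payload of 1472 bytes plus 28 bytes of ICMP and IP headers adds up to exactly 1500, so this should succeed; if it fails with a "message too long" error, your path MTU is lower than 1500 and you have to budget for that too (replace the placeholder with a real node IP):

ping -M do -s 1472 -c 3 <other-node-ip>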

If you are on CoolVDS, our infrastructure supports Jumbo Frames in specific zones, but standard internet traffic is capped at 1500. Configure your CNI config map to explicitly set the MTU:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cni-configuration
  namespace: kube-system
data:
  cni-config: |
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "calico",
          "mtu": 1450,
          "log_level": "info",
          "datastore_type": "kubernetes",
          "nodename": "__KUBERNETES_NODE_NAME__",
          "ipam": {
            "type": "calico-ipam"
          }
        }
      ]
    }
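Note that existing pods keep their old MTU; they have to be recreated to pick up the change. You can then verify the value from inside any pod (assuming the image ships basic coreutils and the pod interface is named eth0, which is the common case):

kubectl exec -ti <any-pod> -- cat /sys/class/net/eth0/mtu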

Network Policies: The "Zero Trust" Reality

Norwegian data protection laws (and GDPR) require strict access control. By default, Kubernetes allows all pod-to-pod traffic. This is a security disaster waiting to happen. If one pod is compromised, the attacker has lateral movement across your entire cluster.

We implement a "Default Deny" policy immediately after cluster creation. This forces you to whitelist traffic explicitly.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Once this is applied, nothing works. Good. Now, allow only what is necessary. For example, allowing a frontend to talk to a backend Redis cache (the ingress half is shown here; the matching egress rule follows right after):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-redis
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: redis
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 6379
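One gotcha: because the deny-all policy above also blocks Egress, the frontend still cannot open this connection (or even resolve the Service name) until you give it a matching egress rule. A minimal sketch of that companion policy is below; the labels mirror the example above, and the DNS rule assumes CoreDNS carries the standard k8s-app=kube-dns label:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-egress
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
  - Egress
  egress:
  # the actual Redis traffic
  - to:
    - podSelector:
        matchLabels:
          app: redis
    ports:
    - protocol: TCP
      port: 6379
  # DNS, so the Service name still resolves
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53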

Load Balancing on Bare Metal/VPS

When you aren't on a hyperscaler, you don't get a magic LoadBalancer IP. You need MetalLB. This is critical for deployments on CoolVDS where you control the IPs. MetalLB allows your K8s services to announce external IPs via ARP (Layer 2) or BGP.

For a standard single-cluster setup in our Oslo datacenter, Layer 2 mode is sufficient and robust.

Check ARP responsiveness (run this from another host on the same L2 segment; once MetalLB hands out an external IP, it should resolve to one of your nodes' MAC addresses):

arp -n

Configuration for MetalLB Layer 2 mode:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.240-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: example
  namespace: metallb-system
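Because the L2Advertisement above does not list any ipAddressPools, it announces every pool, which is exactly what you want for a single-pool setup. With both objects applied, any Service of type LoadBalancer picks up an address from the range automatically. A minimal, illustrative example (the name, selector, and ports are placeholders for your own workload):

apiVersion: v1
kind: Service
metadata:
  name: frontend-lb
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app: frontend
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 8080

Within a few seconds, kubectl get svc frontend-lb should show an EXTERNAL-IP from the 192.168.1.240-250 range.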

Why Infrastructure Matters: The Noisy Neighbor Problem

You can tune your CNI and MTU perfectly, but if the underlying hypervisor is stealing CPU cycles, your network latency will jitter. Packet processing is CPU intensive, especially with encryption or complex iptables rules. Container-based VPS solutions (like LXC/OpenVZ) often suffer from "noisy neighbor" effects where another tenant's database query kills your packet throughput.

This is why CoolVDS utilizes KVM virtualization with dedicated resource allocation. When you run a Kubernetes node on our NVMe instances, the CPU cycles assigned to packet switching are yours alone. In benchmarks conducted locally in Oslo, we see a stable 0.4ms ping to major local ISPs, whereas oversold shared hosting often fluctuates between 2ms and 50ms unpredictably.

Final Check: Troubleshooting Throughput

If you suspect the network is still slow, run `iperf3` between two pods on different nodes. Do not rely on HTTP latency, which includes application overhead.

kubectl run -it --rm iperf-server --image=networkstatic/iperf3 -- -s
kubectl run -it --rm iperf-client --image=networkstatic/iperf3 -- -c <server-ip>
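The <server-ip> placeholder is the server pod's address. Run the server in one terminal, grab its IP in a second one before launching the client, and confirm with kubectl get pods -o wide that the two pods actually landed on different nodes:

kubectl get pod iperf-server -o jsonpath='{.status.podIP}'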

Consistent networking requires consistent infrastructure. Stop fighting the hypervisor and start optimizing your logic. If your current host can't guarantee stable I/O and CPU for your CNI, it is time to move.

Ready to stabilize your production workloads? Deploy a KVM-based instance on CoolVDS today and see the difference raw performance makes to your packet flow.