Kubernetes Networking Explained: CNI, IPVS, and Debugging Production Clusters
Let’s be honest: Kubernetes networking is usually the layer that keeps us up at night. The scheduling logic is elegant, but the networking stack—pod-to-pod communication, service discovery, and ingress—is a complex beast of iptables rules, routing tables, and encapsulation. I recently spent three days debugging a microservices setup for a client in Oslo where random packets were dropping between the frontend and the payment gateway. The culprit wasn't code; it was a saturated conntrack table on the host nodes.
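If you suspect the same failure mode, the conntrack counters on each node are the first place to look. A minimal sketch of the checks (the limit set at the end is illustrative; size the table against your node's RAM):

# How close the node is to the connection-tracking limit
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max

# The kernel logs this line when it starts dropping packets
dmesg | grep "nf_conntrack: table full"

# Temporary relief while you hunt down the source of connection churn
sysctl -w net.netfilter.nf_conntrack_max=262144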
If you treat Kubernetes networking as a black box, you will eventually face an outage you can't explain. Today, we are tearing that box open. We will look at choosing the right CNI (Container Network Interface), the performance implications of the new IPVS mode in kube-proxy, and why your choice of underlying VPS provider in Norway impacts your overlay network overhead more than you think.
The CNI Jungle: Flannel vs. Calico in 2018
When you initialize a cluster with kubeadm init, you aren't done until you apply a CNI plugin. In 2018, the two heavyweights are Flannel and Calico. Your choice here dictates your network performance.
Flannel is the simple choice. It typically uses VXLAN, which encapsulates Layer 2 Ethernet frames inside UDP packets. It works everywhere, but that encapsulation comes with a CPU cost. Every packet leaving a pod is wrapped, sent across the wire, and unwrapped. On a high-traffic node, this CPU overhead adds up.
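For reference, the backend is chosen in Flannel's net-conf.json, which ships inside the kube-flannel ConfigMap. A sketch of the default VXLAN configuration (the pod CIDR is the usual example value):

{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "vxlan"
  }
}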
Calico, on the other hand, can run in pure Layer 3 mode using BGP (Border Gateway Protocol). No encapsulation headers, just pure routing. If your underlying infrastructure supports it, this is the performance winner.
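With Calico 3.x the encapsulation mode lives on the IP pool itself. Roughly, a pool with IPIP disabled looks like this when applied with calicoctl (the pool name and CIDR are assumptions; match them to your cluster):

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: Never
  natOutgoing: true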
Pro Tip: If you are running on a provider that blocks BGP or filters unknown MAC addresses (common in cheap shared clouds), you might be forced into VXLAN. CoolVDS KVM instances provide the isolation needed to run these protocols without the "noisy neighbor" interference that plagues standard shared hosting.
Configuring Calico for Policy Enforcement
One major reason we lean towards Calico at the moment is support for NetworkPolicy. Flannel (by default) just connects things; Calico secures them. With GDPR now in full effect as of May, you cannot have a database pod accepting connections from just anywhere in the cluster.
Here is a baseline NetworkPolicy that denies all ingress traffic by default, a configuration that belongs in every namespace you deploy:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
Once applied, you explicitly whitelist traffic. This follows the "Zero Trust" model that Datatilsynet (The Norwegian Data Protection Authority) effectively mandates for sensitive data handling.
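As an example of that whitelisting, here is a sketch of a policy that lets only pods labelled app: payment-api reach the database pods on the Postgres port (the labels and port are assumptions; adjust them to your own manifests):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-payment-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: payment-api
    ports:
    - protocol: TCP
      port: 5432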
The Shift to IPVS (IP Virtual Server)
This is the most exciting development in Kubernetes 1.11 (stable since June). Traditionally, kube-proxy has used iptables to implement Service VIPs: when you create a Service, it writes iptables rules that redirect traffic to the backend pods.
The problem? iptables is a linear list. If you have 5,000 services, the kernel has to traverse a massive chain of rules for every new connection. O(n) complexity kills performance at scale.
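You can get a rough sense of the damage on your own nodes by counting what kube-proxy has already programmed into the nat table:

# Per-service chains kube-proxy maintains
iptables-save -t nat | grep -c 'KUBE-SVC'

# Total nat rules the kernel may have to walk for a new connection
iptables-save -t nat | wc -l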
IPVS is a kernel-space load balancer based on hash tables. It has O(1) complexity. It doesn't care if you have 5 services or 5,000; the lookup time is virtually the same. To enable this in your cluster, you need to ensure the IPVS kernel modules are loaded on your worker nodes before starting kube-proxy.
# Load required kernel modules
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
# Check if they are loaded
lsmod | grep -e ip_vs -e nf_conntrack_ipv4
If you are managing your own control plane, you then configure kube-proxy with mode: "ipvs". The latency difference in high-churn environments is noticeable.
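With kubeadm that means editing the kube-proxy ConfigMap in kube-system. A sketch of the relevant KubeProxyConfiguration fields (the scheduler is an assumption; rr is the default round-robin):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"

After the kube-proxy pods restart, ipvsadm -Ln on a worker should list one virtual server per Service IP. If it comes back empty, kube-proxy has silently fallen back to iptables mode, usually because the modules above were never loaded.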
The Hardware Reality: Latency and Virtualization
Software optimization assumes the hardware underneath is reliable. This is where many DevOps engineers go wrong: they spend weeks tuning sysctl.conf parameters, then deploy on oversold VPS hosts.
Kubernetes adds layers: Pod -> CNI -> Host Network -> Physical Interface. If you are using VXLAN, you are adding packet fragmentation risks and CPU overhead for encapsulation. If the physical host underneath your VM is stealing CPU cycles (Steal Time) because the hosting provider oversold the CPU, your network throughput crashes. It doesn't matter how good your Calico config is if the hypervisor isn't scheduling your VM's CPU instructions fast enough.
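Two quick checks expose both problems with nothing beyond standard tooling (flannel.1 is the default VXLAN device; the pod IP and payload size are examples for a 1450-byte overlay MTU):

# Steal time: the st column shows CPU cycles the hypervisor took away from your VM
vmstat 1 5

# VXLAN adds roughly 50 bytes of headers; confirm the overlay MTU was lowered to match
ip -d link show flannel.1

# Verify the path MTU end-to-end with fragmentation forbidden (1422 + 28 bytes of headers = 1450)
ping -M do -s 1422 -c 3 10.244.1.10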
| Feature | Standard Container VPS | CoolVDS (KVM) |
|---|---|---|
| Virtualization | Container-based (LXC/OpenVZ) | Hardware-assisted (KVM) |
| Kernel Access | Shared Kernel (Restricted) | Dedicated Kernel (Full Control) |
| IPVS Support | Often blocked | Native Support |
| IOPS Consistency | Fluctuates (noisy neighbors) | Consistent |