
Kubernetes Networking Deep Dive: Packet Flow, CNI, and Debugging IPVS in 2019

Kubernetes Networking: Where Packets Go to Die (And How to Save Them)

If I had a krone for every time a developer told me "it's a network issue" when their application threw a 502, I could retire to a cabin in Hemsedal. The thing is, in the world of Kubernetes, they usually aren't entirely wrong. Networking in K8s is an abstraction on top of an abstraction, wrapped in iptables rules that would make a seasoned sysadmin weep.

We are running Kubernetes 1.13 in production. It is stable and powerful, but the networking model assumes a flat network where every pod can talk to every other pod. Achieving that flat network means choosing the right CNI (Container Network Interface) plugin and understanding how `kube-proxy` actually moves bits around. If you are serving customers in Oslo or Bergen, relying on default settings is a recipe for latency spikes.

The CNI Battlefield: Flannel vs. Calico

When you initialize `kubeadm`, you make a choice that defines your cluster's topology. In early 2019, the two main contenders for most setups are Flannel and Calico.

Flannel is the simple choice. It creates a VXLAN overlay. It encapsulates packets, which adds CPU overhead. If you are running on cheap, oversold VPS hosting where CPU steal is high, VXLAN processing will kill your throughput.

Calico uses BGP (Border Gateway Protocol) to route packets without encapsulation (in most modes). It is faster, but more complex. It allows for Network Policies, which are critical if you want to be compliant with strict security standards.
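
To confirm which mode Calico is actually running in, inspect its IP pool. A quick check, assuming `calicoctl` is installed and pointed at your cluster's datastore:

# Show the IP pool and its encapsulation settings (the IPIPMODE column)
calicoctl get ippool -o wide

# "Never" in the encapsulation column means pure BGP routing, no overlay overhead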

Pro Tip: If you are deploying on CoolVDS, you have the dedicated CPU cycles to handle VXLAN, but why waste them? We recommend Calico for the BGP routing capabilities, allowing raw packet speed closer to the metal. Because our instances are KVM-based, you don't fight neighbors for network interrupts.

Configuring Calico for MTU Sanity

A common mistake I see in Nordic datacenters is an MTU mismatch. If your physical network has an MTU of 1500 and your encapsulation (IPIP or VXLAN) adds its own headers, the inner packet must be correspondingly smaller. If it is not, the kernel fragments it, and fragmentation is the enemy of performance.

Here is how we patch the `calico-config` ConfigMap to ensure we aren't dropping packets on the floor:

kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # Typha is disabled.
  typha_service_name: "none"
  # Configure the MTU. 
  # Standard Ethernet is 1500. If using IPIP encapsulation, subtract 20 bytes.
  veth_mtu: "1480"
  # The CNI network configuration.
  cni_network_config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.0",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "datastore_type": "kubernetes",
          "nodename": "__KUBERNETES_NODE_NAME__",
          "mtu": __CNI_MTU__,
          "ipam": {
              "type": "calico-ipam"
          },
          "policy": {
              "type": "k8s"
          },
          "kubernetes": {
              "kubeconfig": "__KUBECONFIG_FILEPATH__"
          }
        }
      ]
    }
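
After rolling out the config, verify the value actually reached the pods. A minimal check, assuming a pod named `myapp-0` exists in the `backend` namespace (adjust names to your cluster):

# MTU as seen inside the pod
kubectl exec -n backend myapp-0 -- cat /sys/class/net/eth0/mtu

# On the node: the IPIP tunnel device Calico creates should match
ip link show tunl0 | grep mtu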

The Service Layer: Goodbye iptables, Hello IPVS

For years, `kube-proxy` relied on `iptables` to load balance traffic to Services. This works fine for 50 services. But when you scale to 5,000 services, the sequential list of iptables rules becomes a bottleneck. Updating the rules takes too long, causing blips in connectivity during deployments.
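
You can see the scale of the problem on any node. A rough gauge, assuming shell access to a worker:

# Count the NAT rules kube-proxy has programmed for Services
iptables-save -t nat | grep -c KUBE-SVC

# Total rules in the nat table; watch this grow with every Service and endpoint
iptables-save -t nat | wc -l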

As of Kubernetes 1.11, IPVS (IP Virtual Server) mode is generally available. It uses a hash table instead of a linear rule list, so lookups are O(1) instead of O(n). If you are serious about performance, you need to switch to IPVS and enable strict ARP mode.

Check what your proxy is currently doing:

# Check kube-proxy logs to see which proxier it chose
kubectl logs -n kube-system -l k8s-app=kube-proxy | grep -i "proxier"
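
If you have shell access to a node, kube-proxy also reports its mode on the metrics port (assuming the default bind address of 127.0.0.1:10249):

# Returns "ipvs" or "iptables"
curl -s http://localhost:10249/proxyMode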

If you don't see IPVS, you need to edit your `kube-proxy` config map or daemonset arguments:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr" # Round Robin
  strictARP: true
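
IPVS mode also needs the right kernel modules loaded on every node, or kube-proxy falls back to iptables. A minimal sketch for Debian/Ubuntu nodes (on kernels 4.19+ the conntrack module is named nf_conntrack instead of nf_conntrack_ipv4):

# Load the IPVS modules kube-proxy needs
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack_ipv4

# Persist them across reboots
cat <<EOF > /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
EOF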

Once enabled, you can debug networking directly on the node using `ipvsadm`. This is real sysadmin work, not just YAML shuffling:

# Install the tool first
apt-get install ipvsadm

# List the virtual services (ClusterIPs) and their backends (Pod IPs)
ipvsadm -Ln

You will see a table mapping your Service VIPs directly to Pod IPs. If a Pod IP is missing here, no amount of ingress controller restarting will fix it.
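
When you suspect a single backend is misbehaving, per-destination counters are more useful than the plain listing:

# Per-backend connection and byte counters
ipvsadm -Ln --stats

# Live connection table, handy for spotting sessions stuck half-closed
ipvsadm -Lnc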

The Physical Reality: Why Hardware Matters

You can tune sysctls until you are blue in the face, but if the underlying hypervisor is overcommitting resources, your `p99` latency will suffer. In Norway, we have excellent connectivity via NIX (Norwegian Internet Exchange), but that low latency is wasted if your VPS is waiting for CPU time to process a packet interrupt.

This is where the "noisy neighbor" effect kills Kubernetes. K8s is chatty. Etcd requires low disk latency (fsync). The API server requires CPU.

Benchmarking Disk Latency for Etcd

Before deploying a cluster on any VPS, I run `fio` to simulate etcd's write pattern. Etcd is the brain of your cluster; if it waits, everyone waits.

# fio will not create the directory for you
mkdir -p test-data
fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=22m --bs=2300 --name=mytest

On standard spinning rust or shared SSDs, you might see 10ms+ latency. On CoolVDS NVMe instances, we consistently see sub-millisecond write latencies. This keeps the Kubernetes control plane stable, preventing those phantom "NodeNotReady" flaps that wake you up at 3 AM.
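
Once the cluster is running, keep watching etcd's own view of disk latency. A sketch assuming kubeadm's default certificate paths on the control-plane node (adjust the flags to your setup):

# WAL fsync latency histogram; the p99 should stay well below 10ms
curl -s --cacert /etc/kubernetes/pki/etcd/ca.crt \
     --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
     --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
     https://127.0.0.1:2379/metrics | grep wal_fsync_duration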

Data Sovereignty and Datatilsynet

We are operating in 2019, post-GDPR implementation. If you are hosting data for Norwegian users—health data, financial records, or even just emails—location matters. The Datatilsynet (Norwegian Data Protection Authority) is becoming increasingly vigilant.

Hosting on US-controlled clouds adds legal complexity regarding the CLOUD Act. Hosting on CoolVDS, which utilizes infrastructure directly in Oslo and Europe, simplifies your compliance posture. Your bits stay within the jurisdiction. It is not just about speed; it is about sovereignty.

Debugging Network Policies

If you use Calico, you likely use NetworkPolicies to isolate namespaces. A common issue: you deny all traffic but forget to allow DNS resolution. Suddenly, your app can't find the database.

Here is a safe default that denies ingress by default but still allows DNS lookups to CoreDNS in kube-system:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: backend
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-access
  namespace: backend
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    ports:
    - protocol: UDP
      port: 53
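
Two gotchas with the policy above. First, `kube-system` has no `name` label out of the box, so the namespaceSelector matches nothing until you add one. Second, because the egress policy selects every pod in the namespace, all other outbound traffic is now denied too; add explicit egress rules for anything else the pods need, such as the database. A quick fix and a test, assuming a pod named `myapp-0` exists in `backend`:

# Give kube-system the label the namespaceSelector expects
kubectl label namespace kube-system name=kube-system

# Verify DNS still resolves from inside the namespace
kubectl exec -n backend myapp-0 -- nslookup kubernetes.default.svc.cluster.local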

Conclusion

Kubernetes networking is not magic. It is a complex stack of routing tables, iptables chains (or IPVS hashes), and encapsulation protocols. To master it, you must look below the YAML.

You need a clean network path, minimal jitter, and storage that can keep up with etcd. Don't build a Ferrari engine and bolt it onto a chassis with rusted wheels. We built CoolVDS to provide the raw, isolated performance that complex orchestrators like Kubernetes demand.

Ready to stop debugging network latency? Deploy a high-performance NVMe KVM instance on CoolVDS today and give your packets the highway they deserve.