Kubernetes Networking Deep Dive: Surviving the Packet Jungle in Production

Let's be honest: Kubernetes networking is where the dream of "container orchestration" often wakes up screaming. You deployed your microservices, everything looked green in the dashboard, but now your API latency is spiking, and kubectl logs isn't telling you why. The harsh reality of October 2019 is that while Kubernetes (k8s) has won the orchestration war, the networking layer remains a minefield of iptables rules, encapsulation overhead, and DNS timeouts.

I've spent the last month debugging a cluster that looked healthy but behaved like it was on a dial-up connection. The culprit wasn't the application code; it was a mismatch between the CNI plugin and the underlying VPS architecture. If you think you can just slap a default Flannel install on any cloud instance and get sub-millisecond latency, you are mistaken.

The CNI Battlefield: Flannel vs. Calico

Your choice of Container Network Interface (CNI) defines your cluster's performance profile. In the current landscape, you largely have two camps: the "It Just Works" overlay networks (like Flannel) and the "I Need Raw Speed" Layer 3 solutions (like Calico).

Flannel uses VXLAN encapsulation by default: it wraps your pods' Ethernet frames inside UDP packets to ship them across nodes. That adds CPU overhead and shrinks the effective Maximum Transmission Unit (MTU), leading to fragmentation if you aren't careful. For a simple dev cluster, it's fine. For production? It's a bottleneck.
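A quick sanity check on a node (assuming Flannel's default flannel.1 interface name and a physical NIC called eth0) is to compare MTUs and probe the path between nodes with fragmentation disallowed:

# Compare the overlay MTU with the physical NIC
ip -d link show flannel.1
ip link show eth0

# Probe another node with "don't fragment" set; 1472 = 1500 minus 28 bytes
# of ICMP/IP headers. If this fails but smaller payloads get through,
# something on the path is silently fragmenting or dropping your packets.
ping -M do -s 1472 <other-node-ip>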

Calico, on the other hand, can use BGP (Border Gateway Protocol) to route packets without encapsulation if your network supports it. Even with IPIP encapsulation, it tends to outperform Flannel in throughput. In our benchmarks here at CoolVDS, moving from VXLAN to BGP routing on our KVM infrastructure reduced inter-pod latency by nearly 20%.
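If you want to confirm what your cluster is actually doing, calicoctl node status shows whether BGP sessions are established between nodes. And here is a rough sketch of an IPPool that drops IPIP entirely, which only makes sense if the underlying network can route pod CIDRs natively; the pool name and CIDR below are illustrative, not a recommendation:

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16      # example pod CIDR; use your cluster's actual range
  ipipMode: Never           # Always / CrossSubnet / Never
  natOutgoing: true

Apply it with calicoctl apply -f, and consider ipipMode: CrossSubnet as a middle ground that only encapsulates traffic crossing subnet boundaries.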

Here is a snippet to check your current Calico configuration and ensure MTU is correctly set (crucial when running on virtualized hardware):

kubectl get configmap -n kube-system calico-config -o yaml

Look for the veth_mtu setting. If your host interface (eth0) is 1500, the overlay MTU needs headroom for the encapsulation headers: roughly 1480 for IPIP and 1450 for VXLAN, with 1440 being a safe catch-all.
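If the value is wrong, a merge patch plus a restart of the calico-node DaemonSet fixes it. This assumes the standard calico-config ConfigMap layout and a reasonably recent kubectl (rollout restart landed in 1.15):

# Set the veth MTU and restart calico-node so the change takes effect
kubectl patch configmap calico-config -n kube-system \
  --type merge -p '{"data":{"veth_mtu":"1440"}}'
kubectl rollout restart daemonset calico-node -n kube-system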

Pro Tip: If you are hosting on CoolVDS, our internal network supports Jumbo Frames on specific private networking plans. This allows you to crank that MTU up, significantly boosting throughput for data-heavy workloads like Elasticsearch or Cassandra.

The kube-proxy Bottleneck: Ditch iptables for IPVS

By default, Kubernetes uses iptables to handle Service discovery and load balancing. When a Service is created, iptables rules are updated on every node. This is O(N). If you have 5,000 services, iptables becomes a massive list that the kernel has to traverse sequentially for every packet. It kills CPU performance.
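You can gauge how bad it already is on your own nodes; the KUBE-* chains are where kube-proxy programs its Service rules:

# Count the NAT rules kube-proxy has programmed on this node
iptables-save -t nat | grep -c KUBE

# A full listing that takes seconds instead of milliseconds is a bad sign
time iptables -t nat -L -n | wc -l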

As of Kubernetes 1.11, IPVS (IP Virtual Server) is GA. It uses hash tables instead of linear lists. It is O(1). If you are running a cluster in late 2019 and still using iptables mode, you are voluntarily slowing down your traffic.

To enable IPVS mode, you typically need to edit the kube-proxy config map. Here is how you do it:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"   # round-robin; lc (least connection) and sh (source hashing) are also supported
  strictARP: true   # answer ARP only for local addresses; needed by some bare-metal load balancers

Before applying this, ensure the IPVS kernel modules are loaded on your worker nodes:

# Load required modules (add them to /etc/modules-load.d/ to survive reboots)
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
# On kernels 4.19 and newer this module is simply nf_conntrack
modprobe nf_conntrack_ipv4
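With the modules loaded, the rollout on a kubeadm-style cluster (where kube-proxy runs as a DaemonSet driven by the kube-proxy ConfigMap, labelled k8s-app=kube-proxy by default) looks roughly like this:

# Set mode: "ipvs" in the kube-proxy ConfigMap as shown above
kubectl -n kube-system edit configmap kube-proxy

# Recreate the kube-proxy pods so they pick up the new mode
kubectl -n kube-system delete pod -l k8s-app=kube-proxy

# Verify: ipvsadm should now list a virtual server per Service
ipvsadm -Ln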

The Silent Killer: DNS Latency (ndots:5)

This is a classic "gotcha" that plagues almost every new setup. By default, the /etc/resolv.conf inside a Pod has options ndots:5. This means if you try to resolve google.com, the resolver will first try:

  1. google.com.namespace.svc.cluster.local
  2. google.com.svc.cluster.local
  3. google.com.cluster.local
  4. ...and so on.

Only after these fail does it resolve the actual domain. This multiplies the load on CoreDNS and adds latency to every external call. If your application makes heavy external API calls, this is disastrous.
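Don't take my word for it; a throwaway pod shows exactly which search list and ndots value your workloads inherit (the pod name here is arbitrary):

# Print the DNS configuration a Pod actually receives
kubectl run dns-check --rm -it --restart=Never --image=busybox -- cat /etc/resolv.conf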

The fix? Force a fully qualified domain name (FQDN) by adding a trailing dot in your application code (e.g., google.com.), or tune the dnsConfig in your Pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: refined-dns-pod
spec:
  containers:
  - name: test
    image: nginx
  dnsConfig:
    options:
      - name: ndots
        value: "2"   # names with two or more dots are now tried as absolute first

Security: Locking Down the Mesh

We operate under strict GDPR mandates here in Europe. In Norway, Datatilsynet does not look kindly on leaky architectures. By default, all Pods in a K8s cluster can talk to all other Pods. If a hacker compromises your frontend, they can scan your database directly.

You must implement NetworkPolicies. However, this requires a CNI that enforces them (like Calico or Weave Net); Flannel does not support them out of the box. Here is a default-deny policy for incoming traffic that every production namespace should start with (add Egress under policyTypes if you want outbound traffic blocked as well):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: backend-prod
spec:
  podSelector: {}
  policyTypes:
  - Ingress

Once applied, you must explicitly whitelist traffic. It’s painful to set up, but it’s the only way to be compliant and secure.
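As a sketch of what that whitelisting looks like (the app labels and the Postgres port are assumptions; adapt them to your own manifests), this policy lets only the backend pods reach the database:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-db
  namespace: backend-prod
spec:
  podSelector:
    matchLabels:
      app: postgres          # assumed label on the database pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend       # assumed label on the API pods
    ports:
    - protocol: TCP
      port: 5432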

The Infrastructure Reality: Why CoolVDS Matters

You can tune sysctl parameters and optimize YAML manifests until 3 AM, but you cannot software-engineer your way out of bad hardware. Kubernetes is incredibly I/O sensitive, particularly etcd. If your hosting provider uses shared spinning rust (HDD) or oversells their CPU cores, your leader election will time out, and your cluster will flap.

At CoolVDS, we see this constantly with customers migrating from budget providers. They complain about "Kubernetes instability," but the logs show fsync latency on etcd exceeding 100ms. That’s disk wait.
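If you suspect your disks, the fio workload recommended in the etcd hardware guidance approximates etcd's write pattern: small sequential writes, each followed by an fdatasync. The directory below is just a scratch path; run it on the same filesystem that holds /var/lib/etcd. etcd wants the 99th percentile of fdatasync latency under 10ms.

# Benchmark fdatasync latency the way etcd experiences it
mkdir -p /var/lib/etcd-bench
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd-bench --size=22m --bs=2300 --name=etcd-io-check
# Check the fsync/fdatasync latency percentiles in the output (needs fio >= 3.5)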

The Hardware Comparison

| Feature        | Budget VPS                 | CoolVDS Architecture             |
|----------------|----------------------------|----------------------------------|
| Storage        | SATA SSD (Shared)          | Enterprise NVMe (Local RAID 10)  |
| Virtualization | OpenVZ / Container (Noisy) | KVM (Kernel Isolation)           |
| Network        | 100Mbps / 1Gbps (Shared)   | Low Latency (Direct NIX Peering) |

For a Kubernetes node, especially the masters, NVMe is not a luxury; it is a requirement in 2019. We strictly use KVM virtualization to ensure that your kernel resources—specifically the network stack—are yours alone. No noisy neighbors stealing your packet processing cycles.
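A quick way to see whether a neighbor is eating your cycles is the steal column; on a properly sized KVM guest it should sit at or near zero:

# Watch the "st" (steal) column; sustained non-zero values mean the
# hypervisor is handing your vCPU time to other tenants
vmstat 1 5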

Local Latency: The Norway Advantage

If your users are in Oslo, Bergen, or Trondheim, physics is your enemy. Hosting your cluster in a massive datacenter in Frankfurt adds 20-30ms of round-trip time (RTT). That might not sound like much, but in a microservices architecture where one request fans out to ten internal calls, that latency compounds.

CoolVDS infrastructure is optimized for the Nordic region. By peering directly at NIX (Norwegian Internet Exchange), we keep traffic local. This isn't just about speed; it's about data sovereignty. Keeping data within Norwegian borders simplifies GDPR compliance significantly.

Final Thoughts

Kubernetes is powerful, but it doesn't absolve you of understanding networking fundamentals. Start with a robust CNI like Calico, enable IPVS mode, fix your DNS ndots, and lock down your namespaces.

And most importantly, build on a foundation that can handle the load. Don't let slow I/O kill your orchestration. Deploy a high-performance, NVMe-backed KVM instance on CoolVDS today and give your packets the highway they deserve.