Kubernetes Networking on Bare Metal: Surviving the Packet Jungle (2020 Edition)

If I had a Norwegian Krone for every time a developer blamed the firewall when the actual culprit was a misconfigured CNI plugin or an MTU mismatch, I could retire to a cabin in Svalbard. Kubernetes networking is deceptively simple on a whiteboard: "Services talk to Pods." In reality, it is a labyrinth of iptables rules, encapsulation headers, and latency-killing context switches.

It is late 2020 and K8s 1.19 is stable, yet I still see production clusters running on default settings that choke under load. I recently spent 48 hours debugging a Magento cluster that would randomly drop connections. The logs were clean. The hardware was fine. The issue? A clash between the overlay network's VXLAN headers and the underlying VPS MTU settings.
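
If you suspect the same failure mode, a quick check is to compare the overlay interface MTU against what the underlying network actually carries. A minimal sketch, assuming a Flannel VXLAN device (flannel.1) and a placeholder pod IP; adjust the names for your CNI:

# Compare the MTU of the physical NIC with the VXLAN device
ip link show eth0
ip link show flannel.1

# Probe the path with the Don't Fragment bit set:
# 1472 bytes of payload + 28 bytes of ICMP/IP headers = 1500. Shrink the size until it passes.
ping -M do -s 1472 -c 3 10.244.1.10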

Let's cut the marketing fluff. You don't need a "service mesh" if your foundational networking is broken. This is how you build a K8s network stack that actually handles traffic, specifically within the context of European data sovereignty and Norwegian infrastructure.

1. The CNI Battlefield: Flannel vs. Calico vs. Cilium

The Container Network Interface (CNI) is where your packets live or die. In 2020, you have three real choices for a self-hosted or VPS-based cluster.

CNI Plugin | Mechanism       | Use Case                         | Overhead
Flannel    | VXLAN (usually) | Simple, small clusters           | High (encapsulation)
Calico     | BGP / IPIP      | Enterprise, policy enforcement   | Medium/Low
Cilium     | eBPF            | High performance, observability  | Lowest

For most deployments on CoolVDS, we recommend Calico using BGP mode if possible, or IPIP if you must. Why? Because VXLAN (Flannel's default) adds a packet header that eats into your MTU and burns CPU cycles for encapsulation/decapsulation. On a high-traffic VPS, those cycles add up.

Configuring Calico for Performance

Don't just apply the default manifest. You need to align your IP pools. If you are running on a provider that supports BGP peering (like some of our advanced setups), disable IPIP entirely to get near-native speeds.

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: CrossSubnet # Only encapsulate across subnets, not within the same L2 domain
  natOutgoing: true

Using CrossSubnet ensures that pod-to-pod traffic on the same node or same L2 segment doesn't get encapsulated. This drastically reduces latency.
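
Apply and verify the pool with calicoctl. A minimal sketch, assuming calicoctl is installed and pointed at your datastore, with the manifest above saved as ippool.yaml (a hypothetical file name):

# Apply the IPPool definition shown above
calicoctl apply -f ippool.yaml

# Confirm that ipipMode is now CrossSubnet
calicoctl get ippool default-ipv4-ippool -o yaml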

2. The `kube-proxy` Dilemma: IPTables vs. IPVS

By default, Kubernetes uses `iptables` to route service traffic. This works fine for 50 services. It works terribly for 5,000 services. `iptables` is a sequential list; the kernel has to traverse the rules one by one. It is O(n).
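
You can get a feel for the scale on your own cluster by counting the Service chains kube-proxy has programmed. This is a rough gauge, not a benchmark:

# Count the lines referencing Service chains in the NAT table (rough gauge of rule-list size)
sudo iptables-save -t nat | grep -c 'KUBE-SVC'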

Switch your cluster to IPVS (IP Virtual Server) mode. It uses hash tables—O(1) complexity. It is faster, more stable, and supports better load balancing algorithms (like Least Connection).

Pro Tip: Before enabling IPVS, ensure the kernel modules are loaded on your CoolVDS instance. We include these in our standard images, but verify them.

# Check for the IPVS and conntrack modules
# (kernels older than 4.19 name the conntrack module nf_conntrack_ipv4)
lsmod | grep -e ip_vs -e nf_conntrack

# Enable in kube-proxy config map
kubectl edit configmap kube-proxy -n kube-system

Change the mode line:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs" # Change from "iptables"

3. Ingress: The Doorway to Hell (and how to fix it)

You are likely using NGINX Ingress Controller. It's the standard for a reason. But out of the box, it is not tuned for high throughput. If you are serving traffic to Oslo users, you want low latency. If you leave the `proxy-buffer-size` at default, header-heavy applications (like those using heavy cookies or JWTs) will error out with 502s.

Here is a production-grade annotation set for your Ingress resources to handle high concurrency:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: production-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    # Keep connections alive to the backend
    nginx.ingress.kubernetes.io/upstream-keepalive-connections: "32"
    # Increase buffer for large headers (common in enterprise apps)
    nginx.ingress.kubernetes.io/proxy-buffer-size: "16k"
    # Client body size (don't let default 1m kill your uploads)
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
spec:
  rules:
  - host: api.coolvds-client.no
    http:
      paths:
      - backend:
          serviceName: backend-service
          servicePort: 80
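
Some tuning only exists at the controller level rather than per Ingress. Here is a sketch of cluster-wide settings via the ingress-nginx ConfigMap; the ConfigMap name and namespace assume a standard ingress-nginx install, and the values are starting points rather than gospel:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration   # name varies by install method; check your deployment
  namespace: ingress-nginx
data:
  # Pool of idle keepalive connections held open to upstream Pods
  upstream-keepalive-connections: "64"
  # Requests a client may send over a single keep-alive connection
  keep-alive-requests: "1000"
  # Match NGINX worker count to the vCPUs on the node
  worker-processes: "4"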

4. The Hardware Reality: Why Virtualization Matters

Software-defined networking is great, but it cannot fix slow I/O. When a packet hits your server, it causes an interrupt. If your VPS host is stealing CPU cycles (noisy neighbors) or your disk I/O is saturated, your network latency spikes. It doesn't matter how optimized your Cilium config is if the hypervisor is blocked.

This is where the choice of hosting provider becomes a technical decision, not just a financial one. At CoolVDS, we use KVM (Kernel-based Virtual Machine). Unlike container-based virtualization (like OpenVZ), KVM provides true isolation. We couple this with NVMe storage.

Why does NVMe matter for networking? etcd. Kubernetes relies entirely on etcd for state, and etcd is extremely sensitive to disk write latency. If your etcd is slow, your API server lags, your CNI updates are delayed, and your network convergence time increases. Low-latency NVMe storage is not a luxury for K8s; it is a requirement.
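
If you want a number instead of a feeling, etcd ships a built-in performance check. A quick sketch, assuming etcdctl v3 on a kubeadm-style control plane (the certificate paths below are the kubeadm defaults; adjust them for your setup):

# Run etcd's built-in performance check against the local member
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  check perf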

5. Debugging Like a Pro: `nsenter`

When networking breaks, you need to see what the Pod sees. Don't install `curl` or `tcpdump` inside your production Docker images—that creates security bloat. Instead, use `nsenter` on the node to jump into the Pod's network namespace.

First, find the Docker Container ID on the node:

docker ps | grep my-pod-name

Get the PID:

docker inspect --format '{{ .State.Pid }}' <container-id>

Enter the network namespace:

nsenter -t <pid> -n ip addr show

Now you can run `tcpdump` from the host context but capture traffic as if you were inside the pod. This is invaluable for debugging service-to-service communication failures without altering the running container.
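
For example, to watch the pod's HTTP traffic from the host (the interface name and port are assumptions; substitute whatever your workload uses):

# Capture the pod's traffic on port 80 without modifying the container image
nsenter -t <pid> -n tcpdump -i eth0 -nn port 80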

6. The Schrems II Factor: Data Sovereignty

We cannot ignore the elephant in the room. In July 2020, the CJEU struck down Privacy Shield (Schrems II). If you are hosting personal data of Norwegian citizens, relying on US-owned cloud providers has become a legal minefield. The transfer mechanisms are under heavy scrutiny by Datatilsynet.

Running Kubernetes on CoolVDS keeps your data physically in Norway or Europe, on European-owned infrastructure. This simplifies your GDPR compliance significantly. You control the network stack. You control the storage. You know where the bits live.

Final Thoughts

Kubernetes networking is brittle if you treat it as a black box. Understand the path of the packet. Choose a CNI that fits your scale (Calico or Cilium). Use IPVS. And ensure your underlying infrastructure—CPU, RAM, and Disk—is dedicated and fast.

Don't let I/O wait times kill your cluster's performance. Deploy a KVM-based, NVMe-powered instance on CoolVDS today and see what stable latency actually looks like.