Kubernetes 1.2 Networking Deep Dive: Surviving the Overlay Jungle on Bare Metal

Let’s be honest: getting Kubernetes to schedule containers is the easy part. The moment you need those containers to actually talk to each other across different nodes without massive latency spikes, the real headache begins. I spent the last 48 hours debugging a packet loss issue on a distributed cluster, and the culprit wasn't the code. It was misunderstood VXLAN encapsulation.

With the recent release of Kubernetes 1.2, we have more tools than before, but the complexity has ramped up. If you are running on AWS, you might just click a button and get an ELB. But for those of us running on performant, cost-effective VPS infrastructure in Norway—where data sovereignty matters—we don't have those magic cloud buttons. We have to build the network plumbing ourselves.

This post dissects the Kubernetes networking model as it stands in mid-2016, comparing CNI plugins like Flannel and Calico, and explaining why your kube-proxy mode matters more than you think.

The "IP-Per-Pod" Promise vs. Reality

Kubernetes mandates a flat network structure. Every Pod gets its own IP address that is routable from any other Pod on any other node. No NAT (Network Address Translation) allowed between Pods. This sounds elegant until you have to implement it on a standard Linux VPS where you only have one public IP and a private interface.

To achieve this on CoolVDS KVM instances, we typically rely on an Overlay Network. This creates a virtual network on top of the physical network. But here is the catch: encapsulation kills performance if you aren't careful. With a VXLAN backend, every packet gets wrapped in an outer UDP/VXLAN header, sent across the wire, and unwrapped on the far side. That encapsulation work costs CPU on both ends, and it adds up.
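A quick way to see the cost in concrete terms is the MTU. VXLAN adds roughly 50 bytes of outer headers, so the overlay interface has to run with a smaller MTU than the physical NIC, and a mismatch here is a classic source of mysterious packet loss. A sanity check, assuming Flannel's VXLAN backend and eth0 as the physical interface:

# The overlay device (flannel.1 for the VXLAN backend) should sit ~50 bytes below the physical MTU.
ip -d link show flannel.1
ip link show eth0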

Choosing Your Weapon: Flannel vs. Calico

In 2016, the CNI (Container Network Interface) landscape is essentially a two-horse race for most bare-metal deployments.

1. Flannel (The Easy Way)

Flannel by CoreOS is the default for a reason. It uses etcd to store network configurations and allocates a subnet to each host. The backend is usually VXLAN (the older UDP backend is noticeably slower). It's stable, but debugging the flannel.1 (VXLAN) or flannel0 (UDP backend) interface when your database latency spikes is painful.
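For reference, Flannel's entire network layout is one JSON document under its default etcd prefix. A minimal configuration sketch (the 10.244.0.0/16 range is a placeholder; pick a block that does not collide with your VPS private network):

# Each flanneld daemon leases a /24 for its host out of this range.
etcdctl set /coreos.com/network/config \
  '{ "Network": "10.244.0.0/16", "Backend": { "Type": "vxlan" } }'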

2. Calico (The Performance Way)

Calico takes a different approach. Instead of an overlay, it uses BGP (Border Gateway Protocol) to route packets between hosts. No encapsulation means near-native network speeds. For high-throughput applications hosted on CoolVDS NVMe instances, Calico is often the superior choice, though it requires your underlying network to allow BGP peering—or you run it in 'IP-in-IP' mode, which brings us back to encapsulation.
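You can see the difference directly on a node. With Calico, routes to other hosts' pod subnets are ordinary kernel routes installed by BIRD, pointing at the peer's real IP instead of a tunnel device. A rough check, assuming a stock Calico install:

# Calico-programmed routes carry "proto bird"; no vxlan or flannel interface in sight.
ip route | grep bird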

Pro Tip: If you are running a database inside K8s (brave, but becoming common), stick to Calico or host networking. The overhead of VXLAN on Flannel can introduce a 20-30% drop in transaction throughput on heavy write loads.
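For the host-networking escape hatch, the change is a single field in the pod spec. A minimal sketch (the image and names are placeholders, not from a real deployment):

apiVersion: v1
kind: Pod
metadata:
  name: pg-hostnet
spec:
  hostNetwork: true          # share the node's network namespace: no overlay, no encapsulation
  containers:
  - name: postgres
    image: postgres:9.5
    ports:
    - containerPort: 5432    # binds directly on the node; watch for port collisions between pods

The trade-off is that only one such pod can own port 5432 per node, so pin it to specific hosts with a nodeSelector.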

The Critical Switch: Userspace vs. Iptables Proxy

This is the most important configuration change in Kubernetes 1.2. In previous versions, kube-proxy ran in "userspace" mode. It was a literal proxy: traffic went from kernel -> userspace (kube-proxy) -> kernel -> destination. It was slow and brittle.

As of 1.2, the iptables mode is the new default. It manages traffic entirely in the kernel using Linux netfilter rules, and it is drastically faster. However, kube-proxy will quietly fall back to userspace mode if your kernel or iptables version is too old. Ensure your CoolVDS instance is running a modern kernel (4.4+ is ideal on Ubuntu 16.04).

Here is how you verify which mode you are running. Check the logs of the kube-proxy container or service:

journalctl -u kube-proxy | grep "proxy mode"

If you see "userspace", you are leaving performance on the table. Force it by passing the flag:

--proxy-mode=iptables
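Once you are in iptables mode, you can see kube-proxy's work directly in the NAT table: every Service gets a KUBE-SVC-* chain, with one KUBE-SEP-* entry per endpoint. A quick check:

# If these chains are missing, kube-proxy is not programming the kernel at all.
iptables-save -t nat | grep KUBE-SVC
iptables-save -t nat | grep KUBE-SEP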

Exposing Services: The Ingress Controller Revolution

On a VPS, we don't have an external LoadBalancer service that magically provisions an IP. We used to rely on NodePort, opening random high-numbered ports (30000-32767) on every node. It’s messy and a firewall nightmare.

The solution in 2016 is the Ingress Controller (currently in beta). You run a single Nginx pod that listens on port 80/443 and routes traffic based on host headers. This allows you to host multiple domains on a single CoolVDS IP address efficiently.

Here is a snippet of a working Ingress resource for a service targeting the Norwegian market:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: nordic-app-ingress
spec:
  rules:
  - host: app.coolvds-client.no
    http:
      paths:
      - path: /
        backend:
          serviceName: frontend-service
          servicePort: 80
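Save the manifest and create it. Note that frontend-service must already exist as an ordinary ClusterIP Service, otherwise nginx has nothing to route to (the file name here is just an example):

kubectl create -f nordic-app-ingress.yaml
kubectl describe ingress nordic-app-ingress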

To make this work, you need the Nginx Ingress Controller running. Deploying it as a DaemonSet puts one controller on every node, so any node's public IP can terminate traffic and you get high availability for free:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: nginx-ingress-controller
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        name: nginx-ingress
    spec:
      containers:
      - image: gcr.io/google_containers/nginx-ingress-controller:0.8.3
        name: nginx-ingress
        ports:
        - containerPort: 80
          hostPort: 80
        - containerPort: 443
          hostPort: 443
        # The controller reads its own pod name and namespace from the environment.
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        # The 0.8.x controller refuses to start without a default backend for
        # unmatched traffic; this assumes a default-http-backend Service exists
        # in kube-system, as in the upstream example.
        args:
        - /nginx-ingress-controller
        - --default-backend-service=kube-system/default-http-backend

Data Sovereignty and Latency in Norway

Why go through this trouble instead of using a managed cloud? Two reasons: Latency and Law.

Routing traffic from Oslo to a data center in Frankfurt or Ireland adds 20-40ms of latency. For real-time trading or gaming applications, that is unacceptable. CoolVDS infrastructure sits directly on the NIX (Norwegian Internet Exchange) backbone. The ping times are single-digit milliseconds.

Furthermore, with the uncertain legal landscape following the invalidation of Safe Harbor, and the new Privacy Shield framework still finding its footing, keeping customer data physically in Norway (under Datatilsynet jurisdiction) is the safest play for any CTO concerned about compliance.

Debugging Tips for the Trenches

When networking fails (and it will), standard tools like ping often lie to you: Service ClusterIPs are virtual addresses that exist only as iptables rules, so they never answer ICMP even when the Service is perfectly healthy. Here is my checklist for 2016-era K8s debugging:

  1. Check the nodes: kubectl get nodes. If a node is 'NotReady', the SDN agent (flanneld) is likely dead.
  2. Check the iptables: There can be thousands of rules. Use iptables-save | grep <service-name> to see if the KUBE-SVC rules for your service actually exist.
  3. Tcpdump inside the container: You can't always do this easily. I use a sidecar container with network tools installed, attached to the pod's network namespace (see the sketch below):
docker run -it --net=container:<pause-container-id> nicolaka/netshoot /bin/bash
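Here is that last trick end to end, a sketch assuming the Docker runtime and a pod whose name starts with web (the pause container naming follows the kubelet's k8s_POD_ convention):

# 1. Find the pod's infrastructure ("pause") container; it owns the network namespace.
docker ps --filter "name=k8s_POD" --format "{{.ID}}  {{.Names}}" | grep web

# 2. Attach a throwaway tooling container to that namespace and use tcpdump, ss, dig, etc.
docker run -it --rm --net=container:<pause-container-id> nicolaka/netshoot /bin/bash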

Building a Kubernetes cluster on bare metal VPS requires a deeper understanding of Linux networking than using a managed service. But the reward is total control, lower costs, and guaranteed data residency. If you need a stable foundation for your cluster, CoolVDS provides the raw KVM performance and consistent I/O that etcd and overlay networks demand.

Don't let latency kill your project. Spin up a CoolVDS instance in Oslo today and build a network you actually control.