Kubernetes 1.4 Networking: Surviving the Packet Jungle on Bare Metal & KVM

Let's be honest. Kubernetes is fantastic for orchestration, but the networking model is a nightmare if you don't know what you're doing. I spent last weekend debugging a packet loss issue that turned out to be a misconfigured MTU inside a Flannel overlay. If you think you can just run kube-up.sh and walk away, you're going to get a pager alert at 3 AM.

In this post, we are cutting through the hype. No buzzwords, just packets. We are looking at how Kubernetes 1.4 handles networking, the trade-offs between CNI plugins, and why your underlying VPS provider (like CoolVDS) matters more than you think when you start wrapping packets in packets.

The "Flat Network" Lie

Kubernetes assumes that every pod can talk to every other pod without Network Address Translation (NAT). It sounds elegant. In practice, achieving this on traditional infrastructure requires a lot of plumbing: Docker's default networking model gives each host its own isolated bridge (docker0), so we need an abstraction layer to stitch those per-host networks together.
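
To see why cross-host pod traffic needs help, look at what Docker hands you out of the box on each node. A quick sketch (the output and addresses are illustrative, not pulled from a real cluster):

# Every host gets its own docker0 bridge with an independent subnet,
# typically 172.17.0.0/16 on all of them.
ip addr show docker0
#   inet 172.17.0.1/16 scope global docker0

# There is no route from host A's 172.17.0.0/16 to host B's copy of the
# same subnet, so containers on different hosts cannot reach each other
# without an overlay or a routed CNI plugin.
ip route | grep docker0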

This is where CNI (Container Network Interface) comes in. In late 2016, you are essentially choosing between two architectures:

  1. Encapsulation (Overlay): Wrapping IP packets in UDP packets (VXLAN/UDP). Examples: Flannel, Weave.
  2. Routing (Underlay): Using BGP to route pod IPs natively. Example: Calico.

The Overhead of Overlays

If you are deploying on CoolVDS KVM instances, you have full control over the kernel, which is great. But if you choose Flannel with VXLAN, you are adding CPU overhead for encapsulation and decapsulation on every single packet.

Pro Tip: If you are using Flannel, check your backend configuration. The default `udp` backend is slow because it runs in userspace. Switch to `vxlan`, or to `host-gw` if your provider gives you Layer 2 connectivity between nodes (host-gw skips encapsulation entirely and just programs routes).

Here is a typical Flannel network configuration, the JSON blob flanneld reads at startup (stored in etcd, or embedded in a ConfigMap you apply with kubectl):


{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "vxlan",
    "VNI": 1
  }
}

If you see high "sy" (system) usage in top, your overlay is likely eating your CPU cycles.
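
A quick sanity check after standing up Flannel, assuming the vxlan backend and an eth0 uplink (adjust interface names for your setup):

# With the vxlan backend, Flannel creates a flannel.1 interface. Its MTU
# must be about 50 bytes smaller than the physical NIC's to leave room
# for the VXLAN header, otherwise you get fragmentation or silent drops.
ip -d link show flannel.1 | grep -o 'mtu [0-9]*'
ip link show eth0 | grep -o 'mtu [0-9]*'

# Watch per-core system time while pushing traffic between pods;
# sustained high %sys points at encap/decap overhead.
mpstat -P ALL 2 5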

Service Discovery & The IPTables Spaghetti

So, your pods have IPs. Great. But pods die. Services are the abstraction meant to solve this. In Kubernetes 1.4, kube-proxy defaults to iptables mode (the userspace proxy is practically dead, thankfully).

When you create a Service, kube-proxy writes a chain of rules to redirect traffic destined for the Service IP to one of the backing Pod IPs. It looks like this:


# Check the NAT table for Kubernetes services
sudo iptables -t nat -L KUBE-SERVICES -n | head -n 10

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
KUBE-SVC-X7Q3...  tcp  --  0.0.0.0/0            10.0.0.233           /* default/my-nginx:http cluster IP */ tcp dpt:80

It works, but it's messy. If you have 5,000 services, iptables becomes a bottleneck because the rules are evaluated sequentially. This is O(n) complexity. While we are hearing rumors of IPVS support coming in future versions, right now in 2016, we have to live with this.
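
You can see how big the rule set has grown on your own nodes with a rough count (the KUBE- chain prefix is what kube-proxy generates by default):

# Count the NAT rules kube-proxy manages; this grows with every Service
# and every endpoint behind it.
sudo iptables-save -t nat | grep -c '^-A KUBE-'

# Time a full listing; on clusters with thousands of Services this alone
# takes noticeable wall-clock time.
time sudo iptables -t nat -L -n > /dev/null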

Latency Warning: Every packet meant for a Service hits these rules. If your underlying disk I/O or CPU is stalling (common on oversold hosting), the kernel takes longer to process these interrupts. This is why we stick to CoolVDS NVMe instances for our master nodes: stalling in I/O wait while kube-proxy rewrites iptables rules is a cluster-killer.

Ingress: Exposing to the World

You can't just use NodePort for everything unless you want to manage a massive load balancer config manually. The Ingress resource (still beta in 1.4) is the standard way to expose HTTP/HTTPS.

We typically run the nginx-ingress-controller. It watches the API server for Ingress rules and hot-reloads the Nginx configuration.


apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: cool-app-ingress
  annotations:
    ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: api.coolvds-demo.no
    http:
      paths:
      - path: /v1
        backend:
          serviceName: api-service
          servicePort: 80
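
Once the controller has picked up the rule, smoke-test it from outside the cluster. A minimal check, assuming you exposed the controller via a NodePort (node IP and port below are placeholders):

# Send a request through the ingress controller with the Host header the
# Ingress rule matches on; you should get a response from api-service.
curl -v -H "Host: api.coolvds-demo.no" http://<node-ip>:<node-port>/v1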

The Norwegian Latency Factor

Here is where geography hurts you. If your Kubernetes cluster is hosted in a generic cloud region in Ireland or Frankfurt, but your users are in Oslo or Bergen, you are adding 20-40ms of latency unnecessarily.

Norwegian traffic should stay in Norway. Routing via NIX (Norwegian Internet Exchange) ensures your packets take the shortest path. When we deploy clusters for local clients, we ensure the nodes are physically located in Oslo. It minimizes the round-trip time (RTT), which is critical when you have the overhead of HTTPS handshakes + Kubernetes Ingress + Overlay Networking.
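
Don't take my word for it; measure it from where your users actually sit. A quick sketch (the target hostname is a placeholder for your own endpoint):

# Round-trip time from an Oslo vantage point to your cluster's edge.
ping -c 20 your-cluster-endpoint.example.com

# Hop-by-hop view; if the route detours through Frankfurt or Amsterdam
# before coming back, you will see it here.
mtr --report --report-cycles 20 your-cluster-endpoint.example.com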

Hardening the Cluster

Security is not optional, especially with Datatilsynet watching. Kubernetes 1.4 ships NetworkPolicy (introduced as beta in 1.3 and still beta here). This is huge. By default, K8s is wide open: any pod can talk to any pod.

You need a CNI that supports it (like Calico or Weave). Flannel does not support Network Policies. This is a common trap. If you need isolation (e.g., preventing the frontend from talking to the database directly), you need Calico.


kind: NetworkPolicy
apiVersion: extensions/v1beta1
metadata:
  name: access-nginx
spec:
  podSelector:
    matchLabels:
      run: nginx
  ingress:
  - from:
    - podSelector:
        matchLabels:
          access: "true"

The Hardware Reality Check

Software-defined networking (SDN) is CPU intensive. I've seen "cloud" instances crumble under high packet rates (PPS) because the hypervisor was stealing CPU cycles (noisy neighbors).

When running Kubernetes on a VPS:

  1. Check Steal Time: Run top. If %st sits above 0.5%, migrate. Your networking will be jittery.
  2. VirtIO Drivers: Ensure your OS (Ubuntu 16.04 or CentOS 7) is using virtio_net, not an emulated NIC.
  3. Disk Speed: Etcd is the brain of Kubernetes and it requires low-latency sequential writes and fast fsyncs. If you put etcd on a spinning HDD or a shared network drive, leader elections will start timing out. A quick way to test all three is sketched below.
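
Here is the triage I run on a fresh node before trusting it with etcd or an overlay (device names and the etcd path are assumptions; adjust to your layout):

# 1. Steal time: the "st" column; anything consistently above ~0.5 means
#    the hypervisor is taking cycles away from you.
vmstat 2 5

# 2. Paravirtualized NIC: you want the virtio_net driver, not emulated e1000.
ethtool -i eth0 | grep driver

# 3. Disk latency where etcd lives: a crude synchronous-write test with dd.
#    Sustained multi-millisecond per-write latency is a bad sign.
dd if=/dev/zero of=/var/lib/etcd/ddtest bs=512 count=1000 oflag=dsync
rm -f /var/lib/etcd/ddtest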

This is where the choice of provider becomes a technical decision, not just a billing one. We use CoolVDS because the KVM isolation is strict and the NVMe storage satisfies etcd's fsync requirements without breaking a sweat. Stability in the infrastructure layer means I spend less time debugging "phantom" network timeouts.

Final Thoughts

Kubernetes 1.4 is powerful, but it's not magic. It's just Linux networking primitives wrapped in Go code. Understand iptables, pick the right CNI plugin for your needs (Calico for policy, Flannel for simplicity), and host it on metal that doesn't steal your CPU cycles.

If you are ready to build a cluster that doesn't fall over when traffic spikes, stop playing with toy instances. Deploy a high-performance KVM VPS on CoolVDS and configure your networking the right way.