Kubernetes 1.1 Networking: Demystifying Pod Communication and Overlays on Bare Metal

Let’s be honest: getting Kubernetes up and running is the easy part. The tutorials usually end right after kubectl get nodes returns Ready. But when you deploy your first multi-pod application and services refuse to route traffic, you realize the networking model is where the actual battle is fought.

As we close out 2015, Kubernetes 1.1 is stabilizing, but networking remains the "dark art" of container orchestration. Unlike the traditional VM model where you have a static IP and a known gateway, Kubernetes demands a flat network space where every pod can talk to every other pod without NAT. Achieving this on bare metal or VPS environments—without the magic of GKE—requires getting your hands dirty with overlays and routing tables.

I’ve spent the last month migrating a client's monolith to microservices, and I’m going to walk you through the networking primitives, the new iptables proxy mode, and why the underlying hardware (and location) matters more than you think.

The "Flat Network" Mandate

The fundamental rule of Kubernetes networking is simple but strict: containers on different nodes must communicate without Network Address Translation (NAT).

If you are running Docker 1.9 on a single machine, the docker0 bridge handles this fine. But spanning multiple hosts is where things break: out of the box, Node A might assign 172.17.0.2 to a pod while Node B hands the exact same IP to a different pod. You need a cluster-wide CIDR block carved into per-node subnets.
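
You can see the collision coming by checking the default bridge on two freshly installed nodes; with stock Docker settings both report the same 172.17.0.0/16 range and hand out the same container addresses:

# Run on each node: with stock settings, docker0 sits on the same subnet everywhere
ip addr show docker0 | grep 'inet '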

In a managed cloud, the provider handles this. On a VPS setup (which you should be using for cost control and data sovereignty), you need an overlay network. Right now, Flannel (by CoreOS) is the pragmatic choice. It uses etcd to store the network configuration and distributes subnet allocations to each host.

Configuring Flannel for Performance

Flannel encapsulates packets, and that encapsulation has a cost. The default backend is udp, which does the encapsulation in userspace: fine for testing, terrible for production throughput. Use the vxlan backend if your kernel supports it (Linux 3.12+, or a distribution kernel with vxlan backported, which covers current Ubuntu 14.04 and CentOS 7 images).
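
A quick sanity check before committing to vxlan on a node:

# Confirm the running kernel and that the vxlan module is available
uname -r
modprobe vxlan && echo "vxlan available"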

Here is a standard configuration pattern we push to etcd before starting the flannel daemon:

# Populate etcd with the network configuration
curl -L http://127.0.0.1:2379/v2/keys/coreos.com/network/config -XPUT -d value='{
    "Network": "10.244.0.0/16",
    "SubnetLen": 24,
    "Backend": {
        "Type": "vxlan",
        "VNI": 1
    }
}'
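
To confirm the key landed, read it back with a plain GET against the same etcd endpoint:

# Read the network config back out of etcd
curl -s http://127.0.0.1:2379/v2/keys/coreos.com/network/config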

Once flanneld starts, it reads this, reserves a /24 subnet for the local node (e.g., 10.244.15.0/24), and writes the subnet config to a file that Docker consumes.
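
The file in question is /run/flannel/subnet.env. On a vxlan node it looks roughly like this (the subnet and MTU are whatever flanneld leased and calculated; treat the values as illustrative):

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.15.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=false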

Pro Tip: Always check your MTU. VXLAN encapsulation adds 50 bytes of headers, so if your host interface (eth0) has an MTU of 1500, your flannel interface needs 1450. If you don't account for this, you will see random packet drops on large payloads, and you will lose days debugging it.
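
Two quick checks we run after bringing flannel up, assuming the vxlan backend (interface flannel.1) and using 10.244.22.1 as a stand-in for a flannel address on another node:

# MTU flannel picked for its vxlan interface (expect 1450 on a 1500-byte uplink)
ip link show flannel.1 | grep -o 'mtu [0-9]*'

# Probe across the overlay with the don't-fragment bit set:
# 1422 bytes of payload + 28 bytes of ICMP/IP headers = 1450
ping -M do -s 1422 -c 3 10.244.22.1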

The Shift: Userspace vs. Iptables Proxy

This is the biggest change in Kubernetes 1.1. In version 1.0, kube-proxy ran in "userspace" mode.

  1. Packet hits the node interface.
  2. iptables rules redirect it to a port the kube-proxy process is listening on, forcing a copy up into userspace.
  3. kube-proxy opens a second connection and proxies the traffic to the backend pod.

This context switching is a performance killer: every packet takes a detour through userspace, which adds latency and burns CPU. Kubernetes 1.1 ships a new iptables mode (still opt-in, not yet the default). In this mode, kube-proxy watches the API server and writes iptables rules so the redirection happens entirely in the kernel.
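
Enabling it is a single flag on the kube-proxy invocation. A minimal sketch, assuming the 1.1 kube-proxy binary where the proxier is selected with --proxy-mode; the master address is a placeholder for your API server:

# Run kube-proxy with the kernel-level iptables proxier (opt-in on 1.1)
# 10.0.0.10:8080 stands in for your API server address
kube-proxy --master=http://10.0.0.10:8080 --proxy-mode=iptables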

To verify which mode you are running, check your nat table:

iptables -S -t nat | grep KUBE-SERVICES

If you see a KUBE-SERVICES chain fanning out into KUBE-SVC-* and KUBE-SEP-* chains, congratulations: you are running the kernel-level proxier (the userspace proxier writes KUBE-PORTALS-* chains instead). It looks scary to debug, but it is dramatically faster under load.

Debugging Service Discovery

When things go wrong, it's usually DNS (SkyDNS) or the iptables rules. If a service IP isn't reachable, check if the endpoints exist:

kubectl get endpoints --all-namespaces

If the endpoints list is empty, your service selector matches no pod labels. If endpoints exist but the connection times out, dump the iptables rules for that specific service IP:

# Assume Service IP is 10.0.0.154
iptables-save | grep "10.0.0.154"
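
If the endpoints and rules both look sane but clients still cannot reach the service by name, test SkyDNS from inside the cluster. A minimal check, assuming you have a pod with nslookup on board (here a pod named busybox) and the stock cluster.local domain; my-service is a placeholder:

# Resolve a service name through SkyDNS from inside a running pod
kubectl exec busybox -- nslookup my-service.default.svc.cluster.local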

The Hardware Impact: Latency and I/O

Overlay networks require CPU cycles to encapsulate and decapsulate packets. If you are running on a "noisy neighbor" VPS where the host CPU is overcommitted, your network throughput will fluctuate wildly. This is why we argue against budget shared hosting for container clusters.

Furthermore, etcd is the brain of your cluster, and it is extremely sensitive to disk write latency: every write is fsynced to its log before it is acknowledged. If those writes take too long, heartbeats are missed, leader elections churn, and the whole control plane becomes unstable.
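
A crude but effective way to see what etcd is up against is to time synchronous writes on its data disk (the path below assumes the default /var/lib/etcd data directory) and then ask etcd itself how it feels:

# 1,000 synchronous 512-byte writes against the etcd data disk (lower completion time is better)
dd if=/dev/zero of=/var/lib/etcd/latency-test bs=512 count=1000 oflag=dsync
rm /var/lib/etcd/latency-test

# Ask etcd 2.x how the cluster members are doing
etcdctl cluster-health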

This is where infrastructure choice becomes architectural. At CoolVDS, we don't oversell resources. Our KVM instances run on local NVMe storage. In 2015, most providers are still on SATA SSDs or even spinning rust for their VPS nodes. Moving etcd to NVMe storage dropped our leader election timeouts to near zero.

The Norwegian Data Context

We cannot ignore the legal landscape. In October, the European Court of Justice invalidated the Safe Harbor agreement (Schrems I). If you are a Norwegian company dumping customer data into US-owned clouds (even if the region is "EU"), you are now operating in a legal grey zone regarding the Personopplysningsloven.

Running your Kubernetes cluster on CoolVDS guarantees your data stays in Oslo. We connect directly to NIX (Norwegian Internet Exchange). This provides two benefits:

  1. Compliance: Data never leaves Norwegian jurisdiction.
  2. Latency: If your users are in Norway, your round-trip time (RTT) is single-digit milliseconds.

Setting Up a Robust Node

If you are deploying a worker node today on Ubuntu 14.04, here is the baseline configuration script to ensure Docker 1.9 and Kubelet play nice with the CoolVDS network stack:

# Install Docker 1.9
curl -sSL https://get.docker.com/ | sh

# Configure Docker to use the Flannel subnet
# (flanneld must already be running so /run/flannel/subnet.env exists)
source /run/flannel/subnet.env
echo "DOCKER_OPTS='--bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}'" >> /etc/default/docker
service docker restart

# Enable IP forwarding (crucial! add net.ipv4.ip_forward=1 to /etc/sysctl.conf to persist across reboots)
sysctl -w net.ipv4.ip_forward=1

# Make bridged traffic visible to iptables.
# On kernels older than 3.18 (e.g. Ubuntu 14.04's 3.13) there is no separate br_netfilter
# module; the sysctl appears once the bridge module is loaded, so tolerate a modprobe failure.
modprobe br_netfilter 2>/dev/null || true
echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables

Without enabling bridge-nf-call-iptables, your Kubernetes Services will not function correctly because bridged traffic won't traverse the iptables rules created by kube-proxy.
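
Before scheduling workloads onto the node, verify that both kernel knobs actually stuck:

# Both should print 1
sysctl net.ipv4.ip_forward
sysctl net.bridge.bridge-nf-call-iptables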

Conclusion

Kubernetes is changing how we deploy, but it doesn't remove the need to understand Linux networking. It actually demands you know more about it. The move to iptables in v1.1 is a huge step forward for performance, but it adds complexity to debugging.

Don't let slow overlay networks or disk I/O bottlenecks kill your cluster's performance. For a production-grade environment that respects Norwegian data privacy and delivers the NVMe performance etcd craves, you need the right foundation.

Ready to build? Spin up a high-performance KVM instance in Oslo on CoolVDS today and get full root access to build your network your way.