Kubernetes Networking Deep Dive: Why Port Mapping is Dead

If you have spent the last six months wrestling with docker run --link and maintaining a spreadsheet of which ports map to what on your host machines, you are not alone. Containerization is transforming how we deploy in Norway, from small startups in Oslo to enterprise backends, but the networking story has been—frankly—a nightmare. NAT (Network Address Translation) everywhere. Port conflicts. The inability to easily communicate across multiple hosts without complex GRE tunnels.

Enter Kubernetes. Google's new open-source container orchestrator is still in its infancy (v0.4 as of this writing), but it proposes a radical shift that solves the biggest pain point of Docker 1.3: networking. In this deep dive, we are going to look under the hood of the Kubernetes networking model, configure a Flannel overlay network, and discuss why your choice of virtualization platform (specifically KVM vs. OpenVZ) effectively dictates whether you can run this stack at all.

The Problem: The "Host-Port" Trap

In the standard Docker model we use today, every container gets an IP address on a private bridge (usually docker0). This IP is not routable from outside the host. To expose a service, you map a port on the host to the container:

$ docker run -d -p 8080:80 my-web-app

This works for a single server. But what happens when you scale to a cluster? You have to track which ports are used on which nodes. You need a complex service discovery mechanism to tell your load balancer that "Service A" is on Node 1:8080 and Node 2:9090. It is fragile, manual, and error-prone.
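
To make the pain concrete, here is what scaling a second copy of the same container on a single host looks like today (image name and port numbers are illustrative):

$ docker run -d -p 8080:80 my-web-app    # first instance claims host port 8080
$ docker run -d -p 8080:80 my-web-app    # second instance fails: the host port is already taken
$ docker run -d -p 8081:80 my-web-app    # so you pick 8081 instead... and the spreadsheet grows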

The Kubernetes Model: A Flat Address Space

Kubernetes (or K8s) takes a different approach. It mandates a flat network space where:

  1. Every Pod (a group of containers) gets its own unique IP address.
  2. All Pods can communicate with all other Pods without NAT.
  3. Agents on a node (like the Kubelet) can communicate with all Pods on that node.

This means no more port mapping. If your PHP-FPM pod listens on port 9000, other pods connect to it directly on its pod IP and port (something like 10.100.63.2:9000). It feels like a return to the simplicity of VMs, but with the speed of containers.
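
As a sketch (the pod IP below is purely illustrative and will differ in your cluster), a frontend pod reaches the PHP-FPM pod with nothing more than that pod's IP and the port the process actually listens on:

# From inside any pod in the cluster; 10.100.63.2 is whatever IP the PHP-FPM pod was assigned
$ nc -z 10.100.63.2 9000 && echo "fpm reachable, no -p mapping anywhere"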

Implementing the Flat Network with CoreOS Flannel

Since neither your provider's network gear nor the public internet knows how to route these private Pod IPs between hosts, we need an overlay network. Currently, the most promising solution is Flannel (from the CoreOS team).

Flannel runs a small agent, flanneld, on each host. It allocates a subnet to each host out of the cluster-wide range and encapsulates inter-host traffic in UDP packets (using either its simple UDP backend or, on recent kernels, VXLAN) to transport it between hosts.

Here is how you configure it. First, we populate etcd with our desired network configuration:

$ etcdctl mk /coreos.com/network/config '{"Network":"10.100.0.0/16"}'

Next, we start flanneld on our node. It reads from etcd and writes a subnet file. We then need to reconfigure Docker to use this subnet instead of its default bridge.
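
On our test node, the subnet file flanneld writes looks roughly like this (the exact lease it picks out of 10.100.0.0/16, and the MTU, depend on your host and backend):

$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.100.0.0/16
FLANNEL_SUBNET=10.100.63.1/24
FLANNEL_MTU=1450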

Configuration for /etc/systemd/system/docker.service (CoreOS style):

[Service]
# flanneld writes FLANNEL_SUBNET and FLANNEL_MTU into this file for the host's subnet lease
EnvironmentFile=/run/flannel/subnet.env
ExecStart=/usr/bin/docker -d -s overlay --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}

Note the --bip flag. This forces Docker to assign IPs from the range Flannel allocated to this specific host. Now, when a container sends a packet to an IP on another host, flanneld catches it, wraps it, and shoots it across the data center.
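
After editing the unit, reload systemd and bounce Docker, then confirm the bridge actually picked up the Flannel range (addresses and interface names below are illustrative):

$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
$ ip addr show docker0 | grep inet    # should show an address inside this host's FLANNEL_SUBNET
$ ip route | grep flannel             # pod subnets on other hosts route via the flannel interface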

Pro Tip: Check your MTU settings. The encapsulation overhead (usually 50 bytes for VXLAN) means your container interfaces must have a smaller MTU than the physical interface (e.g., 1450 vs 1500). If you miss this, you will see strange packet drops on large HTTP responses.
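
A quick way to catch an MTU mismatch is to send full-size packets with the don't-fragment bit set across the overlay; the sizes below assume a 1500-byte physical MTU and 50 bytes of VXLAN overhead, and the target IP is illustrative:

$ ping -M do -s 1422 10.100.63.2    # 1422 + 28 bytes of headers = 1450, fits the overlay MTU, should succeed
$ ping -M do -s 1472 10.100.63.2    # needs the full 1500 bytes, will fail across a 1450-byte overlay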

Service Discovery: The Role of Kube-Proxy

Getting packets from Pod A to Pod B is step one. But how does the frontend find the backend when IPs change dynamically? Kubernetes solves this with Services and the kube-proxy.

Currently, in v0.4, kube-proxy runs in userspace mode. It opens a random port on the host and uses iptables to capture traffic destined for a "Service VIP" (Virtual IP). It essentially acts as a localized load balancer on every single node.

$ iptables -t nat -L KUBE-PORTALS
Chain KUBE-PORTALS (1 references)
target     prot opt source               destination
REDIRECT   tcp  --  0.0.0.0/0            10.0.0.1             tcp dpt:80 redir ports 44321

While this userspace round-tripping adds some latency (milliseconds matter!), it provides a stable IP address for your applications to talk to.
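
In practice your frontend just talks to the Service's portal IP and port, and the REDIRECT rule above hands the connection to the local proxy (the VIP 10.0.0.1 and proxy port 44321 are taken from the iptables output above and will differ in your cluster):

$ curl -s http://10.0.0.1:80/    # transparently redirected to the proxy on port 44321, then to a backend pod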

The Hardware Reality: Why Virtualization Matters

This is where many DevOps engineers hit a wall. To run Flannel, Docker bridges, and custom iptables rules, you need kernel-level access. You need the ability to load modules like vxlan or manipulate the bridge interface.

This is impossible on most OpenVZ or LXC-based VPS providers.

Cheap VPS hosting often uses "OS-level virtualization" (containers) where you share a kernel with the host. They will not let you create a TUN/TAP device or modify the bridge network because it affects other customers.

To run Kubernetes networking properly, you need a KVM (Kernel-based Virtual Machine) solution like CoolVDS. With KVM, you have your own dedicated kernel. You can:

  • Load the overlay module for Docker storage drivers.
  • Create virtual network interfaces for Flannel.
  • Tune sysctl parameters for high-load networking (a quick sanity check for all three is sketched below).
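
A rough way to verify that your VPS actually grants this level of access is a check like the following; if any of these commands fail with a permission or "operation not supported" error, the networking stack described above is off the table:

$ sudo modprobe vxlan && lsmod | grep vxlan        # kernel module loading (typically refused on OpenVZ)
$ sudo ip tuntap add dev flannel-test mode tun     # TUN/TAP device creation, needed by flanneld's UDP backend
$ sudo ip link delete flannel-test                 # clean up the test device
$ sudo sysctl -w net.ipv4.ip_forward=1             # forwarding between the Docker bridge and the physical NIC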

Latency and the "Norwegian Factor"

Overlay networks add overhead. Encapsulating and decapsulating every packet takes CPU cycles. If you are running this on over-committed hardware, your network throughput will tank.

At CoolVDS, we use high-performance CPUs and Pure SSD storage to minimize this impact. Furthermore, for Norwegian businesses, running your cluster in our Oslo data center ensures you are directly connected to the NIX (Norwegian Internet Exchange). Low latency is critical for distributed systems like etcd, which relies on fast consensus for cluster state. Sustained latency or jitter between your master, etcd, and a remote node (say, one sitting in Frankfurt) can cause missed heartbeats, making Kubernetes conclude the node has failed and triggering unnecessary rescheduling.
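
If you want to see what latency does to etcd, a crude but telling check is to time a single write from one of your nodes (the key name is arbitrary):

$ time etcdctl set /healthcheck ok    # on a local, low-latency cluster this should complete in a few milliseconds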

Conclusion: Prepare for the Future

Kubernetes is still bleeding edge. The documentation changes weekly, and setup scripts break often. However, the networking model it introduces is the correct one for distributed systems. It decouples the application from the underlying infrastructure.

If you are ready to stop mapping ports and start building true clusters, you need the right foundation.

Ready to test your own Kubernetes cluster? Deploy a high-performance KVM instance on CoolVDS today and get full root access to build the network architecture of tomorrow.