Docker Networking is Broken: A Deep Dive into Google's New Kubernetes Model

Let’s be honest with ourselves. We all love Docker. Since version 1.0 dropped in June, it has fundamentally changed how we package applications. But if you have tried to run containers across more than one host, you know the ugly truth: Docker networking is a mess.

We are currently stuck in "Port Hell." Managing thousands of random high-numbered ports just to link a web frontend to a database backend across two different servers is not scalable. It requires complex service discovery, fragile --link flags, and a lot of prayer.

This is why Google's recent open-sourcing of Kubernetes (Project Seven of Nine) is the most interesting thing to happen to infrastructure this year. They are proposing a radical shift: a flat networking model where every container (or "Pod") gets its own IP address. No more NAT. No more port mapping.

But how does this actually work under the hood in a standard Linux environment? Today, we are going to ignore the hype and look at the raw iptables and routing tables.

The "IP-per-Pod" Promise vs. Reality

In the standard Docker model we use today, the container gets an IP on a private bridge (usually docker0), which is not routable from the outside. To expose a service, you punch a hole in the host's firewall:

$ docker run -d -p 8080:80 nginx

This maps port 8080 on your host to port 80 in the container. If you have ten web servers, you need ten different host ports. It's a management nightmare.

Kubernetes demands something different: Every Pod must be able to communicate with every other Pod on any other node without NAT.

This sounds great, but Linux doesn't do this out of the box. To achieve it today, August 6, 2014, we have to get our hands dirty with software-defined networking (SDN) or manual routing tables.

The Mechanics: Linux Bridges and veth Pairs

Whether you use raw Docker or this new Kubernetes tool, the plumbing is the same. It relies on veth (virtual ethernet) pairs. One end sits in the container namespace, and the other sits on the host, attached to a bridge.
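
You can reproduce this plumbing by hand to see exactly what Docker does for you. A minimal sketch, assuming the default docker0 bridge; the namespace name, interface names, and the 172.17.0.50 address are purely illustrative:

# Create a network namespace to stand in for a container
$ ip netns add demo
# Create a veth pair: one end for the host, one for the "container"
$ ip link add veth-host type veth peer name veth-demo
# Move one end into the namespace, attach the other to the bridge
$ ip link set veth-demo netns demo
$ brctl addif docker0 veth-host
$ ip link set veth-host up
# Give the container end an address on the bridge subnet and bring it up
$ ip netns exec demo ip addr add 172.17.0.50/16 dev veth-demo
$ ip netns exec demo ip link set veth-demo up

Delete the namespace (ip netns del demo) and both ends of the pair vanish with it, which is exactly what happens when a container exits.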

If you are debugging a "hanging" container connection on your CoolVDS instance, start by inspecting the bridge. Don't just restart the daemon.

# Check the bridge status
$ brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.56847afe9799       no              veth2143
                                                        veth9912

If you see STP (Spanning Tree Protocol) enabled on a bridge meant for high-speed container traffic, turn it off. A bridge that only carries veth ports cannot form a loop on its own, and STP forces every newly attached port through a listening/learning delay before it starts forwarding traffic.
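
A quick sketch of how to check and disable it, assuming the default docker0 bridge:

# Disable STP and the forwarding delay on the container bridge
$ brctl stp docker0 off
$ brctl setfd docker0 0
# Verify: the "STP enabled" column should now read "no"
$ brctl show docker0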

Routing Traffic Between Nodes

Here is the hard part. If Pod A (10.244.1.2) on Server 1 wants to talk to Pod B (10.244.2.2) on Server 2, Server 1 needs to know where to send that packet. By default, it will hand it to the default gateway, which cannot deliver it, because 10.x.x.x is private RFC 1918 space that is not routed on the public internet.

In a Kubernetes setup, you effectively assign a subnet to each minion (node).

  • Node 1: 10.244.1.0/24
  • Node 2: 10.244.2.0/24

You then need to update the routing table on Node 1 to tell it where Node 2's pods live. On a standard Linux VPS, it looks like this:

# On Node 1 (192.168.1.10), add a route to Node 2 (192.168.1.11)
$ ip route add 10.244.2.0/24 via 192.168.1.11 dev eth0
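
The mirror image is needed on the other side, and both hosts have to be willing to forward packets that are not addressed to them. A sketch using the same assumed addresses:

# On Node 2 (192.168.1.11), add the return route to Node 1's pod subnet
$ ip route add 10.244.1.0/24 via 192.168.1.10 dev eth0
# On both nodes, make sure the kernel forwards packets between interfaces
$ sysctl -w net.ipv4.ip_forward=1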

This is the "pure" way to do it. However, maintaining these static routes by hand becomes unmanageable once you pass three or four nodes. This is why we are starting to see tools like Flannel (from the CoreOS team) gain traction: it uses an overlay network that encapsulates these packets in UDP.
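
If you do stay on plain static routes for a small lab, at least script them so every node applies the same table. A minimal sketch, assuming a hypothetical pod-routes.txt that lists the other nodes' "node-IP pod-subnet" pairs, one per line:

# Re-apply the route table from a shared file (idempotent thanks to "replace")
$ while read node subnet; do ip route replace "$subnet" via "$node" dev eth0; done < pod-routes.txt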

The Performance Trap: Why Infrastructure Matters

This is where things go wrong for most people. Encapsulation (like VXLAN or UDP) adds overhead. Every packet has to be wrapped, sent, unwrapped, and delivered. This requires CPU cycles.
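
There is a second, quieter cost: the tunnel headers eat into the 1500-byte Ethernet MTU, so large packets get fragmented unless you shrink the MTU handed to containers. A sketch assuming the default docker0 bridge and roughly 50 bytes of overhead; the exact number depends on the encapsulation you choose:

# Leave headroom for the tunnel headers on the bridge...
$ ip link set dev docker0 mtu 1450
# ...and tell the Docker daemon to pass the same MTU to new containers
$ docker -d --mtu=1450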

Pro Tip: Never try to run Docker or Kubernetes networking on OpenVZ or LXC-based VPS hosting. You cannot modify the kernel modules required for overlay networks or advanced iptables routing. You will hit a wall.

At CoolVDS, we exclusively use KVM (Kernel-based Virtual Machine). This gives you a real kernel. You can load modules like bridge, br_netfilter, and vxlan without asking support for permission. More importantly, when you are doing software routing and encapsulation, every packet burns CPU cycles in interrupt handling, so contended or oversold CPU shows up directly as packet latency.
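
A quick way to confirm your kernel will cooperate before you build anything on top of it; note that on older kernels the bridge netfilter code lives inside the bridge module itself rather than a separate br_netfilter:

# Load the pieces needed for bridging and VXLAN overlays
$ modprobe bridge
$ modprobe vxlan
$ lsmod | grep -E 'bridge|vxlan'
# Bridged container traffic only passes through iptables if this is set to 1
$ sysctl net.bridge.bridge-nf-call-iptables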

Latency and the Norwegian Context

If you are building a distributed system serving customers in Oslo or Bergen, the physical location of your nodes is critical. Kubernetes relies on etcd for state management. Etcd is extremely sensitive to disk write latency and network latency.

If your etcd cluster is spread across cheap VPS providers with slow spinning disks and congested networks, your entire cluster will fall apart due to leader election timeouts.

We've benchmarked this extensively. Using standard SSDs (which we provide by default) vs. spinning rust makes the difference between a stable cluster and one that "flaps" every time you deploy.
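
If you want to measure this yourself before blaming the software, fio can mimic etcd's pattern of small writes followed by an fdatasync. A rough sketch, not an official etcd benchmark; the sizes are illustrative:

# Small synchronous writes, flushed after every write, like a write-ahead log
$ fio --name=etcd-wal --rw=write --bs=2k --size=64m --ioengine=sync --fdatasync=1
# Watch the completion latency percentiles in the output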

Example: Manual iptables NAT for Outbound Traffic

Even with internal routing solved, your Pods need to reach the internet (e.g., to apt-get update). Since the Pod IPs are private, you need a Masquerade rule on the host. Kubernetes attempts to manage this, but if it fails, you need to know how to fix it manually:

# The command that saves the day when Pods can't ping 8.8.8.8
$ iptables -t nat -A POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -j MASQUERADE

This rule tells the kernel: "If traffic is coming from the Pod subnet and going outside the cluster, replace the source IP with the Host IP."
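
To confirm the rule is installed and actually matching traffic, check the chain with counters:

# List the NAT table's POSTROUTING chain with packet/byte counters
$ iptables -t nat -L POSTROUTING -n -v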

Data Sovereignty: The Elephant in the Room

While we geek out over packets, we cannot ignore the legal reality here in Norway. The Personopplysningsloven (Personal Data Act) and Datatilsynet are very clear about where data lives. When you build these complex mesh networks, you must ensure you aren't accidentally routing traffic through a node in a non-compliant jurisdiction.

Hosting your Kubernetes (or Docker) cluster on CoolVDS ensures that your data remains physically in our datacenter. We peer directly at NIX (Norwegian Internet Exchange), meaning traffic between your Norwegian users and your servers often never leaves the country. That is a massive win for both latency and compliance.

Conclusion

Kubernetes is still in early development (v0.x). It is not ready for your main production database yet. However, the networking model it introduces—IP-per-Pod—is the future. It removes the fragility of port mapping.

If you want to experiment with this future, you need a sandbox that doesn't limit you. You need full kernel control, fast I/O for etcd, and low latency.

Don't let legacy virtualization hold your education back. Spin up a KVM instance on CoolVDS today and start breaking things.