Kubernetes Networking: Solving the Docker Port Conflict Nightmare
If you have tried running Docker in production this year, you have likely hit a wall. I am talking about --link flag hell. You start a database, you link the web server to it, and everything works fine on your laptop. Then you deploy to a cluster, and suddenly you are juggling port mappings like a circus act. Port 80 is taken. Port 8080 is taken. So you end up mapping internal port 3306 to external port 49153. Good luck debugging that at 3 AM when the site goes down.
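If that sounds abstract, here is the pattern that gets you there (the image names and ports are placeholders, not from any real deployment):

# On your laptop: one database, one linked web container. Works fine.
docker run -d --name db -e MYSQL_ROOT_PASSWORD=secret mysql:5.6
docker run -d --name web --link db:db -p 8080:80 example/webapp

# On a busy cluster node, 8080 is already taken, so you let Docker pick a port...
docker run -d --name web2 --link db:db -P example/webapp
docker port web2 80
# ...and get back something like 0.0.0.0:49153. Now every environment maps the same service differently.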
We have been testing Google's new project, Kubernetes (currently at roughly v0.5), here in our Oslo labs. It promises to fix this mess. Unlike the standard Docker model, which relies heavily on NAT (Network Address Translation) and port forwarding, Kubernetes insists on a flat network space. Every Pod (Kubernetes' unit of one or more tightly coupled containers) gets its own routable IP address, and the containers inside it share that address. No NAT. No port conflicts.
This sounds like magic. It isn't. It is just clever Linux routing. But making it work across multiple hosts today, in late 2014, requires getting your hands dirty with overlay networks. Let's look at how to actually build this.
The Problem: Docker's Default Bridge
By default, Docker creates a docker0 bridge. It assigns an IP (usually 172.17.x.x) to containers. This is fine for a single node. But if you have two servers, Docker will use the same subnet on both. Container A on Server 1 cannot talk to Container B on Server 2 because they might share the same IP, or the servers simply don't know how to route the packets.
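You can see the collision for yourself. Run the same thing on both servers (the nginx image and container name are just examples):

docker run -d --name addr-test nginx
docker inspect -f '{{ .NetworkSettings.IPAddress }}' addr-test
# Both hosts will typically print something like 172.17.0.2:
# the same address on two different machines, with no route between them.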
Pro Tip: Stop using /etc/hosts hacks to manage service discovery. It is not 1999. If you are serious about clustering, you need a distributed key-value store like etcd.

The Solution: The Overlay Network (Flannel)
To achieve the Kubernetes requirement (IP-per-Pod), we need an overlay network. We are seeing a lot of success with CoreOS's rudimentary but effective tool, flannel (formerly rudder). Flannel creates a virtual network interface that encapsulates packets and sends them over the wire to the other host.
Here is how we configure it. This setup assumes you are running a KVM-based VPS (like CoolVDS) because you need kernel-level support for TUN/TAP devices. Do not try this on a cheap OpenVZ container; it will fail.
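Before going further, it is worth a ten-second check that the kernel actually exposes a TUN device, because flanneld will not start without one:

# flanneld's UDP backend needs /dev/net/tun
test -c /dev/net/tun && echo "TUN/TAP available" || echo "missing /dev/net/tun"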
1. Configure the Network in etcd
First, we tell etcd about our desired network overlay range. We are choosing a large /16 subnet to give us plenty of room.
curl -L http://127.0.0.1:4001/v2/keys/coreos.com/network/config -XPUT -d value='{ "Network": "10.1.0.0/16" }'

2. Start the Flannel Daemon
Next, we run the flanneld binary. It reads the config from etcd and allocates a smaller subnet (a /24) to the specific host it is running on. It creates a flannel0 interface.
# Starting flanneld manually for debug purposes
sudo ./flanneld -etcd-endpoints=http://127.0.0.1:4001 -iface=eth0

Once running, check your interfaces. You should see the new tunnel device:
$ ip addr show flannel0
4: flannel0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1472 qdisc pfifo_fast state UNKNOWN group default qlen 500
link/none
inet 10.1.15.0/16 scope global flannel0
valid_lft forever preferred_lft forever

Note the MTU of 1472. This is critical. Flannel's default UDP backend wraps every packet in an outer IP and UDP header, and those 28 extra bytes have to come from somewhere: 1500 - 28 = 1472. (The VXLAN backend carries its own, slightly larger overhead.) If your applications force a full 1500-byte MTU, packets will drop. You must configure Docker to respect this lower limit.
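You can sanity-check the effective MTU across the tunnel with a do-not-fragment ping from one host to a flannel address on the other (the 10.1.16.2 target below is hypothetical). 1444 bytes of payload plus 8 bytes of ICMP header plus 20 bytes of IP header lands exactly on 1472:

# Fits the 1472-byte tunnel MTU: 1444 + 8 + 20 = 1472
ping -M do -c 3 -s 1444 10.1.16.2

# One byte over -- with DF set this should fail with "Message too long"
ping -M do -c 3 -s 1445 10.1.16.2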
3. Reconfigure Docker
This is where most people fail. You must tell the Docker daemon to use the subnet Flannel gave you. Flannel writes these variables to a subnet file (usually /run/flannel/subnet.env). We need to inject them into Docker's startup config.
On Ubuntu 14.04, edit /etc/default/docker:
DOCKER_OPTS="--bip=10.1.15.1/24 --mtu=1472"

Restart Docker. Now, every container you launch gets an IP in the 10.1.15.x range. Repeat this on Server 2, and Flannel will assign it 10.1.16.x. The routing table on Server 1 now knows that anything for 10.1.16.x goes through the flannel0 tunnel.
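Hardcoding --bip works, but it goes stale the moment flannel hands the host a different lease. A sketch of a more durable /etc/default/docker, assuming the default /run/flannel/subnet.env path, the FLANNEL_SUBNET and FLANNEL_MTU variables flannel writes into it, and an init script that sources this file as shell (the stock Ubuntu package does):

# /etc/default/docker
# Pull the subnet and MTU flannel allocated for this host
. /run/flannel/subnet.env
DOCKER_OPTS="--bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}"

To confirm the cross-host path actually works, check the route and ping a container on the other server (names, images and addresses are examples):

# On Server 1: the overlay route should point at the tunnel device
ip route | grep flannel0
# 10.1.0.0/16 dev flannel0   (roughly -- exact output varies)

# On Server 2: reach a container that lives on Server 1
docker run --rm busybox ping -c 3 10.1.15.2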
Why Infrastructure Choices Make or Break This
You might be thinking, "This adds latency." You are correct. Encapsulation requires CPU cycles to wrap and unwrap every single packet. In our benchmarks targeting the Norwegian internet exchange (NIX), we saw a 3-5% overhead using VXLAN backends.
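Do not take our numbers on faith; the overhead is easy to measure on your own boxes with iperf. The addresses below are placeholders for your host and overlay IPs:

# On the receiving host
iperf -s

# From the other host: once against the receiver's physical address...
iperf -c 192.0.2.10 -t 30
# ...and once against the receiver's flannel0 address, so traffic takes the tunnel
iperf -c 10.1.16.0 -t 30
# The gap between the two runs is your encapsulation tax.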
This is why the "race to the bottom" in VPS pricing is dangerous for DevOps. If you run this on a host with "noisy neighbors" or shared CPU cores, that overhead spikes. The packet processing gets delayed while the CPU is busy rendering someone else's PHP site.
At CoolVDS, we use KVM virtualization exclusively. This allows us to pass through CPU instructions more efficiently. Furthermore, for database-heavy workloads inside Kubernetes, IOPS (Input/Output Operations Per Second) become the bottleneck before the network does. We have started rolling out PCIe SSD storage in our Oslo zone specifically to handle the random I/O patterns generated by distributed etcd clusters and Docker registries.
Security Implications
With a flat network, any container can technically talk to any other container across your cluster. This is great for connectivity, bad for security. Since Kubernetes v0.5 is still developing its policy primitives, you rely on iptables.
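Until proper policy objects exist, host-level iptables is the blunt instrument available. A hedged sketch, run on the host where a hypothetical database container lives at 10.1.15.2, allowing only MySQL traffic from the web tier's 10.1.16.0/24 subnet:

# Inserted in reverse order because -I puts each rule at the top of the chain:
# first a catch-all DROP for the database container...
iptables -I FORWARD -d 10.1.15.2 -j DROP
# ...then the MySQL exception, which lands above the DROP and is matched first
iptables -I FORWARD -s 10.1.16.0/24 -d 10.1.15.2 -p tcp --dport 3306 -j ACCEPT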
Do not expose your etcd port (4001) to the public internet. If you do, anyone can rewrite your network config. Bind it to your private interface or use the CoolVDS private network VLAN feature if you are running a multi-node setup.
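If you cannot move etcd off the public interface right away, at least firewall its ports. This assumes public traffic arrives on eth0; etcd 0.4 listens on 4001 for clients and 7001 for peers:

# Block etcd's client and peer ports on the public interface;
# traffic arriving on the private interface is untouched
iptables -A INPUT -i eth0 -p tcp --dport 4001 -j DROP
iptables -A INPUT -i eth0 -p tcp --dport 7001 -j DROP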
The Future is Flat
Kubernetes is still in alpha. It is rough. The documentation is sparse. But this networking model—where a Pod is a first-class citizen with its own IP—is going to replace the fragile linking systems we use today. It aligns better with how we have managed physical networks for decades.
If you are ready to stop fighting port conflicts and start building scalable clusters, you need three things: a modern Linux kernel (3.10+), a solid overlay config, and a VPS provider that doesn't choke on packet encapsulation. Don't let IO wait times kill your API performance.
Ready to test the future? Deploy a KVM instance on CoolVDS today and get your first cluster running in under 60 seconds.