
Taming the Beast: Kubernetes Networking Deep Dive (Pre-v1.0 Edition)


Let’s be honest: Docker links are a nightmare at scale. If you are running a handful of containers on a single host, linking works fine. But we aren't paid to run single hosts. We are paid to build resilient systems. That is why everyone in the Oslo DevOps meetups is talking about Kubernetes (k8s), Google’s open-source answer to container orchestration.

But there is a catch. Kubernetes networking is fundamentally different from the standard Docker model. If you try to deploy this on a legacy hosting provider without understanding the underlying plumbing, you are going to hit a wall. Hard.

The "One IP Per Pod" Mandate

In the standard Docker world, you are used to port mapping. You bind container port 80 to host port 8080. It’s messy. You lose track of ports. Kubernetes throws that out the window.
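For reference, here is the single-host pattern we are all trying to escape:

# Classic Docker port mapping: host port 8080 forwards to container port 80
$ docker run -d -p 8080:80 nginx
# Now repeat for fifty services across ten hosts and try to keep the spreadsheet current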

The golden rule of Kubernetes networking is simple: Every Pod gets its own IP address.

This means pods can talk to each other without NAT (Network Address Translation). It sounds elegant, but it creates a massive infrastructure challenge: your underlying network must know how to route packets for IPs that don't physically exist on any switch port. On bare metal you can add static routes for each host's pod subnet to your router. In a virtualized environment, you need an Overlay Network.
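A quick litmus test once the overlay is up (the pod IPs here are hypothetical, drawn from the 10.1.0.0/16 range we configure below):

# From a shell inside a pod on Minion A, hit a pod on Minion B directly
$ curl -s http://10.1.20.3:80/
# The receiving pod sees the sender's actual pod IP as the source -- no NAT in the path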

The Solution: Flannel and Etcd

The most pragmatic way to solve this in mid-2015 is Flannel (from the CoreOS team). Flannel creates a virtual mesh network on top of your physical network, encapsulating packets (usually via UDP or VXLAN) to move data between hosts.

I spent last weekend debugging a cluster where pods on Minion A couldn't reach Minion B. The culprit? Packet encapsulation overhead and MTU mismatches. Here is the configuration that finally stabilized the `etcd` backend for Flannel:

{
  "Network": "10.1.0.0/16",
  "SubnetLen": 24,
  "Backend": {
    "Type": "vxlan",
    "VNI": 1
  }
}
Pro Tip: Do not use the default UDP backend if your kernel supports VXLAN. The performance hit on UDP is noticeable; we measured a 40% throughput drop in our own benchmarks. Switch to VXLAN if you are on a modern kernel (Linux 3.12+).
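Flannel reads this configuration from etcd, by default under the key `/coreos.com/network/config` (adjustable via `--etcd-prefix`). Pushing the JSON above with the etcd 2.x client looks like this:

# Write the flannel network config to etcd (etcd 2.x syntax)
$ etcdctl set /coreos.com/network/config \
    '{ "Network": "10.1.0.0/16", "SubnetLen": 24, "Backend": { "Type": "vxlan", "VNI": 1 } }'
# After restarting flanneld, each minion leases a /24; verify the subnet and MTU it picked:
$ cat /run/flannel/subnet.env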

The "Hidden" Latency Killer: Etcd Consensus

Kubernetes relies entirely on `etcd` for state, and `etcd` uses the Raft consensus algorithm. Raft is incredibly sensitive to disk write latency and network latency. If the leader cannot commit writes to disk fast enough, or if the round-trip to the followers is too long, the followers assume the leader is dead, elections churn, and the whole cluster stalls.

This is critical for Norwegian setups. If your nodes are spread between a budget datacenter in Germany and your office in Oslo, the latency will kill the consensus. You need your nodes close. We host our clusters locally or on providers with direct peering to NIX (Norwegian Internet Exchange).

Furthermore, `etcd` fsyncs to disk constantly. If you run it on a cheap VPS with spinning rust (HDD) or shared storage, writes will regularly blow past the heartbeat interval and trigger spurious leader elections. We migrated our staging environment to CoolVDS last month specifically for this reason. Their storage backend is pure enterprise SSD with predictable IOPS, and the "Apply entries took too long" warnings vanished from our logs immediately.
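If you are stuck on marginal hardware while you migrate, you can loosen etcd's timing as a stopgap. The flags are standard etcd 2.x options; the values below are illustrative, not a recommendation:

# etcd 2.x timing knobs (milliseconds). Defaults: heartbeat 100, election 1000.
$ etcd -name node1 -heartbeat-interval=200 -election-timeout=2000
# Quick-and-dirty fsync latency check on the etcd data disk (lower is better):
$ dd if=/dev/zero of=/var/lib/etcd/fsync-test bs=512 count=1000 oflag=dsync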

Why Virtualization Type Matters (KVM vs. OpenVZ)

This is where many sysadmins fail before they start. To run Kubernetes and Flannel, you need to modify the kernel networking stack. You need access to `iptables` modules and bridge utilities.

Most budget "VPS" providers in Europe still use OpenVZ or LXC, which means running Docker there is containers inside containers. You share the kernel with the host, and you cannot load kernel modules.

If you try to run `docker -d` or start `flanneld` inside an OpenVZ container, it will fall over the moment it needs a missing kernel module. You need a hypervisor that gives you a dedicated kernel. We use KVM (Kernel-based Virtual Machine), which lets us load the VXLAN module the overlay network depends on.
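A thirty-second check before you commit to a provider: if this fails, the box cannot run a VXLAN overlay, full stop.

# On a proper KVM guest this succeeds; on OpenVZ/LXC the modprobe is refused
$ uname -r
$ sudo modprobe vxlan && lsmod | grep vxlan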

Performance Comparison: Encapsulation Overhead

Metric             Standard VPS (Shared/OpenVZ)   CoolVDS (KVM/SSD)
Kernel Access      Restricted                     Full (customizable)
Overlay Latency    High (CPU steal)               Low (dedicated resources)
Etcd Stability     Unstable (I/O wait)            Solid (high IOPS)

Configuring kube-proxy

Once the network is up, `kube-proxy` handles the service routing. By default, it runs in userspace mode. It is slow. It works by opening a random port on the host and proxying traffic to the pod.

You want to force it to use `iptables` if possible, though that mode is still experimental in the current builds. It keeps traffic in the kernel and removes the per-packet context switching between kernel and userspace.

# On your minion node
$ kube-proxy --proxy-mode=iptables --master=http://10.0.0.1:8080

Be careful. If `iptables` rules get flushed, your services vanish. But the performance gain is worth the risk.
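To confirm kube-proxy actually wrote its rules (the exact chain names differ between the userspace and iptables proxiers, but all of them carry the KUBE- prefix):

# Dump the rules kube-proxy manages; re-run after a suspected flush
$ sudo iptables-save | grep KUBE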

Final Thoughts: Prepare for v1.0

Google is hinting at a v1.0 release next month (July 2015). The API is stabilizing. If you are still manually linking Docker containers, you are building technical debt. The learning curve for Kubernetes is steep, and the networking is the hardest part.

Don't fight the infrastructure. Ensure you have full root access, a dedicated kernel, and fast I/O. We build our clusters on CoolVDS because we need the reliability of KVM without the noisy neighbor problems. When you are debugging packet drops at 2 AM, you'll be glad you aren't fighting your hosting provider's kernel restrictions.

Ready to test the future of orchestration? Spin up a KVM instance on CoolVDS today and start your first cluster.
