Kubernetes Networking Deep Dive: Flannel, Calico, and IPTables in Production
Docker networking is deceptively simple. You run a container, you map a port, and you go home. But if you are reading this in August 2016, you know that moving from a single Docker host to a Kubernetes cluster changes everything. Suddenly `--link` is dead, NAT is the enemy, and you are staring at iptables rules that scroll past like the falling code in The Matrix.
I have spent the last three months debugging packet drops on a cluster spanning three availability zones, and I can tell you this: the Kubernetes network model is elegant, but the implementation details will eat your CPU cycles if you aren't careful. With the release of Kubernetes 1.3 last month, we are finally seeing some stability with the CNI (Container Network Interface) standard, but the confusion remains.
This isn't a "Hello World" tutorial. This is a deep dive into how packets actually move between Pods, why your overlay network might be adding 30% latency, and how to fix it on bare-metal or high-performance KVM instances like those we run at CoolVDS.
The Fundamental Rule: IP-per-Pod
Forget everything you learned about dynamic port mapping. Kubernetes dictates a flat network space. Every Pod gets its own IP address. That IP must be routable from any other Pod on any other node without NAT (Network Address Translation).
This sounds great until you realize your underlying network—whether it's a public cloud or a physical rack in Oslo—doesn't know anything about these Pod IPs. They are virtual. To bridge this gap, we rely on Overlay Networks. And this is where performance usually goes to die.
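If you want to see this rule in practice, here is a quick sanity check. This is a sketch: the Pod names and the 10.244.0.0/16 addresses are placeholders from a typical Flannel setup, and the target container is assumed to serve HTTP on 8080.

```bash
# List Pods with their IPs. The IP column shows Pod IPs, not node IPs.
kubectl get pods -o wide

# From a shell on a *different* node, a Pod IP should answer directly, no NAT:
ping -c 3 10.244.1.17

# And from inside another Pod, the same IP works with no port mapping at all:
kubectl exec frontend-pod -- curl -s http://10.244.1.17:8080/healthz
```

If the ping works but the curl hangs, keep reading: that is usually an MTU problem, not a routing problem.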
The Overlay Wars: Flannel vs. Calico
Right now, in 2016, you essentially have two main choices for networking your cluster if you aren't on GCE: Flannel and Calico.
Flannel is the easy button. It stores its network configuration in etcd and encapsulates Pod traffic to carry it between nodes, either with the slow userspace UDP backend or, preferably, the in-kernel VXLAN backend (which itself rides on UDP). It works, but that encapsulation has a cost: every packet has to be wrapped and unwrapped. On a standard VPS with noisy neighbors, the resulting jitter is fatal for real-time apps.
Calico takes a different approach. It uses BGP (Border Gateway Protocol)—the same protocol that runs the Internet—to route packets between nodes. No encapsulation. Just pure routing. It is faster, but harder to set up.
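Flannel reads its network layout from etcd. Assuming the default /coreos.com/network key prefix and an etcd v2 endpoint on localhost (adjust both for your cluster; the CIDR below is just an example), seeding and inspecting that configuration looks like this:

```bash
# Write the cluster-wide Flannel config. Choosing "vxlan" keeps encapsulation
# in the kernel instead of the much slower userspace "udp" backend.
etcdctl --endpoints http://127.0.0.1:2379 set /coreos.com/network/config \
  '{ "Network": "10.244.0.0/16", "Backend": { "Type": "vxlan" } }'

# Each node leases a /24 out of that range; you can list the leases:
etcdctl --endpoints http://127.0.0.1:2379 ls /coreos.com/network/subnets
```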
Here is a typical Flannel configuration for a CoreOS or Ubuntu cluster using the new CNI plugin architecture:
```json
{
  "name": "cbr0",
  "type": "flannel",
  "delegate": {
    "isDefaultGateway": true
  }
}
```
Simple? Yes. Efficient? Not always. If you are running high-throughput databases like MongoDB or Cassandra on Kubernetes, Flannel's VXLAN overhead can eat into your network throughput significantly. We have seen benchmarks where raw throughput drops by 20-30% compared to host networking.
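Don't take my numbers on faith; measure the overlay tax on your own nodes. A rough sketch with iperf3 (it has to be installed on both ends, the IPs are placeholders, and the Pod is assumed to carry the iperf3 binary):

```bash
# Baseline: host-to-host throughput over the real network.
# On node B:
iperf3 -s
# On node A:
iperf3 -c 10.0.0.12 -t 30

# Overlay: repeat the client run from a Pod on node A against a Pod on
# node B listening on its Pod IP. The gap between the two results is your
# encapsulation overhead.
kubectl exec netperf-pod -- iperf3 -c 10.244.2.8 -t 30
```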
Pro Tip: If you are running on CoolVDS, our KVM virtualization supports virtio-net drivers that handle packet processing significantly better than standard emulated NICs. However, always check your MTU. If your physical interface MTU is 1500, your Flannel MTU must be lower (e.g., 1450) to account for the VXLAN header. Mismatched MTUs are the #1 cause of "it connects but hangs" issues.
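Two quick checks before you blame Kubernetes. The interface names and the subnet.env path below assume a stock Flannel VXLAN deployment:

```bash
# Compare the physical NIC MTU with the VXLAN device MTU; the overlay
# value must be at least 50 bytes smaller to fit the VXLAN header.
ip -o link show eth0 | grep -o 'mtu [0-9]*'
ip -o link show flannel.1 | grep -o 'mtu [0-9]*'

# Flannel also records the values it handed to Docker/CNI:
cat /run/flannel/subnet.env
# FLANNEL_NETWORK=10.244.0.0/16
# FLANNEL_SUBNET=10.244.1.1/24
# FLANNEL_MTU=1450
# FLANNEL_IPMASQ=true
```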
Service Discovery: The Move to IPTables Proxy
Before Kubernetes 1.2, the kube-proxy component ran in "userspace" mode. It was a literal proxy: a packet hit a Service port, the kube-proxy process woke up, copied the data through userspace, and forwarded it to a backend Pod. It was slow and brutal on context switches.
As of Kubernetes 1.2 and now 1.3, the default is iptables mode. This is a game-changer for performance. The kube-proxy simply watches the API server and programs Linux iptables rules to handle the routing inside the kernel. No process context switching is required for the packet to be forwarded.
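The mode is selectable on the kube-proxy command line. A minimal sketch of the relevant flags, assuming a placeholder API server address; your distribution may wrap this in a systemd unit or a static Pod manifest:

```bash
# Pin the in-kernel iptables mode explicitly instead of relying on
# kube-proxy's auto-detection.
kube-proxy \
  --master=https://10.0.0.10:6443 \
  --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig \
  --proxy-mode=iptables
```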
You can verify which mode you are running by checking the logs of your kube-proxy, or by dumping the nat table on a node:
```bash
sudo iptables-save -t nat | grep KUBE

# You should see chains like this:
# :KUBE-SERVICES - [0:0]
# :KUBE-SVC-XYZ - [0:0]
# -A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
```
If you see a massive list of rules, that's good: that's the kernel doing the work. However, it introduces a new complexity. The rule chains grow linearly with your Services, so matching and rule updates are effectively O(n); with 5,000 Services, every sync takes noticeable time. But for most of us running mid-sized clusters in 2016, it is vastly superior to the old userspace proxy.
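If you want a rough feel for how big your rule set has grown on a given node:

```bash
# Count the Kubernetes-managed NAT rules. A few hundred is routine;
# tens of thousands is when rule syncs start to take real time.
sudo iptables-save -t nat | grep -c '^-A KUBE'
```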
The "Ingress" Resource: Exposing to the World
We used to expose services via NodePort or LoadBalancer. But opening a port on every node (NodePort) is messy, and LoadBalancers are expensive or unavailable on bare metal. Enter the Ingress resource (currently in Beta).
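For contrast, the old way really is a one-liner. The deployment name and port here are illustrative:

```bash
# Expose an existing deployment on a high port (30000-32767) on every node.
kubectl expose deployment backend --type=NodePort --port=8080
kubectl get svc backend   # shows the allocated NodePort
```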
An Ingress allows you to define routing rules (Layer 7) inside Kubernetes. You run an Ingress Controller (usually Nginx) which listens for changes and reloads its configuration dynamically. This allows you to host multiple domains on a single external IP.
Here is a working Ingress definition for a project we deployed last week:
```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: production-ingress
  annotations:
    ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: api.coolvds-demo.no
    http:
      paths:
      - path: /v1
        backend:
          serviceName: backend-service
          servicePort: 8080
```
This is cleaner than managing HAProxy config files manually with Puppet or Chef. But remember: the Ingress Controller itself is just a Pod. If that Pod gets starved of CPU, your entire cluster's external traffic chokes.
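Once the controller has picked up the rule, you can smoke-test the routing from outside without touching DNS. The node IP below is a placeholder for whichever address your Nginx controller is listening on:

```bash
# Spoof the Host header the Ingress rule matches on. A response from
# backend-service confirms the Layer 7 routing works end to end.
curl -v -H "Host: api.coolvds-demo.no" http://10.0.0.11/v1
```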
Infrastructure Matters: Latency and Location
Kubernetes is software, but it relies on hardware. When you have microservices talking to each other across nodes, network latency stacks up. If Service A calls Service B, which calls Service C, and you have 2ms latency between nodes, that's added overhead before you even process a request.
This is where local geography and virtualization tech collide. In Norway, data sovereignty is becoming a massive topic following the invalidation of Safe Harbor. Moving workloads to US-based clouds adds legal headaches and, more importantly, latency (usually 100ms+ to the US East Coast).
Hosting in Norway, closer to the NIX (Norwegian Internet Exchange), keeps that latency negligible. Furthermore, the type of VPS matters.
Comparison: Shared vs. Dedicated Resources
| Feature | Standard Container VPS | CoolVDS KVM Instance |
|---|---|---|
| Kernel | Shared | Dedicated |
| Network Stack | Filtered/Shared | Full Access (Tap/Tun) |
| Overlay Performance | Poor (Double Encapsulation) | High (Hardware Passthrough) |
We built CoolVDS on KVM specifically to solve the "noisy neighbor" problem. When you are running a software-defined network (SDN) like Flannel inside a VPS, you need the hypervisor to get out of the way. OpenVZ or LXC containers often block the kernel modules required for advanced networking or BGP routing. KVM gives you a real kernel, allowing you to load `ip_vs` modules or tune TCP buffers exactly how Kubernetes likes them.
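Here is the kind of kernel prep that is simply impossible on most container-based VPS plans. Treat the numbers as reasonable starting points, not gospel:

```bash
# Load IPVS modules that some service-proxy setups expect.
sudo modprobe ip_vs
sudo modprobe ip_vs_rr
lsmod | grep ip_vs

# Raise conntrack and TCP buffer limits for a chatty overlay network.
sudo sysctl -w net.netfilter.nf_conntrack_max=262144
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216
```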
Final Thoughts: Don't Ignore the Physical Layer
Kubernetes 1.3 is a powerful beast. It abstracts away a lot of pain, but it introduces network complexity that requires a solid foundation. If you are seeing random timeouts, check the MTU settings on your overlay. If kube-proxy is burning CPU on your nodes, check which proxy mode it is running in.
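For the MTU case specifically, here is a quick way to prove it. The Pod IP is a placeholder, and 1422 is 1450 minus the 28 bytes of ICMP and IP headers:

```bash
# Send non-fragmentable pings sized to each MTU. If the 1500-sized probe
# fails while the 1450-sized one succeeds, you have a VXLAN MTU mismatch.
ping -M do -s 1422 -c 3 10.244.2.8   # fits a 1450 MTU overlay
ping -M do -s 1472 -c 3 10.244.2.8   # sized for MTU 1500; will fail over VXLAN
```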
And if you want to run a cluster that serves Norwegian users with single-digit millisecond latency, you need infrastructure that physically sits in the region and respects your I/O needs. Don't let a slow network layer ruin your architecture.
Ready to build your cluster? Spin up a KVM instance on CoolVDS today. We offer native NVMe storage and unmetered internal traffic, perfect for chatty microservices. Deploy your first node in 55 seconds.