Kubernetes Networking Deep Dive: Surviving the Overlay Jungle
Let’s be honest: Kubernetes is brilliant for orchestration, but the networking model in version 1.1 is an absolute beast. If you are migrating from a traditional VM setup or even standard Docker Compose, the flat pod-network requirement feels like a punch in the gut. I’ve spent the last three weeks debugging packet drops on a cluster spanning three availability zones, and I’m here to tell you: it’s not magic, it’s just iptables and routing tables.
Most VPS providers will sell you a container and wish you luck. But when you are dealing with the latency requirements of the Nordic market—where packets need to hit NIX (Norwegian Internet Exchange) faster than you can blink—you can't afford the overhead of a poorly configured overlay network.
The "Flat Network" Lie
Kubernetes mandates that every pod gets its own IP address that is routable from every other pod. In a Google-managed environment, the underlying network (SDN) handles this. On your own infrastructure, or on a standard VPS provider, you have to build this.
You essentially have two choices right now in early 2016:
- Overlay Networks (Flannel, Weave): Encapsulate packets (VXLAN/UDP) to span nodes.
- Underlay Routing (Calico, BGP): Advertise pod routes to your physical routers.
For 90% of deployments, you are going to use Flannel because configuring BGP on a hosted VPS usually requires permissions providers won't give you. But Flannel isn't free. It costs CPU cycles.
The Cost of Encapsulation
When Pod A (10.1.15.2) talks to Pod B (10.1.20.4) on a different node, Flannel wraps that packet in a UDP packet. This adds overhead. If you are running this on a shared hosting platform that oversubscribes CPU, your network throughput will tank because the CPU is too busy wrapping/unwrapping packets to actually serve the application.
Pro Tip: Always check your MTU settings. A standard Ethernet frame is 1500 bytes. VXLAN adds headers. If you don't lower the Docker interface MTU to roughly 1450, you will see random packet drops on large payloads (like MySQL replication streams) due to fragmentation.
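To check and pin it, something like this works (a sketch; `flannel.1` is flannel's default VXLAN interface name, and the Docker options file varies by distro):

```bash
# Confirm the current MTU on the Docker bridge and the flannel interface
ip link show docker0
ip link show flannel.1

# Pin Docker below the ~50 bytes of VXLAN overhead.
# On Debian/Ubuntu, add to /etc/default/docker:
#   DOCKER_OPTS="--mtu=1450"
sudo service docker restart
```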
Here is how you verify your Flannel configuration in etcd (assuming you are using the coreos-flannel image):
```bash
etcdctl get /coreos.com/network/config
{
  "Network": "10.1.0.0/16",
  "Backend": {
    "Type": "vxlan"
  }
}
```
If you see "Type": "udp" instead of vxlan, stop what you are doing immediately. The userspace UDP implementation is significantly slower. Switch to VXLAN, which leverages kernel-space processing.
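Assuming the etcd v2 API and the standard key shown above, the switch is a one-liner, followed by a flanneld restart on every node (flanneld only reads the config at startup):

```bash
etcdctl set /coreos.com/network/config \
  '{ "Network": "10.1.0.0/16", "Backend": { "Type": "vxlan" } }'

# Restart flanneld on each node so it picks up the new backend
sudo systemctl restart flanneld
```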
Kube-Proxy: Userspace vs. Iptables
In Kubernetes 1.0, kube-proxy ran in "userspace" mode. It was a literal proxy: traffic went kernel → userspace proxy → kernel → destination. It was slow and prone to crashing under load.
In Kubernetes 1.1 (and the upcoming 1.2), we have the iptables mode. This is a massive performance jump. It configures the Linux kernel to handle NAT routing directly without a context switch.
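In 1.1 the iptables proxier is still opt-in. A minimal sketch of enabling and verifying it (check the flag against your exact build; the master URL here is a placeholder):

```bash
# Start kube-proxy with the iptables proxier instead of userspace
kube-proxy --master=http://<master-ip>:8080 --proxy-mode=iptables

# If it worked, the KUBE-SERVICES chain lives in the nat table
iptables -t nat -L KUBE-SERVICES | head
```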
However, debugging it is a nightmare. If a service isn't reachable, you need to dive into the nat table and trace the KUBE-SERVICES chains.
War Story: The Missing NAT
Last week, a client’s Node.js API kept timing out connecting to Redis. The pods were up. DNS was resolving via SkyDNS. But traffic died. We dumped the rules:
```bash
iptables-save | grep KUBE-SERVICES
-A KUBE-SERVICES -d 10.0.0.154/32 -p tcp -m comment --comment "default/redis-master: cluster IP" -m tcp --dport 6379 -j KUBE-SVC-REDISHASH
```
The rule existed. It turned out the underlying VPS provider had a security group layer that was blocking the VXLAN UDP port (8472) between nodes. The control plane could talk, but the data plane was severed. Lesson: Always verify UDP connectivity between your worker nodes.
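The fastest way to verify that is tcpdump on one node while you generate cross-node pod traffic from another (8472 is flannel's default VXLAN port; adjust if you changed it):

```bash
# On node B: watch for encapsulated traffic arriving
sudo tcpdump -ni eth0 udp port 8472

# On node A: ping a pod that lives on node B
ping -c 3 10.1.20.4

# Silence on node B while the ping runs means a firewall or
# security group is eating UDP 8472 between the nodes.
```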
Why Virtualization Matters (KVM vs. Containers)
This is where your hosting choice makes or breaks your cluster. Many "Cloud VPS" providers actually use OpenVZ or LXC to host your instance. They are selling you a container, not a server.
You cannot run Docker properly inside another container.
You run into kernel module conflicts (like `bridge` or `iptable_nat`) because you share the kernel with the host. You need a dedicated kernel. You need KVM.
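You can find out what you were actually sold in two commands (assumes a systemd-based distro for `systemd-detect-virt`):

```bash
# "kvm" = a real VM with its own kernel;
# "openvz" or "lxc" = you are inside someone else's container
systemd-detect-virt

# Can you load the modules Kubernetes networking needs?
sudo modprobe vxlan && lsmod | grep vxlan
```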
At CoolVDS, we strictly use KVM virtualization. When you spin up an instance, you get your own kernel. This allows you to load the necessary modules for Kubernetes networking without begging support to flip a switch on the host node.
Kernel Tuning for High Load
If you are pushing high traffic (common for ad-tech or gaming servers here in Norway), the default Linux sysctl settings will choke. Before you even install Docker, apply this:
```ini
# /etc/sysctl.conf
# Allow a deeper accept queue for listening sockets
net.core.somaxconn = 4096
# Reuse sockets stuck in TIME_WAIT for new outbound connections
net.ipv4.tcp_tw_reuse = 1
# Widen the ephemeral port range
net.ipv4.ip_local_port_range = 1024 65000
# Deepen the backlog for incoming packet processing
net.core.netdev_max_backlog = 2000
```
Run `sysctl -p` to apply. If you aren't on KVM (as you are with CoolVDS), you might find you don't even have permission to change these.
Data Sovereignty and Latency
For those of us operating out of Oslo or Stavanger, latency to Central Europe is usually fine (20-30ms). But if your Kubernetes cluster is split between a datacenter in Oslo and a cheap VPS in Frankfurt, your etcd cluster will suffer. etcd is sensitive to disk write latency and network latency.
If the heartbeat latency exceeds 50ms consistently, you will see leader elections and the cluster will flap. This is why we equip our CoolVDS instances with local NVMe storage. High IOPS keeps etcd happy, and stable routing ensures your nodes don't partition.
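Two checks worth running before you blame Kubernetes itself (`etcdctl cluster-health` is the etcd v2 tooling; the dd test is a crude fsync-latency probe, since synchronous writes are what etcd's WAL actually does):

```bash
# Are members healthy, or is leadership flapping?
etcdctl cluster-health

# Crude disk latency probe on the etcd data volume:
# 1000 synchronous 512-byte writes
dd if=/dev/zero of=/var/lib/etcd/latency-test bs=512 count=1000 oflag=dsync
rm /var/lib/etcd/latency-test
```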
Performance Benchmark: HTTP/1.1
We ran a simple `ab` (Apache Bench) test.
Scenario: Nginx pod on Node A proxying to a Go app on Node B.
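For reference, the shape of the test (the request count, concurrency, and service IP here are illustrative, not the exact values from our run):

```bash
# 10,000 requests, 100 concurrent, against the Nginx service IP
ab -n 10000 -c 100 http://10.0.0.200/
```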
| Infrastructure | Throughput (req/sec) | Latency (99%) |
|---|---|---|
| Budget VPS (OpenVZ) | 450 | 210ms |
| CoolVDS (KVM + NVMe) | 2,100 | 45ms |
The difference isn't just raw CPU; it's the I/O wait and the overhead of the overlay network on a non-optimized hypervisor.
Final Verdict
Kubernetes is the future, but in 2016, it requires a sturdy foundation. Don't build a skyscraper on a swamp. You need KVM isolation, access to kernel modules, and low-latency storage for etcd.
If you are tired of debugging "Ghost" packet loss caused by noisy neighbors or shared kernels, it’s time to move your cluster to a platform designed for heavy workloads. Spin up a CoolVDS KVM instance today and see what `iptables` looks like when it's not fighting for resources.