You Are Losing 30% of Your Throughput to a Proxy. Fix It.
If you are running Kubernetes 1.1 in production today, you are likely running it wrong. While the hype machine screams about "scheduling at scale" and "microservices," nobody is talking about the packet slaughter happening inside your kube-proxy. I have spent the last week debugging a high-traffic Magento cluster migrating from a US cloud to our infrastructure in Oslo, and the latency numbers were garbage. The culprit? The default networking configuration.
With the Schrems ruling invalidating the Safe Harbor agreement last October, and Datatilsynet (The Norwegian Data Protection Authority) breathing down our necks, moving workloads to Norwegian soil is no longer optional—it's a legal survival strategy. But shifting from AWS to bare metal or high-performance KVM VPS means you can't rely on magical cloud load balancers. You have to understand the plumbing.
This is not a beginner's guide. This is how you configure Kubernetes networking for raw speed on CoolVDS hardware, specifically targeting the new iptables mode introduced in v1.1.
The Userspace Bottleneck
By default, kube-proxy runs in userspace mode. It acts as a literal proxy server: every connection destined for a Service IP gets redirected into the kube-proxy process, copied from kernel space into user space, handled by the proxy binary, and then copied back into the kernel to be forwarded to the Pod. It is reliable, but it is slow.
In a recent benchmark on a CoolVDS 4-Core instance, I saw CPU usage spike to 100% on the node just from network overhead during a load test. The context switching is a killer.
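You can reproduce this yourself by hammering a Service IP from another machine and watching where the CPU goes. A rough sketch; the Service IP, port, and request counts below are placeholders for your own setup:
# Drive traffic at a Service IP (10.0.0.50 is a placeholder)
ab -n 50000 -c 100 http://10.0.0.50:80/
# On the node, watch kube-proxy burn CPU while the test runs
pidstat -u -p $(pidof kube-proxy) 1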
The Fix: Enable Iptables Mode
Kubernetes 1.1 introduced an experimental iptables proxy mode. Instead of shuttling packets between kernel and user space, kube-proxy simply writes a large set of iptables NAT rules and lets the Linux kernel do the forwarding entirely in kernel space. No context switches, no extra copies, far less CPU per request.
You need to explicitly enable this in your kube-proxy configuration. If you are using hyperkube or a custom systemd unit, add this flag:
/usr/local/bin/kube-proxy \
  --master=http://10.0.0.1:8080 \
  --proxy-mode=iptables \
  --masquerade-all \
  --logtostderr=true \
  --v=2
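After restarting kube-proxy, sanity-check that it is actually programming the kernel rather than proxying. On my 1.1 nodes the generated chains carry a KUBE- prefix; exact chain names vary between versions, so treat this as a rough check rather than gospel:
# The nat table should fill up with KUBE- chains, one per Service
iptables -t nat -L -n | grep KUBE | head
# Each Service should ultimately resolve to DNAT rules pointing at Pod IPs
iptables -t nat -S | grep DNAT | head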
Warning: In userspace mode, if a connection to a Pod fails, the proxy can transparently retry a different Pod. In iptables mode there is no retry: if a packet gets routed to a dead Pod before the readiness probe pulls it out of the endpoints list, the connection simply fails. This forces you to write better health checks. Good. You should be doing that anyway.
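For reference, a minimal readinessProbe sketch for a container in your Pod or replication controller spec. The image name, /healthz path, and port 8080 are placeholders; point them at whatever your app actually exposes:
# fragment of a pod template's containers section
containers:
- name: web
  image: registry.example.com/shop:1.4.2
  readinessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 5
    timeoutSeconds: 2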
Choosing Your CNI: Flannel vs. Weave vs. Calico
Docker's default docker0 bridge is useless for multi-host networking: every host hands out addresses from the same 172.17.0.0/16 range with zero coordination, so Pod IPs collide the moment you add a second node. You need an overlay (or underlay) that guarantees unique Pod IPs across the cluster. Here is the reality of the ecosystem in early 2016:
1. Flannel (The Pragmatic Choice)
Flannel from CoreOS is what I use for 90% of deployments. It is simple: it stores subnet assignments in etcd and hands each host a slice of the Pod network. Note that flannel's default backend is udp, which re-introduces a userspace hop; explicitly set the backend to vxlan so the encapsulation stays in the kernel. The overhead is manageable, especially if your host has hardware offloading for VXLAN (which CoolVDS KVM instances support).
Configuration for etcd:
# Populate etcd with the network config
etcdctl set /coreos.com/network/config '{
  "Network": "10.244.0.0/16",
  "SubnetLen": 24,
  "Backend": {
    "Type": "vxlan"
  }
}'
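Once flanneld has a lease, it writes the subnet and MTU into /run/flannel/subnet.env, and Docker has to be told to use them, otherwise docker0 keeps handing out its own addresses. A sketch of the glue; how you wire this into your init system varies by distro:
# flanneld exports FLANNEL_SUBNET and FLANNEL_MTU after it gets a lease
source /run/flannel/subnet.env
# Restart the Docker daemon pinned to flannel's subnet and MTU
docker daemon --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}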
2. Weave Net (The Encryption Play)
If you are handling sensitive user data and are paranoid about the lack of Safe Harbor protection, Weave offers full mesh encryption. However, the performance hit is significant. Only use this if you have regulatory requirements that demand encryption in transit *inside* the private network.
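If you do need it, encryption is switched on at launch with a shared secret that every peer must present. A sketch of the standalone launcher; the secret file path is illustrative, keep the actual secret somewhere sensible:
# All peers must share the same secret, or they refuse to talk to each other
weave launch --password "$(cat /etc/weave/secret)"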
3. Project Calico (The Performance King)
Calico is different. It doesn't use an overlay. It uses BGP (Border Gateway Protocol) to route packets between nodes. It is basically running the internet's routing protocol inside your cluster. It is complex to set up, but the performance is near-native. If you are pushing 10Gbps, use Calico. For standard web apps, Flannel is sufficient.
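If you do go down the Calico road, the quickest sanity check is to look for routes installed by BIRD (Calico's BGP daemon) and to confirm there is no encapsulation device in the path. A rough check, not the official procedure:
# Routes to other nodes' Pod subnets should be plain kernel routes learned from BIRD
ip route show | grep bird
# No VXLAN device should exist on a pure Calico node (compare with flannel.1 on flannel hosts)
ip -d link show | grep -i vxlan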
War Story: The "Noisy Neighbor" Latency Spike
Last month, we debugged a cluster where random HTTP requests were taking 2000ms. We traced it to CPU Steal. The client was hosting on a cheap "Container VPS" provider (OpenVZ). Because OpenVZ shares the kernel, another tenant's heavy networking load was locking the kernel's network stack.
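If you suspect the same problem, the steal column gives it away. A quick check; the sampling intervals here are arbitrary:
# %steal consistently above a few percent means another tenant is eating your CPU time
mpstat -P ALL 2 5
# Or watch the "st" column on the far right
vmstat 2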
We migrated them to CoolVDS, which uses KVM. With KVM, you have your own kernel. We tuned /etc/sysctl.conf to optimize for high concurrency:
# /etc/sysctl.conf optimizations for high-load K8s nodes
# Allow the node to forward traffic between Pods and the outside world
net.ipv4.ip_forward = 1
# Make bridged container traffic visible to iptables (kube-proxy depends on this)
net.bridge.bridge-nf-call-iptables = 1
# Raise ARP/neighbor table limits; the kernel defaults choke once you have hundreds of Pod IPs
net.ipv4.neigh.default.gc_thresh1 = 1024
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4096
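To apply this without a reboot, load the bridge netfilter module first (on kernels 3.18 and newer it lives in br_netfilter, older kernels bundle it into bridge) and then reload sysctl. A sketch:
# Without the module loaded, the net.bridge.* keys do not exist and sysctl will complain
modprobe br_netfilter 2>/dev/null || modprobe bridge
sysctl -p /etc/sysctl.conf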
Latency dropped from 2000ms to 4ms. Hardware isolation matters.
Pro Tip: Always run conntrack -L | wc -l on your nodes during load testing. If you hit the table limit, packets get dropped silently. Increase net.netfilter.nf_conntrack_max if you see drops in dmesg.
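Here is what that check looks like in practice during a load test; the new limit below is an arbitrary example, size it to your RAM and traffic:
# How close are we to the ceiling?
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
# "nf_conntrack: table full, dropping packet" in dmesg means you are silently losing traffic
dmesg | grep conntrack
# Raise the limit on the fly, then persist it in /etc/sysctl.conf
sysctl -w net.netfilter.nf_conntrack_max=262144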
Local Context: Why Oslo Matters
Latency to the NIX (Norwegian Internet Exchange) is a competitive advantage. If your customer base is in Norway, hosting in Frankfurt adds 15-20ms round trip. Hosting in the US adds 100ms+ and legal liability. Using a CoolVDS instance in Oslo keeps your ping times single-digit and keeps Datatilsynet happy.
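Measure it yourself from a client on a Norwegian line; the hostname below is a placeholder for your own node:
# Round-trip time from an Oslo client to the node
ping -c 20 node1.example.com
# Hop-by-hop view if the numbers look off
mtr --report -c 20 node1.example.com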
Conclusion
Kubernetes 1.1 is powerful, but its default networking settings are conservative. Switch to iptables mode immediately. Use Flannel with VXLAN for a balance of speed and sanity. And stop running production workloads on shared kernels.
If you need a KVM environment that respects your need for raw iptables access and dedicated I/O, deploy a test node on CoolVDS. It takes 55 seconds to spin up, which is less time than it takes to read the man page for iptables-save.