Kubernetes Networking Deep Dive: CNI, Ingress, and the Overlay Tax
Let’s be honest: Kubernetes (K8s) makes deploying applications feel like magic, but it makes networking feel like a black box of misery. If you are running kubectl apply and hoping for the best, you are already failing. I’ve seen production clusters in Oslo go dark not because the application crashed, but because the overlay network choked on packet encapsulation overhead during a traffic spike.
It is August 2018. Kubernetes 1.11 is stable. We aren't hacking together Mesos anymore. But the complexity of the networking model—Pod-to-Pod, Service-to-Pod, and Ingress—remains the biggest hurdle for teams moving from monolithic VPS setups to distributed architectures. In this post, we are going to rip open the hood of the Container Network Interface (CNI), look at the iptables mess that kube-proxy creates, and discuss why your underlying infrastructure (the actual metal or VDS) matters more than your manifest files.
The CNI Jungle: Flannel vs. Calico
Kubernetes doesn't provide a native network implementation. It defines an interface (CNI) and expects a plugin to do the heavy lifting. Your choice here defines your cluster's latency and complexity.
In the Nordic hosting market, where we prioritize low latency (getting under 10ms from Oslo to Trondheim), the "Overlay Tax" is real. This is the CPU cost of encapsulating a packet inside another packet (VXLAN or IP-in-IP) to cross the node boundary.
| Feature | Flannel (VXLAN) | Calico (BGP) |
|---|---|---|
| Mechanism | L2 Overlay (Encapsulation) | L3 Routing (BGP) |
| Complexity | Low (It just works) | High (Requires network knowledge) |
| Performance | Medium (Encap overhead) | High (Near native speeds) |
| Network Policies | No | Yes |
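That Network Policies row is the sleeper feature: with Calico you can declare which pods are allowed to talk to each other straight from the Kubernetes API, and the data plane enforces it. Here is a minimal sketch, applied inline via a heredoc; the namespace and labels are illustrative, not taken from a real cluster.
# Hypothetical policy: only pods labelled role=frontend may reach the backend pods on TCP/80
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 80
EOF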
If you are deploying on CoolVDS, we often recommend Calico if you are comfortable with BGP. Why? Because our underlying KVM architecture allows for highly efficient Layer 3 routing. That said, for 90% of setups Flannel is the pragmatic default. Here is how you typically deploy Flannel on a fresh 1.11 cluster:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml
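Once the DaemonSet is running, you can see the encapsulation machinery for yourself. The interface names below assume Flannel's stock configuration:
# The overlay tunnel device Flannel creates on every node; -d prints the vxlan details
ip -d link show flannel.1
# Note the pod bridge MTU: 1450, because the 50-byte VXLAN header has to fit inside a 1500-byte frame
ip link show cni0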
But here is the catch: Flannel defaults to VXLAN. Every single packet between pods on different nodes is wrapped. If your VPS provider oversubscribes CPUs, that encapsulation process stalls. This is why we insist on dedicated CPU slices at CoolVDS—because "CPU Steal Time" kills network throughput before it kills your app.
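You don't have to take the steal-time point on faith; vmstat reports it directly, and the check costs nothing:
# Sample CPU stats once per second for five seconds; the last column (st) is steal time.
# If it sits above a few percent, your vCPU is waiting on the hypervisor while packets queue.
vmstat 1 5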
The kube-proxy and iptables Nightmare
Until IPVS mode becomes the default (it went GA in 1.11 but still takes manual setup), we are stuck with iptables mode. kube-proxy watches the API server and writes thousands of iptables rules to NAT traffic to the correct Pod IP.
When you have 5,000 Services, the kernel has to traverse a sequential list of rules for the first packet of every new connection. It is O(n). This adds latency. I recently debugged a cluster where just reaching a Service added 40ms of latency purely from rule processing.
To check if your node is drowning in rules:
# Check the number of NAT rules (-n skips the slow reverse-DNS lookups)
sudo iptables -t nat -L -n | wc -l
# If this number is over 20,000, you are going to feel the pain on a standard VPS.
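If that count is already in five-digit territory, IPVS mode is worth the setup pain, because it swaps the linear chain for kernel hash tables. A rough sketch of the switch on a kubeadm-provisioned 1.11 cluster (the ConfigMap name, namespace, and labels assume kubeadm defaults):
# Load the kernel modules kube-proxy's IPVS mode needs (persist them via /etc/modules-load.d/)
sudo modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4
# Set mode: "ipvs" in the kube-proxy configuration
kubectl -n kube-system edit configmap kube-proxy
# Recreate the kube-proxy pods so they pick up the new mode
kubectl -n kube-system delete pod -l k8s-app=kube-proxy
# Verify: Services should now show up as IPVS virtual servers instead of iptables chains
sudo ipvsadm -Ln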
Pro Tip: If you are running high-traffic workloads, make sure your nf_conntrack table is sized correctly. The default on a stock Ubuntu 18.04 install is too low for a busy K8s node.
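Checking how close you are to that ceiling takes two commands; once the table overflows, the kernel silently drops new connections and logs it:
# Live tracked connections versus the configured maximum
sudo sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
# Any overflow already? This message only appears once packets are being dropped
dmesg | grep -i 'nf_conntrack: table full'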
Kernel Tuning for Nordic Performance
You cannot run a production Kubernetes cluster on default Linux settings. Whether you are hosting GDPR-compliant data for a Norwegian bank or a high-traffic media site, you need to tune the sysctl parameters. Here is the configuration we apply to our high-performance reference architecture:
# /etc/sysctl.d/k8s-tuning.conf
# Increase the connection tracking table size
net.netfilter.nf_conntrack_max = 131072
# Shorten how long orphaned sockets sit in FIN-WAIT-2 before being reclaimed
net.ipv4.tcp_fin_timeout = 15
# Allow IP forwarding (Required for CNI)
net.ipv4.ip_forward = 1
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# Raise the input packet backlog queue to absorb traffic bursts
net.core.netdev_max_backlog = 5000
Apply this with sysctl -p /etc/sysctl.d/k8s-tuning.conf. If you are on a provider that restricts kernel tuning (like some OpenVZ hosts), you simply cannot run Kubernetes reliably. CoolVDS uses KVM, so you have full kernel control.
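One gotcha worth calling out: the net.netfilter.nf_conntrack_max key only exists once the conntrack module is loaded, so on a freshly provisioned node sysctl may complain before kube-proxy has ever started. Load the module first, then confirm the values stuck:
# nf_conntrack has to be loaded before its sysctl keys appear
sudo modprobe nf_conntrack
sudo sysctl -p /etc/sysctl.d/k8s-tuning.conf
# Spot-check what the kernel actually accepted
sysctl net.netfilter.nf_conntrack_max net.ipv4.ip_forward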
Ingress: Exposing the Cluster
Exposing services via NodePort is amateur hour. You need an Ingress Controller. In 2018, the NGINX Ingress Controller is the undisputed king. It handles SSL termination, load balancing, and path-based routing.
However, SSL termination is CPU heavy. If you are hosting inside Norway to comply with Datatilsynet requirements, you want that handshake to happen as fast as possible.
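A quick way to gauge how much handshake headroom a node has is openssl's built-in benchmark; the signs-per-second figures put a hard ceiling on new TLS connections per core. The numbers depend entirely on your CPU, so treat this as a comparison tool, not a guarantee:
# RSA 2048 is what most certificates use in 2018; ECDSA P-256 is the much cheaper alternative
openssl speed rsa2048 ecdsap256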
Here is a robust Ingress definition that handles TLS and sets specific timeouts to avoid hanging connections during NIX (Norwegian Internet Exchange) congestion periods:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: production-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    # Increase buffer size for large headers (common in enterprise apps)
    nginx.ingress.kubernetes.io/proxy-buffer-size: "16k"
    # Enable client certificate authentication if needed for internal tools
    # nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
    # Fail fast instead of letting connections hang on a congested upstream
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "10"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "60"
spec:
  tls:
  - hosts:
    - api.coolvds-demo.no
    secretName: coolvds-tls
  rules:
  - host: api.coolvds-demo.no
    http:
      paths:
      - path: /
        backend:
          serviceName: backend-service
          servicePort: 80
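Before pointing DNS at the cluster, confirm the controller serves the right certificate. curl's --resolve flag lets you hit the Ingress by hostname without touching DNS; the IP below is a placeholder for one of your ingress nodes or your load balancer:
kubectl get ingress production-ingress
# 203.0.113.10 is a documentation address; substitute your real ingress IP
curl -v --resolve api.coolvds-demo.no:443:203.0.113.10 https://api.coolvds-demo.no/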
The Storage Bottleneck: etcd
Networking isn't just about packets; it's about the control plane. Kubernetes relies on etcd to store state, and etcd requires incredibly low latency for disk writes (every write is fsynced to its WAL). If your disk latency spikes, etcd misses heartbeats, leader elections fire, and your cluster updates freeze. This manifests as networking timeouts because the API server cannot update endpoints.
This is where the hardware reality hits. Spinning rust (HDD) or shared SATA SSDs often fail the etcd performance benchmark. We built CoolVDS on NVMe specifically because we saw DevOps teams struggling with "flaky" clusters that were actually just suffering from I/O wait.
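You can put a number on it before you ever install etcd. fio can replay etcd's write pattern (small sequential writes, each followed by fdatasync); a commonly cited target is a 99th-percentile sync latency under roughly 10ms. The directory and job parameters below are illustrative:
# Run against a scratch directory on the volume that will hold /var/lib/etcd
mkdir -p /var/lib/etcd-bench
fio --name=etcd-fsync-test --directory=/var/lib/etcd-bench \
    --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300
# Read the fsync/fdatasync percentiles in the output; a p99 above ~10ms means etcd will struggle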
Summary
Kubernetes networking is a stack of trade-offs. You trade CPU cycles for the convenience of overlay networks (Flannel). You trade a maze of kernel-level iptables rules for effortless service discovery (kube-proxy).
To win this game in 2018, you need two things:
- Knowledge: Understand the packet flow. Don't blindly apply manifests.
- Infrastructure: Use KVM-based virtualization with high packet-per-second capabilities and NVMe storage.
If you are building the next great Norwegian SaaS, don't let a cheap VPS bottleneck your overlay network. Test your network throughput today. Deploy a CoolVDS instance, install `iperf3`, and see the difference raw performance makes.
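A concrete way to run that test: measure node-to-node first, then pod-to-pod across nodes; the gap between the two numbers is your overlay tax. The address below is a placeholder:
# On the first node (or a pod on it): start the server
iperf3 -s
# On a second node (or a pod scheduled on another node): run a 30-second test against it
iperf3 -c 10.0.0.11 -t 30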