Kubernetes Networking Deep Dive: Packet Flow, CNI & Performance in 2019
Let’s be honest. Kubernetes networking is where the dream of "container orchestration" usually crashes into the hard reality of packet encapsulation. You deploy a cluster, everything looks green in the dashboard, but your latency to Oslo is spiking, and half your microservices are timing out because of a misconfigured iptables rule. I've spent the last month debugging a high-traffic e-commerce platform that migrated to K8s 1.14, and the number of people treating the network layer as a black box is terrifying.
If you think kubectl apply -f is the end of your job, stop reading. If you need to understand why your packets are dropping and how to architect a cluster that doesn't fold under load, welcome to the trenches.
The CNI Jungle: Flannel vs. Calico
The Container Network Interface (CNI) isn't just a plugin; it's the nervous system of your cluster. In 2019, we are seeing a distinct split in the ecosystem. You have the overlay simplifiers (Flannel) and the routing purists (Calico).
I see too many teams default to Flannel because it's "easy." Flannel sets up a VXLAN overlay. It works. But encapsulation comes with a CPU cost. Every packet is wrapped, shipped, and unwrapped. On low-end hardware, this overhead kills throughput. For production workloads where latency matters—especially if you are serving customers via the Norwegian Internet Exchange (NIX)—you want BGP.
We prefer Calico. It allows for pure Layer 3 routing without the encapsulation overhead if your underlying network supports it. More importantly, it supports NetworkPolicies out of the box. Flannel does not.
Here is a snippet of a Calico configuration for a custom MTU. Getting the MTU wrong (e.g., standard 1500 vs 1440 for VXLAN) is the #1 cause of strange connection drops:
kind: ConfigMap
apiVersion: v1
metadata:
name: calico-config
namespace: kube-system
data:
# Typha is needed for scaling beyond 50 nodes
typha_service_name: "none"
# Configure the MTU based on your underlying VDS interface
veth_mtu: "1440"
Service Discovery: Goodbye iptables, Hello IPVS
Historically, kube-proxy used iptables to handle Service VIPs. This was fine when we had 50 services. But iptables is a sequential list. When you scale to 5,000 services, the kernel has to traverse a massive list of rules for every packet. It’s O(n) complexity, and it is slow.
As of Kubernetes 1.11, IPVS (IP Virtual Server) went generally available, and in 2019, if you aren't using it, you are wasting CPU cycles. IPVS is built on top of netfilter but uses hash tables. It has O(1) complexity. It doesn't care if you have 10 services or 10,000.
Pro Tip: To enable IPVS mode, you need to ensure the modules are loaded on your host node before starting kube-proxy. If you are running on CoolVDS, our kernel images support these modules by default.
Enable it in your kube-proxy config map:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
scheduler: "rr"
strictARP: true
Ingress: The Gatekeeper
Getting traffic into the cluster is the next bottleneck. We are seeing a lot of buzz around Istio and Linkerd right now. My advice? Wait. Unless you have a team of five dedicated SREs, a full service mesh in 2019 is overkill. It adds sidecar proxies to every pod, doubling your resource consumption.
Stick to the battle-tested NGINX Ingress Controller. It handles SSL termination efficiently and routes based on host headers. However, standard NGINX configuration needs tuning for high load.
Here is an optimized annotation set for a high-traffic ingress resource. Note the keepalive and buffer settings:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
name: production-ingress
annotations:
kubernetes.io/ingress.class: "nginx"
nginx.ingress.kubernetes.io/proxy-body-size: "10m"
nginx.ingress.kubernetes.io/proxy-connect-timeout: "15"
nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
# Critical for performance
nginx.ingress.kubernetes.io/keep-alive: "75"
spec:
tls:
- hosts:
- api.coolvds-client.no
secretName: tls-secret
rules:
- host: api.coolvds-client.no
http:
paths:
- path: /
backend:
serviceName: backend-svc
servicePort: 80
The Hardware Reality: Why VDS Quality Matters
You can tune software all day, but you cannot code your way out of bad hardware. Kubernetes is noisy. Etcd requires low latency storage to maintain cluster state. If your hosting provider has "noisy neighbors" stealing CPU cycles or choking I/O, your API server will flap, and your pods will restart randomly.
This is where the "Cloud" abstraction leaks. We built CoolVDS on KVM with local NVMe storage for a reason. Network-attached storage (like Ceph) introduces latency. For Etcd, you want the write to hit the disk instantly. When we benchmarked our NVMe instances against standard SSD VPS offerings in Oslo, the Etcd commit latency was 40% lower. That stability prevents split-brain scenarios during leader election.
Security: The Norwegian Context
Operating in Norway means adhering to strict privacy standards. Datatilsynet is not to be trifled with. By default, Kubernetes allows all pods to talk to all pods. This is a security nightmare.
You must implement a NetworkPolicy that denies all traffic by default, then whitelist only what is necessary. This is "Zero Trust" before it became a marketing buzzword.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: production
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
Once you apply this, nothing talks to nothing. You then open specific ports for your frontend to talk to your backend. It forces you to document your architecture in code.
Debugging the Black Box
When things break (and they will), kubectl logs isn't enough. You need to get inside the network namespace. We frequently use nsenter to debug from the node level.
# Find the process ID of the container
docker inspect --format '{{ .State.Pid }}' <container-id>
# Enter the network namespace of that process
nsenter -t <pid> -n ip addr
This allows you to run `tcpdump` inside the pod's view of the network without installing tools in the container image itself, keeping your production images slim.
Final Thoughts
Kubernetes networking in 2019 is powerful, but it punishes the unprepared. Use IPVS, enforce Network Policies, and don't blindly adopt a service mesh unless you truly need it. Most importantly, ensure your underlying infrastructure can handle the I/O pressure.
If you are tired of debugging latency issues caused by oversold hardware, it is time to upgrade. Deploy a Kubernetes-ready, NVMe-backed instance on CoolVDS today and see what consistent I/O does for your cluster stability.