Kubernetes Networking Deep Dive: Packet Flow, CNI, and Performance Bottlenecks in Production

Kubernetes Networking in Production: Moving Beyond Localhost

Let’s be honest. Kubernetes is fantastic until you have to debug a networking issue. On your laptop, Minikube runs fine. The magic happens, packets flow, and developers are happy. Then you push to a multi-node cluster, and suddenly Pod A cannot talk to Service B, or your latency spikes to 200ms between microservices that sit on the same rack.

I have spent the last three weeks debugging a cluster for a client in Oslo. They migrated from a monolithic setup to microservices using Kubernetes 1.4. Their application latency doubled. Why? Because they treated the network as an abstraction. In 2016, you cannot afford to treat the network as an abstraction. You need to know exactly how that packet moves from the eth0 of the container, through the bridge, onto the host interface, across the wire, and into the destination.

This is a deep dive into the plumbing of Kubernetes networking. We are going to look at CNI plugins, the iptables mess generated by kube-proxy, and why the underlying hardware of your VPS provider matters more than your Docker config.

The Fundamental Model: A Flat Address Space

Kubernetes imposes a strict requirement: every pod gets its own IP address, and every pod must be able to reach every other pod without Network Address Translation (NAT). This sounds simple, but the implementation is messy.
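
You can see the flat model from the command line. A minimal sketch (the pod name and target IP below are placeholders; substitute values from your own cluster):

# Every pod gets a cluster-routable IP, visible in the IP column
kubectl get pods --all-namespaces -o wide

# From inside one pod, another pod's IP answers directly, with no NAT in between
# (assumes the image ships a ping binary; busybox does)
kubectl exec frontend-2480045907-x1z2b -- ping -c 3 10.244.2.14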

To achieve this, we use the Container Network Interface (CNI). As of December 2016, your main choices are Flannel, Weave, or Calico. Most people default to Flannel because it's easy. But "easy" often means "slow" if you pick the wrong backend.

The Cost of Overlay Networks (VXLAN)

If you use Flannel with the VXLAN backend, you are encapsulating packets: every IP packet gets wrapped inside a UDP packet. This adds overhead, because the CPU has to wrap and unwrap every single frame. On a dedicated server with CPU headroom to spare, you might not notice. On a cheap, oversold VPS with noisy neighbors, it is fatal.
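
You can watch that tax being paid on the wire. A quick check, assuming Flannel's default VXLAN port of 8472 and a node interface named eth0 (adjust both for your environment):

# Every overlay packet shows up as outer UDP traffic on the node interface
tcpdump -n -i eth0 udp port 8472

# While traffic flows, watch the si (softirq) and st (steal) figures in the Cpu(s) line
top -b -n 1 | head -n 5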

Here is a typical Flannel network configuration (net-conf.json) using VXLAN. If you see this in your Flannel ConfigMap (or in etcd under /coreos.com/network/config) and your CPU steal is high, you have found your bottleneck.

{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "vxlan"
  }
}

Pro Tip: If your nodes are on the same Layer 2 network (which implies a high-quality hosting provider that gives you real networking control), switch Flannel to host-gw mode. It simply adds static routes to the host's routing table. No encapsulation. Near-native speeds.
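
A minimal sketch of that switch, assuming you manage Flannel's network config directly (yours may live in a ConfigMap or in etcd rather than a local file):

# Same network as above, but plain routing instead of VXLAN encapsulation
cat <<'EOF' > net-conf.json
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "host-gw"
  }
}
EOF

# After Flannel restarts, each node gains a static route per peer node
# instead of a flannel.1 VXLAN device
ip route | grep 10.244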

Service Discovery: The Kube-Proxy Implementation

Pods are ephemeral. They die. They get rescheduled. Their IPs change. To solve this, we use Services. But a Service is just a VIP (Virtual IP). It doesn't exist as a physical interface. It is a lie implemented via iptables (or userspace, but if you are still using userspace proxy in late 2016, please upgrade).

When kube-proxy runs in iptables mode, it watches the API server and writes rules to redirect traffic destined for the Service IP to one of the backend Pod IPs.

Let’s look at what this actually does to your kernel. Run iptables-save on a busy node and you will see the horror:

-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-WNB7W6F5Z2K2QV4T
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-6J5K3O6Q5W6Z2K2Q

kube-proxy uses the statistic module to pick a backend at random. As your cluster grows to thousands of Services, these rulesets become massive, and because netfilter evaluates rules sequentially, lookup cost grows linearly with the number of Services: O(n). For a standard e-commerce stack running Magento or a custom Java app, that is fine. For high-frequency trading or real-time VoIP, it is a problem.
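
Two quick numbers tell you how deep in the woods a node is (run as root; what counts as "too high" depends on your traffic):

# How many kube-proxy rules is netfilter walking?
iptables-save | grep -c '^-A KUBE-'

# How close is the connection tracking table to its ceiling?
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max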

The Ingress Layer

Exposing services via NodePort or LoadBalancer is messy and expensive. The modern (well, since Kubernetes 1.2) way is Ingress, which lets you define Layer 7 HTTP routing rules: which hostnames and paths map to which backend Services.

Currently, the NGINX Ingress Controller is the battle-tested standard. You define an Ingress resource, and the controller reloads the NGINX configuration inside a pod.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: cool-app-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: api.cool-app.no
    http:
      paths:
      - path: /v1
        backend:
          serviceName: app-service
          servicePort: 80

When you apply this, the controller translates it into an nginx.conf server block. If your underlying disk I/O is slow, NGINX reloads can lag, causing dropped connections during deployments.
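
If you want to see what the controller actually rendered, pull the config straight out of the pod. The pod name below is a placeholder; use whatever kubectl get pods shows in your ingress namespace:

# Inspect the server block generated from the Ingress resource
kubectl exec nginx-ingress-controller-abc12 -- cat /etc/nginx/nginx.conf

# Follow the controller logs during a rollout and look for reload events
kubectl logs -f nginx-ingress-controller-abc12 | grep -i reload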

Infrastructure Matters: The CoolVDS Reality

You can tune your sysctl.conf, you can optimize your iptables, and you can swap CNI plugins. But if the virtual machine underneath is garbage, your Kubernetes cluster will be garbage.

Kubernetes is noisy. Etcd requires low-latency disk writes (fsync) to maintain cluster quorum. If your provider uses spinning rust (HDD) or shared SATA SSDs, your Etcd cluster will flap, and your API server will time out. This is not a software bug; it is a hardware limitation.
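
Before blaming the software, measure the disk itself. A rough fsync benchmark with fio against the etcd data directory (path and sizes are illustrative; the number that matters is the fsync/fdatasync latency in the output):

# Small writes with a data sync after every one -- the same pattern etcd's
# write-ahead log produces
fio --name=etcd-fsync-test --directory=/var/lib/etcd --size=64m \
    --bs=2300 --rw=write --ioengine=sync --fdatasync=1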

This is where CoolVDS fits into the architecture. We don't overprovision. We use KVM (Kernel-based Virtual Machine), which gives you strict resource isolation. Unlike container-based virtualization (OpenVZ/LXC), KVM allows you to load your own kernel modules, which advanced CNI plugins like Calico or Weave often require.

Furthermore, for Norwegian businesses, data residency is becoming a legal minefield. With the privacy landscape shifting (Datatilsynet is watching), hosting your Kubernetes cluster on US-controlled clouds adds compliance overhead. CoolVDS offers local Norwegian presence.

Optimizing for Low Latency

If you are deploying on CoolVDS today, here is the checklist for maximum network throughput:

  1. Use NVMe Storage: Essential for Etcd performance.
  2. Tune the Kernel: Increase your connection tracking limits. The default is often too low for the number of connections K8s proxies.
  3. Network Proximity: If your users are in Oslo, don't host in Frankfurt. The speed of light is a hard constraint (a quick check follows this list).
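
For the third point, a ten-second sanity check from one of your nodes (both addresses are placeholders; use a peer node and a host close to your users):

# RTT between worker nodes: should be well under a millisecond on the same
# rack or Layer 2 segment
ping -c 20 10.0.0.12

# RTT toward your users: Oslo to Frankfurt adds tens of milliseconds of
# round trip before your application does any work
ping -c 20 203.0.113.10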

Here is a snippet to stick in your node startup script (and mirror in /etc/sysctl.d/ so it survives reboots) to ensure your conntrack table doesn't overflow during a DDoS or high load:

sysctl -w net.netfilter.nf_conntrack_max=131072        # raise the ceiling on tracked connections
sysctl -w net.ipv4.tcp_tw_reuse=1                      # reuse TIME_WAIT sockets for new outbound connections
sysctl -w net.ipv4.ip_local_port_range="1024 65535"    # widen the ephemeral source port range

Conclusion

Kubernetes networking is not magic. It is a complex stack of routing tables, iptables rules, and encapsulation protocols. Understanding this stack distinguishes the professionals from the hobbyists. When you are ready to build a cluster that actually performs under load, you need infrastructure that respects the laws of physics and resource isolation.

Don't let I/O wait times kill your API response. Deploy your master and worker nodes on CoolVDS NVMe instances. We provide the raw compute; you build the future.