Taming the Hydra: A Deep Dive into Kubernetes Networking & CNI Optimization

Let’s be honest for a second. Kubernetes is brilliant for orchestration, but the networking model can feel like a black box designed to eat your sanity. I’ve seen seasoned sysadmins weep when a simple Service fails to resolve because of a misconfigured overlay network.

It’s December 2019. If you are still treating K8s networking like standard VM networking, you are doing it wrong. The abstraction layer is leaky. When you deploy a cluster, you aren't just managing containers; you are managing a software-defined network (SDN) that sits on top of physical infrastructure. If that foundation is shaky, your fancy microservices architecture will crumble.

In this post, we are cutting through the marketing noise. We’re going to look at the packet path, the choice between iptables and IPVS, and why your underlying hardware (yes, I’m talking about CoolVDS NVMe instances) defines your ceiling.

The CNI Jungle: Flannel vs. Calico

The Container Network Interface (CNI) is where the rubber meets the road. It determines how pods talk to each other. Too many teams default to Flannel because it’s "easy." Flannel typically uses VXLAN to encapsulate packets. It works, but encapsulation costs CPU on every cross-node packet and shrinks the effective MTU available to pods.
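
You can see the cost directly on a Flannel node. As a rough check (assuming the default VXLAN backend and a 1500-byte physical MTU; interface names vary), compare the physical NIC with the VXLAN device Flannel creates:

# Physical NIC vs. the VXLAN device Flannel creates
ip -o link show eth0 | grep -o 'mtu [0-9]*'        # typically mtu 1500
ip -o link show flannel.1 | grep -o 'mtu [0-9]*'   # typically mtu 1450

Those missing ~50 bytes per packet are the VXLAN header, and every cross-node packet also pays for the encapsulation work in CPU.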

For high-performance workloads, I almost exclusively recommend Calico. Why? Because Calico can run in pure Layer 3 mode using BGP (Border Gateway Protocol). No encapsulation. No overhead. Just routing.

Here is a snippet of a Calico configuration ensuring we are using the correct interface for BGP peering. This is critical if your node has multiple interfaces (common in complex setups):

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: calico-node
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: calico-node
  template:
    metadata:
      labels:
        k8s-app: calico-node
    spec:
      containers:
        - name: calico-node
          env:
            - name: IP_AUTODETECTION_METHOD
              value: "interface=eth0"
            - name: CALICO_IPV4POOL_IPIP
              value: "Always" # Change to 'Off' for BGP mode if infrastructure supports it

Pro Tip: If you are running on CoolVDS, you have full control over your VM's network stack. You can disable IPIP encapsulation in Calico to get raw network performance, provided your security groups allow BGP traffic. This reduces latency significantly compared to VXLAN.
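
What that looks like in practice depends on your Calico version, but as a rough sketch for Calico v3.x, you switch the IP pool itself to pure BGP by setting ipipMode to Never (the pool name and CIDR below are the common defaults; check yours with `calicoctl get ippool -o wide`):

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16     # illustrative; use your cluster's pod CIDR
  ipipMode: Never          # no encapsulation, routes distributed via BGP
  natOutgoing: true

Apply it with `calicoctl apply -f` and new workloads get plain routed paths with no tunnel in the way.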

The Kube-Proxy Bottleneck: iptables vs. IPVS

This is where most Service-level performance issues hide. By default, Kubernetes uses iptables to handle Service discovery and load balancing. In Linux, iptables is a sequential list of rules. Every packet walks the list. Lookups are O(n).

If you have 5,000 services, the kernel has to traverse a massive list of rules for every single packet. Latency spikes. CPU usage on the node creeps up. I recently debugged a cluster where the sheer number of iptables rules added 20ms of latency to internal calls.
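
You don't have to take my word for it; count the NAT rules kube-proxy has programmed on one of your nodes (the KUBE-* chain names are kube-proxy's own convention):

# Total NAT rules on the node
iptables-save -t nat | wc -l

# Rules belonging to kube-proxy's per-Service chains
iptables-save -t nat | grep -c 'KUBE-SVC'

On a cluster with thousands of Services, those numbers get ugly fast.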

The solution? IPVS (IP Virtual Server). It’s built on the Netfilter framework but uses hash tables. It is O(1). Whether you have 10 services or 10,000, the lookup time is constant.

To enable this in your kube-proxy config (assuming you are using `kubeadm`):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: "rr" # Round Robin
  strictARP: false
  syncPeriod: 30s
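
For reference, on a kubeadm cluster this configuration lives in the kube-proxy ConfigMap, so rolling the change out looks roughly like this (hold off until the module check below passes):

# Switch mode to "ipvs" in the embedded KubeProxyConfiguration
kubectl -n kube-system edit configmap kube-proxy

# Recreate the kube-proxy pods so they pick up the new mode
kubectl -n kube-system delete pod -l k8s-app=kube-proxy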

Before you switch, you must ensure the IPVS kernel modules are loaded on your worker nodes. Run this on your CoolVDS instance:

# Load necessary modules
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4   # merged into nf_conntrack on kernels 4.19+

# Verify
lsmod | grep -e ip_vs -e nf_conntrack_ipv4

If you see the modules, you are ready to scale without the iptables misery.
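
To double-check that kube-proxy actually came up in IPVS mode, inspect the virtual-server table with ipvsadm (install the ipvsadm package if it's missing) and ask kube-proxy itself on its metrics port, which defaults to 10249:

# List the IPVS virtual servers kube-proxy has programmed
ipvsadm -Ln

# kube-proxy reports its active mode here
curl -s http://localhost:10249/proxyMode

If the second command prints `ipvs`, the switch took.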

The Physical Layer: Why "Where" Matters

You can optimize software all day, but physics always wins. Kubernetes relies heavily on etcd for state. Etcd is extremely sensitive to disk write latency (fsync). If your disk is slow, the API server slows down, and the whole cluster becomes unstable.
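
If you want to quantify it, the fio fdatasync test the etcd maintainers recommend is a reasonable proxy (the directory and sizes below are illustrative; the rule of thumb is a 99th-percentile fdatasync latency under roughly 10ms):

# Write in small chunks, fsyncing after every write -- roughly what etcd's WAL does
mkdir -p /var/lib/etcd-disk-test
fio --name=etcd-wal-test --directory=/var/lib/etcd-disk-test \
    --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300

Look at the fsync/fdatasync latency percentiles in the output; if the 99th percentile is flirting with double-digit milliseconds, etcd will struggle.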

This is why we stress NVMe storage at CoolVDS. Rotating rust (HDDs) or even standard SATA SSDs can be a bottleneck for etcd in a busy cluster. We recently benchmarked a K8s control plane on our NVMe tiers versus a competitor's standard SSD VPS. The leader election timeouts disappeared on our infrastructure.

Latency & Geography

Furthermore, if your target market is Norway, hosting in Frankfurt or London adds unavoidable latency. Light in fibre buys you roughly 1 ms of round-trip time per 100 km, before routers and queuing add their share. By utilizing VPS Norway options, you cut the round-trip time (RTT) to the Norwegian Internet Exchange (NIX) in Oslo down to low single-digit milliseconds for local traffic. Low latency isn't just a luxury; for microservices ping-ponging requests, it's a requirement.

Securing the Traffic: Network Policies

By default, Kubernetes allows all traffic between all pods. In a multi-tenant environment, this is a security nightmare. The GDPR (Datatilsynet is watching!) requires privacy by design.

You must implement a "Default Deny" policy and then whitelist traffic. Here is a standard policy to drop all ingress traffic to a namespace unless specified:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress

Once applied, no ingress traffic reaches any pod in the production namespace. You then specifically allow the frontend to talk to the backend:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
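
A quick smoke test is to launch two throwaway pods, one carrying the frontend label and one without it (this assumes a ClusterIP Service named backend in front of the backend pods; adjust names to taste):

# Labelled as the frontend: should get a response from the backend Service
kubectl -n production run np-test-allowed --rm -it --restart=Never \
  --image=busybox --labels="app=frontend" -- wget -qO- -T 2 http://backend:8080

# No label: should time out, because only 'app: frontend' is whitelisted
kubectl -n production run np-test-denied --rm -it --restart=Never \
  --image=busybox -- wget -qO- -T 2 http://backend:8080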

Kernel Tuning for High Load

Default Linux kernel settings are tuned for general-purpose workloads, not high-throughput container routing. When you hit high concurrency, you will likely hit the nf_conntrack table limit. When this table fills up, the kernel starts dropping packets for new connections, and nothing shows up in your application logs. It is infuriating.
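
The symptom is easy to confirm once you know where to look: compare live conntrack usage against the ceiling, and grep the kernel log, because the kernel does record the drops, just not anywhere your application will:

# Current usage vs. the configured maximum
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max

# The telltale kernel message when the table overflows
dmesg | grep -i "nf_conntrack: table full"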

Tune your sysctl.conf on the host node immediately:

# Increase connection tracking max
net.netfilter.nf_conntrack_max = 131072

# Shorten the established-flow timeout so stale entries age out sooner
# (the kernel default is 432000s, i.e. five days)
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60

# Enable forwarding (Mandatory for K8s)
net.ipv4.ip_forward = 1

Apply with sysctl -p.

Conclusion

Kubernetes networking is a beast, but it is a tameable one. By moving from Flannel to Calico, upgrading to IPVS mode, and enforcing strict Network Policies, you build a cluster that is resilient and secure.

However, remember that software overlays cannot compensate for poor underlying infrastructure. High latency disks and congested networks will kill your Kubernetes dream faster than a bad config file. At CoolVDS, we provide the raw power—low latency, high I/O NVMe, and robust DDoS protection—so your cluster has the solid ground it needs to stand on.

Ready to stop debugging network flakes? Deploy your optimized K8s nodes on CoolVDS today and experience the difference true performance makes.