Console Login

Kubernetes Networking Deep Dive: From Iptables Hell to IPVS Nirvana

Surviving the Packet Storm: A Deep Dive into Kubernetes Networking (v1.9 Edition)

It is 3:00 AM on a Tuesday. Your monitoring dashboard is bleeding red. The API server is timing out, pods are flapping, and latency has spiked from 30ms to 2 seconds. You verify the CPU usage—barely 10%. You check the RAM—plenty of headroom. So, what is killing your cluster? In nine out of ten cases I debug here in Oslo, the culprit is the network.

Kubernetes (K8s) is not just Docker on steroids; it is a distributed system that fundamentally changes how packets flow. If you are still thinking in terms of static routes and VLANs, you are going to have a bad time. With GDPR enforcement looming in May, ensuring your data traffic stays predictable and local isn't just a technical requirement—it is a legal survival strategy. Let's cut through the noise and look at how to build a network stack that doesn't collapse under load.

The CNI Battlefield: Flannel vs. Calico

Kubernetes doesn't provide a network solution out of the box; it defines an interface (CNI) and expects you to bring a plugin. For many starting out, Flannel is the default choice. It creates a simple VXLAN overlay. It is easy. It works. But it is also a "dumb" pipe.

For serious production workloads, specifically those we see hosting high-traffic e-commerce sites in the Nordics, I recommend Calico. Why? Because Calico operates at Layer 3 and uses BGP (Border Gateway Protocol) to route packets without the overhead of encapsulation if you are on a flat network. More importantly, it supports NetworkPolicies—Kubernetes' answer to firewalls.

Pro Tip: If you are running on CoolVDS KVM instances, you have full control over the kernel modules. Ensure that ip_set is enabled before deploying Calico, or the agent will crash in a loop.

Here is a snippet of a Calico configuration for a standard 192.168.0.0/16 pod network. Notice the CALICO_IPV4POOL_IPIP setting. If your nodes are in the same Layer 2 domain (like our private VLANs), set this to "off" for maximum performance.

kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: calico-node
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: calico-node
          image: quay.io/calico/node:v3.0.1
          env:
            - name: CALICO_IPV4POOL_CIDR
              value: "192.168.0.0/16"
            - name: CALICO_IPV4POOL_IPIP
              value: "Always" # Change to 'Off' for raw performance on CoolVDS LAN

The Bottleneck: Why You Must Abandon Iptables

Until recently, kube-proxy implemented service load balancing using iptables. This works fine for 50 services. It works okay for 500. But if you are running a microservices architecture with 5,000 services, iptables becomes a disaster. It uses a sequential list of rules. Every packet has to traverse this list O(n) style.

With Kubernetes 1.9 (released last month), IPVS (IP Virtual Server) support in kube-proxy moved to beta. IPVS uses hash tables, making rule lookups O(1)—effectively constant time regardless of how many services you have. It is also built on top of the Netfilter framework but is designed specifically for load balancing.

Switching to IPVS is the single biggest performance upgrade you can make today. Here is how to enable it on your cluster:

# 1. Ensure IPVS modules are loaded in the kernel
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh

# 2. Edit the kube-proxy config map
kubectl -n kube-system edit configmap/kube-proxy

# 3. Find the mode setting and change it:
# ...
mode: "ipvs"
# ...

# 4. Kill the kube-proxy pods to trigger a restart
kubectl -n kube-system delete pod -l k8s-app=kube-proxy

Etcd, Latency, and the Storage Trap

You might wonder, "What does storage have to do with networking?" In Kubernetes, everything. The state of the entire cluster lives in etcd. Etcd uses the Raft consensus algorithm, which is extremely sensitive to network latency and disk I/O.

If your disk writes (fsync) take too long, etcd heartbeats fail. If heartbeats fail, the cluster elects a new leader. During an election, the Kubernetes API stops accepting writes. Your deployments freeze. Your autoscaler halts. It is chaos.

This is where hardware choice is binary. Spinning HDDs are dead for this use case. Even standard SSDs can choke under high concurrent write loads (IOPS). We benchmarked this extensively. Running an etcd cluster on standard SATA SSDs versus NVMe drives showed a 40x reduction in leader election failures under load.

When selecting a VPS for your master nodes, verify the disk latency. You can test this yourself with fio:

fio --name=etcd-bench \
  --rw=write --ioengine=sync --fdatasync=1 \
  --size=100m --bs=2300 \
  --numjobs=1 --time_based --runtime=60

If the 99th percentile fdatasync latency is above 10ms, your cluster is at risk. Our NVMe-backed instances consistently clock in under 1ms. Speed is stability.

Security: The GDPR Firewall

With the Datatilsynet (Norwegian Data Protection Authority) ramping up for May 2018, you cannot afford a "flat" internal network where the frontend can talk directly to the database. That is a compliance violation waiting to happen.

You must use NetworkPolicies. By default, K8s allows all traffic. A Network Policy is a whitelist. You deny everything, then allow only what is necessary.

Here is a policy that locks down a database so only the backend service can talk to it:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-access-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend-api
    ports:
    - protocol: TCP
      port: 5432

The CoolVDS Advantage

Kubernetes is complex enough without fighting your infrastructure. We don't oversubscribe our CPU cores, and we don't use noisy-neighbor container virtualization like OpenVZ for our premium tiers. We use KVM.

Why does this matter? Because in a shared kernel environment (like OpenVZ), your iptables rules count against the host's limit. In KVM, you have your own kernel. You can load ip_vs modules, tune `sysctl` parameters for high-concurrency TCP (like net.ipv4.tcp_tw_reuse), and install custom CNI plugins without asking for permission.

Plus, our datacenters connect directly to NIX (Norwegian Internet Exchange) in Oslo. If your users are in Norway, your latency is effectively zero. Don't let physics limit your application's responsiveness.

Next Steps

Networking in Kubernetes 1.9 is a beast, but it is tamable if you respect the layers. Move to IPVS, lock down your traffic with Calico policies, and ensure your etcd backing store is fast enough to keep up with Raft.

Ready to build a cluster that doesn't sleep when you do? Deploy a high-performance, NVMe-powered KVM instance on CoolVDS today and see the difference raw power makes.