Kubernetes Networking: A Deep Dive into CNI, Overlays, and Why Your Packets Drop
It was 3:00 AM on a Tuesday when the alerts started firing. Our Grafana dashboards looked like a Christmas tree, but not the festive kind. Latency on the payment gateway microservice had spiked from 25ms to 400ms, and 502 Bad Gateway errors were flooding the Nginx ingress. The application hadn't changed. The load hadn't increased. The culprit? A subtle MTU mismatch in the overlay network that only manifested when the payload size hit a specific threshold.
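If you ever suspect the same class of problem, the fastest sanity check is to compare interface MTUs and probe the path with the Don't Fragment bit set. A rough sketch (interface names and the target IP are placeholders for your own environment):

```bash
# Compare the physical NIC MTU with the overlay interface MTU
ip link show eth0 | grep mtu
ip link show flannel.1 | grep mtu     # or vxlan.calico / cilium_vxlan, depending on your CNI

# Probe with the Don't Fragment bit set. 1472 bytes of ICMP payload fills a 1500-byte MTU;
# anything above the effective path MTU fails loudly instead of degrading silently.
ping -M do -s 1472 10.244.1.15
ping -M do -s 1422 10.244.1.15        # leave ~50 bytes of headroom for the VXLAN header
```

If the larger ping fails while the smaller one succeeds, you have found your threshold.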
Networking in Kubernetes is often treated as magic. You apply a YAML file, pods get IP addresses, and services talk to each other. But when that abstraction leaks (and it always does), you are left staring at tcpdump output, wondering where your packets went to die.
If you are deploying production clusters in 2022, understanding the flow of a packet from the public internet, through the node interface, into the CNI overlay, and finally to the container is not optional. It is survival.
The CNI Jungle: Flannel vs. Calico vs. Cilium
The Container Network Interface (CNI) is the glue. It ensures every pod gets a unique IP. But not all glue is created equal.
For years, Flannel was the default choice. It's simple. It creates a VXLAN overlay. It works. But in high-throughput environments, like the video streaming backends we often host for clients in Oslo, encapsulation overhead kills performance. CPU usage spikes just to wrap and unwrap packets.
In 2022, my default recommendation for serious production workloads is Calico (for BGP capabilities) or Cilium (if you need eBPF observability). Here is why: BGP allows you to advertise pod IPs directly to your underlying network router, eliminating the need for encapsulation entirely.
Pro Tip: If you are running on CoolVDS, you have full control over the node's network stack. Unlike restrictive public clouds where BGP peering is a premium add-on or outright impossible, our KVM architecture lets you run BGP daemons (like BIRD) right on the host, so pod traffic is routed natively with no encapsulation tax on latency or CPU.
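Once a peering session is up, Calico advertises pod CIDRs straight to your router. A minimal sketch of the `BGPPeer` resource, assuming a top-of-rack router at 10.0.0.1 in AS 64512 (both are placeholders for your own environment):

```yaml
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack1-tor            # hypothetical name for your upstream router
spec:
  peerIP: 10.0.0.1           # placeholder: your router's address
  asNumber: 64512            # placeholder: your router's AS number
```

Apply it with `calicoctl apply -f` and check the session state with `calicoctl node status`.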
Configuring Calico for Performance
Don't just apply the default manifest. If you are running on a local network within a single datacenter (like our Oslo zone), disable IPIP encapsulation for better throughput.
```yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: Never       # Disable encapsulation if L2 connectivity exists
  natOutgoing: true
```
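Apply the pool and confirm that encapsulation really is off (the file name is just an example):

```bash
calicoctl apply -f ippool.yaml
calicoctl get ippool default-ipv4-ippool -o yaml | grep -E 'ipipMode|vxlanMode'
```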
The "It's Always DNS" Section
CoreDNS is the beating heart of service discovery. If CoreDNS chokes, your cluster looks like it's down, even if every application is healthy. We recently audited a client's setup where they were hitting 5-second timeouts on external API calls. The issue was the default `ndots:5` setting.
By default, the kubelet injects `ndots:5` and a search path of `<namespace>.svc.cluster.local`, `svc.cluster.local`, and `cluster.local` into every pod's resolv.conf. Any name with fewer than five dots, which covers most external hostnames, is first appended to each search domain in turn, so the resolver burns through several doomed internal lookups before it ever queries the real external A record. Both musl-based Alpine images and standard glibc resolvers are affected.
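You can see the defaults from inside any affected pod (the pod name below is a placeholder; the commented lines show typical output for the `default` namespace):

```bash
kubectl exec -it payment-gateway-7d4b9 -- cat /etc/resolv.conf
# search default.svc.cluster.local svc.cluster.local cluster.local
# nameserver 10.96.0.10
# options ndots:5
```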
Fix this in your deployment.yaml:
```yaml
spec:
  template:
    spec:
      dnsConfig:
        options:
          - name: ndots
            value: "2"
```
This simple change reduced their external lookup latency by 80%. Fast infrastructure helps, and CoreDNS on NVMe-backed CoolVDS nodes is snappy, but its cache lives in memory; no amount of hardware fixes a resolver configuration that multiplies every query.
Ingress Controllers and The "Schrems II" Reality
In Norway, data sovereignty is a massive headache right now. The Schrems II ruling effectively killed the Privacy Shield. If you are terminating TLS on a US-owned load balancer, are you compliant? It is a grey area most legal teams want to avoid.
This is why many Norwegian DevOps teams are moving back to self-hosted Ingress Controllers on European infrastructure. Using the Nginx Ingress Controller gives you granular control over headers and WAF rules without sending SSL keys across the Atlantic.
Here is a hardened configuration snippet for handling high-concurrency traffic, raising the per-worker connection ceiling and tuning keepalive reuse so connections are not dropped during bursts:
```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
data:
  worker-processes: "auto"
  max-worker-connections: "65536"
  keep-alive: "60"
  upstream-keepalive-connections: "100"
  compute-full-forwarded-for: "true"
  use-forwarded-headers: "true"
```
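The controller watches this ConfigMap and reloads Nginx when it changes, assuming it was started pointing at this ConfigMap name via its `--configmap` flag. A quick way to confirm the values landed (the controller pod name is a placeholder):

```bash
kubectl apply -f nginx-configuration.yaml
kubectl -n ingress-nginx exec ingress-nginx-controller-abc123 -- \
  grep -E 'worker_connections|keepalive' /etc/nginx/nginx.conf
```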
Debugging the Data Plane
When packets drop, you need to go lower than `kubectl` and look at the node's conntrack table. A common issue on standard VPS providers is hitting the `nf_conntrack` limit: K8s Services (ClusterIP) rely heavily on iptables or IPVS NAT, and every flow through them adds an entry to the connection tracking table.
Check your current usage:
```bash
# Check current usage
cat /proc/sys/net/netfilter/nf_conntrack_count

# Check the max limit
cat /proc/sys/net/netfilter/nf_conntrack_max
```
If you are seeing "table full, dropping packet" in `dmesg`, you need to bump these numbers. On CoolVDS, our default kernel tuning is optimized for virtualization workloads, but for heavy K8s clusters, we recommend setting this explicitly in your `/etc/sysctl.conf`:
```
net.netfilter.nf_conntrack_max = 524288
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
```
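Reload the settings and keep an eye on utilisation; as a rule of thumb, staying well below the ceiling leaves headroom for traffic bursts:

```bash
# Apply everything in /etc/sysctl.conf and confirm the new ceiling
sysctl -p
sysctl net.netfilter.nf_conntrack_max

# Watch live usage against the limit
watch -n 5 'cat /proc/sys/net/netfilter/nf_conntrack_count'
```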
Why Underlying Infrastructure Matters
You can tune sysctl until you are blue in the face, but you cannot software-optimize a noisy neighbor. In a shared hosting environment, if another tenant saturates the physical NIC, your UDP packets (essential for VXLAN) get dropped. Kubernetes interprets this as a node failure.
This is the "CoolVDS" factor. We don't oversubscribe network bandwidth. When you spin up a node in our Oslo datacenter, you get dedicated throughput. This stability is critical for etcd, which needs consistently low fsync latency on its write-ahead log. If your disk I/O wavers because the hypervisor is busy, heartbeats miss their deadlines, leader elections churn, and the control plane becomes unstable.
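You can measure this before trusting a node with etcd. The fio run below mimics etcd's WAL pattern of small writes followed by fdatasync, in the spirit of the benchmark the etcd maintainers reference (the target directory is a placeholder; point it at the disk etcd will actually use):

```bash
mkdir -p /var/lib/etcd-bench
fio --name=etcd-wal-test \
    --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd-bench --size=22m --bs=2300

# Check the fsync/fdatasync latency percentiles in the output;
# the 99th percentile should stay in the single-digit millisecond range.
```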
Latency to NIX (Norwegian Internet Exchange)
For local traffic, geography is physics. Hosting in Frankfurt when your users are in Bergen adds 20-30ms of round-trip time. Hosting in Oslo keeps it under 5ms. We peer directly at NIX, ensuring that your VPS Norway traffic stays within the country, satisfying both the speed freaks and the Datatilsynet auditors.
Summary: The Checklist
- CNI: Use Calico or Cilium. Avoid overlays if possible.
- DNS: Tune `ndots` to reduce query amplification.
- Ingress: Host it yourself on local, EEA-based nodes so TLS termination stays on infrastructure you control for GDPR purposes.
- Hardware: Ensure your etcd is backed by NVMe storage to prevent leader election failures.
Kubernetes is complex enough without fighting the infrastructure beneath it. You need a foundation that gets out of your way.
Ready to stabilize your cluster? Stop fighting noisy neighbors and deploy your worker nodes on CoolVDS. Low latency, NVMe storage, and DDoS protection included. Deploy a high-performance instance in Oslo now.