Container Security: Hardening Docker & Kubernetes for the Paranoiac
Let’s get one thing straight immediately: Containers are not Virtual Machines. I see this misconception destroying production environments repeatedly. When you run a process in a Docker container, you are—by default—sharing the kernel with the host. If a bad actor breaks out of that namespace, they own the node. In the wake of Log4Shell late last year, if you are still running containers as root because "it's just easier for the devs," you are professionally negligent. We need to talk about defense-in-depth, specifically for those of us operating under the strict eyes of Datatilsynet and GDPR requirements here in Norway.
1. The Supply Chain: Trust Nothing
The security of your infrastructure is defined before a single packet hits the network. It starts in the build pipeline. I recently audited a client setup in Oslo whose CI/CD pipeline was pulling node:latest. This is insane. You have no idea what is in that image tomorrow. Pin your versions by SHA256 digest, not tags. Tags are mutable; hashes are forever.
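To make that concrete, here is a minimal sketch of digest pinning in a pod spec. The digest below is a placeholder; resolve the real one from your registry. The same image@sha256:... syntax works in a Dockerfile FROM line.
apiVersion: v1
kind: Pod
metadata:
  name: pinned-example
spec:
  containers:
  - name: app
    # Placeholder digest: the tag is for humans, the digest is the contract.
    image: node:16-bullseye-slim@sha256:0000000000000000000000000000000000000000000000000000000000000000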
Furthermore, stop shipping build tools to production. A compiler has no business existing inside your runtime container. We use multi-stage builds to strip the fat. If gcc exists in your final image, an attacker can compile their exploit on your server. Don't give them the tools to rob you.
Implementation: Multi-stage Build & Distroless
Here is how we restructure a Dockerfile to reduce surface area using Google's distroless images, which lack even a shell. That does not prevent RCE (Remote Code Execution) by itself, but an attacker who gets execution finds no /bin/sh to spawn and no package manager to pull in tooling, which makes post-exploitation dramatically harder.
# Stage 1: Build
# (In production, pin this base image by digest too, per the advice above.)
FROM golang:1.18-alpine AS builder
WORKDIR /app
# Copy the module manifests first so the dependency layer caches independently of source changes.
COPY go.mod go.sum ./
RUN go mod download
COPY *.go ./
# CGO_ENABLED=0 yields a static binary, which is what lets us use the 'static' distroless base.
RUN CGO_ENABLED=0 GOOS=linux go build -o /main
# Stage 2: Runtime
# Distroless images contain only the application and its runtime dependencies.
# No shell, no package manager, no noise.
FROM gcr.io/distroless/static-debian11
COPY --from=builder /main /
USER nonroot:nonroot
ENTRYPOINT ["/main"]
Using gcr.io/distroless/static-debian11 drops the image size drastically and removes the OS attack surface. Notice the USER nonroot:nonroot directive? That is mandatory. Never let a process run as UID 0.
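If you would rather have the cluster reject root pods outright than trust code review to catch them, Kubernetes 1.24's Pod Security Admission can enforce this per namespace. A minimal sketch, with an illustrative namespace name:
apiVersion: v1
kind: Namespace
metadata:
  name: prod-workloads   # illustrative name
  labels:
    # The 'restricted' profile rejects pods that run as root or allow privilege escalation.
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.24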
2. Runtime Defense: Capabilities & Read-Only Filesystems
By default, Docker grants a container a broad set of Linux capabilities. Most web apps do not need NET_ADMIN or SYS_CHROOT. If your application is just serving HTTP requests, strip it naked: least privilege applies to kernel capabilities too. We drop all capabilities and add back only what is strictly necessary.
Another layer of armor is the read-only root filesystem. If an attacker manages to exploit an application vulnerability, their first move is often to download a payload or modify a config file. If the filesystem is read-only, that script fails. It's a simple flag that kills entire classes of attacks.
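Outside Kubernetes, plain Docker gives you the same posture with a couple of flags (--cap-drop ALL --read-only). Here is a minimal compose sketch, with an illustrative service, mirroring the Kubernetes manifest further down:
# docker-compose.yml (illustrative service)
services:
  web:
    image: nginx:1.21.6-alpine
    read_only: true          # root filesystem becomes immutable
    cap_drop:
      - ALL                  # start from zero privileges...
    cap_add:
      - NET_BIND_SERVICE     # ...and add back only what binding port 80 requires
    tmpfs:
      - /var/cache/nginx     # nginx still needs scratch space; give it tmpfs, not the disk
      - /var/run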
Pro Tip: When running on CoolVDS NVMe storage, utilize the high I/O throughput to run continuous security scanning sidecars without tanking your application performance. Cheap VPS providers will throttle you the moment your security agent starts hashing files. High IOPS are a security feature.
Here is a Kubernetes securityContext configuration that enforces these restrictions. With Kubernetes 1.24 just dropping (it removes the dockershim, so I hope you migrated your container runtime), using the standard security context API is more critical than ever.
apiVersion: v1
kind: Pod
metadata:
  name: secure-nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.21.6-alpine
    securityContext:
      runAsNonRoot: true
      runAsUser: 101
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE
    volumeMounts:
    - mountPath: /var/cache/nginx
      name: cache-volume
    - mountPath: /var/run
      name: run-volume
  volumes:
  - name: cache-volume
    emptyDir: {}
  - name: run-volume
    emptyDir: {}
Notice we had to mount writable volumes for /var/cache/nginx and /var/run. This is the trade-off: you have to know exactly where your application writes. It requires discipline, but the stability payoff is immense.
3. The Infrastructure Layer: KVM Isolation
This is where the "Cloud Native" dogma often clashes with reality. Running containers on bare metal or on shared container platforms means trusting your neighbors' workloads. In a multi-tenant environment, soft isolation (cgroups and namespaces) has a long history of escape bugs, from Dirty COW to the recent cgroup v1 release_agent escape (CVE-2022-0492).
For high-compliance sectors in Norway (finance, health, anything touching GDPR data), we do not trust software isolation alone. We want hardware virtualization boundaries. This is why CoolVDS uses KVM (Kernel-based Virtual Machine) for every instance. Even if an attacker escapes the container and compromises the guest kernel, they hit the hypervisor wall: they are trapped inside your VPS, and the rest of the infrastructure never sees them.
Monitoring for Anomalies with Falco
Static analysis is not enough; you need runtime threat detection. Falco is the de facto standard here: it taps the stream of Linux syscalls at the kernel level (via a kernel module or eBPF probe). If a shell spawns in a container that should not have one, Falco screams.
Here is a basic Falco rule to detect a shell spawning in a container:
- rule: Terminal shell in container
  desc: A shell was used as the entrypoint/exec point into a container with an attached terminal
  condition: >
    spawned_process and container
    and shell_procs and proc.tty != 0
    and container_entrypoint
  output: >
    A shell was spawned in a container with an attached terminal (user=%user.name %container.info
    shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline terminal=%proc.tty container_id=%container.id image=%container.image.repository)
  priority: NOTICE
  tags: [container, shell, mitre_execution]
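Getting rules like this into a cluster usually means mounting them over /etc/falco/falco_rules.local.yaml, which Falco merges on top of its bundled ruleset. A sketch, assuming Falco runs as a DaemonSet in a falco namespace; the rule itself is an illustrative one that flags package managers running in production containers:
apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-local-rules    # illustrative name; mount as /etc/falco/falco_rules.local.yaml
  namespace: falco
data:
  falco_rules.local.yaml: |
    # A package manager running in production means someone is installing tools post-deploy.
    - rule: Package manager launched in container
      desc: Package management binary executed inside a running container
      condition: spawned_process and container and proc.name in (apk, apt, apt-get, yum, dnf)
      output: Package manager in container (cmd=%proc.cmdline image=%container.image.repository)
      priority: WARNING
      tags: [container, mitre_persistence]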
4. Network Policies: The Firewall Inside
By default, in a Kubernetes cluster, every pod can talk to every other pod. The database can talk to the frontend; the cache can talk to the logger. This is a flat network, and it is a playground for lateral movement. If an attacker breaches your frontend, they have a direct line to your backend database port.
We must implement NetworkPolicies. This is essentially an internal firewall for K8s. Deny all traffic by default, then whitelist only specific paths.
Example: Deny All Ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
Once applied, silence falls. Nothing moves. You then selectively open ports. It forces you to document and understand your traffic flow.
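For example, a policy that permits only the frontend to reach PostgreSQL on its service port could look like this (the labels and port are illustrative):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-db
spec:
  podSelector:
    matchLabels:
      app: postgres          # the pods being protected
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend      # the only client allowed in
    ports:
    - protocol: TCP
      port: 5432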
Comparison: Container Isolation vs. VM Isolation
| Feature | Container (Docker/LXC) | CoolVDS KVM Instance |
|---|---|---|
| Kernel | Shared with Host | Dedicated Kernel |
| Boot Time | Milliseconds | Seconds (Fast on NVMe) |
| Isolation Level | Process (Namespaces) | Hardware (Hypervisor) |
| Security Risk | Moderate (Kernel exploits) | Low (Hypervisor barrier) |
Conclusion: Paranoia is a Virtue
Security is not a product you buy; it is a process you suffer through. In 2022, with supply chain attacks rising and geopolitical tensions in Europe impacting cyber threat levels, relying on default configurations is reckless. Whether you are hosting a simple API or a complex microservices mesh, the layers matter.
Your infrastructure must be built on bedrock. Running hardened containers on weak, oversold hosting is like putting a bank vault door on a tent. You need the raw I/O performance to handle logging and scanning, and the strict isolation of KVM to sleep at night.
Don't wait for a breach to audit your stack. Spin up a hardened environment today. Deploy a CoolVDS KVM instance in Oslo and lock it down before the bots find you.