Container Security in a Post-Schrems II World: Stop Running as Root

Let's be honest. I audited a deployment for a fintech client in Oslo last week, and their production Dockerfiles looked like they were copy-pasted from a 2015 StackOverflow thread. We are talking USER root, huge attack surfaces, and secrets baked into environment variables. In the current climate, especially after the CJEU invalidated the EU-US Privacy Shield in July (the Schrems II ruling), this isn't just bad practice; it is a legal liability.

Containers are not magic security boxes. They are just Linux processes with fancy namespaces and cgroups. If an attacker compromises a container running as root, they are uncomfortably close to owning the host. And if that host is on a shared kernel in a massive public cloud, the blast radius is terrifying.

I have spent the last month migrating workloads from US hyperscalers to local infrastructure to keep the lawyers happy. Here is the technical breakdown of how to harden your container stack without destroying your developer velocity.

1. The Base Image: Less is More

The official Node.js base image weighs in at well over 600MB. That is 600MB of packages, libraries, and tools that can carry vulnerabilities. Stop using full OS images like ubuntu:latest or debian:buster unless you absolutely have to. In 2020, we have better options.
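
See the difference for yourself by pulling the stock image next to its Alpine variant (node:14 is just an example; the same pattern holds for most language runtimes):

docker pull node:14
docker pull node:14-alpine
docker images node   # compare the SIZE column; the Alpine variant is a fraction of the size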

My go-to is Alpine Linux or Google's Distroless images. Alpine is roughly 5MB. If curl and bash aren't installed, an attacker can't easily use them to pull down a reverse shell script.

Pro Tip: If you are using Alpine, be aware of the musl libc vs glibc differences. It can break some Python wheels or C++ binaries. Test thoroughly before promoting to production.
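
One hedged way to catch a glibc dependency before it bites you: run ldd from inside the Alpine-based image against the binary you intend to ship (the image name and path here are placeholders for your own):

docker run --rm my-app:alpine-test ldd /usr/local/bin/my-app
# Missing glibc libraries tend to show up as "Error loading shared library ..." lines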

The Multi-Stage Build Pattern

If you take nothing else from this post, use multi-stage builds. You build your artifact in a heavy image with compilers, then copy only the binary to a tiny runtime image. Here is how we structure Go microservices at CoolVDS:

# Stage 1: The Builder
FROM golang:1.15-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Build static binary, strip debugging symbols for size
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o main .

# Stage 2: The Runner
FROM scratch
COPY --from=builder /app/main /main
# Copy SSL certs because scratch doesn't have them
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
# Run as a non-privileged user; scratch has no /etc/passwd, so the UID must be numeric (10001)
USER 10001
ENTRYPOINT ["/main"]

This results in an image that is essentially just your binary plus a CA bundle. No shell. No package manager. Good luck exploiting that.
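
To verify (the tag below is just an example): build it, check the size, and confirm there is no shell to drop into:

docker build -t my-app:v1.0 .
docker images my-app:v1.0 --format "{{.Size}}"
docker run --rm --entrypoint /bin/sh my-app:v1.0   # fails: there is no /bin/sh in the image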

2. Runtime Privileges: Drop 'Em Like Hot Potatoes

By default, Docker grants containers a generous set of Linux capabilities. Do your web workers need NET_RAW to craft raw packets? Do they need to chown arbitrary files or mknod devices? No.

You should drop all capabilities and only add back what is strictly necessary. In raw Docker, it looks like this:

docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE --user 1001 my-app:latest
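
If you are curious what a container actually receives, the effective capability mask in /proc is a quick, if cryptic, check; with --cap-drop=ALL the second command should print all zeros:

docker run --rm alpine grep CapEff /proc/self/status
docker run --rm --cap-drop=ALL alpine grep CapEff /proc/self/status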

If you are orchestrating with Kubernetes (and let's assume you are running at least v1.18), this belongs in your securityContext. I see so many manifests lacking this section.

Hardened Pod Security Context

Here is a snippet from a standard deployment manifest I use for high-compliance workloads. It forces the container to run with a read-only root filesystem, which prevents attackers from writing malicious scripts or binaries to disk.

apiVersion: v1
kind: Pod
metadata:
  name: secured-api
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: api-container
    image: my-registry/api:v2.4
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL
        add:
          - NET_BIND_SERVICE
    volumeMounts:
    - name: tmp-volume
      mountPath: /tmp
  volumes:
  - name: tmp-volume
    emptyDir: {}

Notice the /tmp volume mount. Since the root FS is read-only, applications that need to write temporary files (like image processing libraries) will crash unless you explicitly mount a writable volume there.
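
To sanity-check the manifest above (assuming you saved it as secured-api.yaml and the image actually ships a shell and coreutils, which many minimal images deliberately do not):

kubectl apply -f secured-api.yaml
kubectl exec secured-api -- touch /evil.sh      # fails: Read-only file system
kubectl exec secured-api -- touch /tmp/scratch  # works: /tmp is the writable emptyDir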

3. The Infrastructure Layer: KVM vs Shared Kernel

This is where the hardware reality hits. Containers share the host kernel. If a vulnerability is discovered in the Linux kernel, container isolation fails. (The recent BleedingTooth Bluetooth bugs are more of an endpoint concern, but think back to Dirty COW a few years ago.)

For critical data, relying solely on namespaces is risky. This is why at CoolVDS, we use KVM (Kernel-based Virtual Machine). When you spin up a VPS with us, you aren't just getting a container in a massive shared pool; you are getting a dedicated kernel.

This adds a massive layer of defense. Even if an attacker breaks out of your container, they are trapped inside your KVM instance. They cannot access the neighboring infrastructure. For high-security environments, this hardware virtualization boundary is non-negotiable.
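
A quick way to confirm you are actually behind a hardware virtualization boundary (this assumes a systemd-based distro with util-linux inside the guest):

systemd-detect-virt            # expect: kvm
lscpu | grep -i hypervisor     # expect: Hypervisor vendor: KVM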

4. Data Sovereignty and Network Security

With the Schrems II ruling, the location of your data is now a security feature. If your hosting provider pipes your backup snapshots to an S3 bucket in Virginia, you are transferring the personal data of EU data subjects to the US without a valid transfer mechanism, which is precisely the kind of arrangement Schrems II calls into question.

You need to ensure your traffic stays local. We see excellent latency (sub-10ms) from Oslo to most of Northern Europe, but beyond performance, keeping traffic within the EEA is now a compliance requirement.

Configure your firewalls to deny by default. I use iptables or nftables directly on the host for granular control. Here is a basic rule set that blocks everything except SSH and web traffic, creating a minimal attack surface. Apply it as a single script (or from an out-of-band console), not line by line over SSH, or the DROP policy will cut you off before the allow rules land. And if Docker itself is publishing ports on this host, remember that it manages its own chains (notably DOCKER-USER), so flushing everything will fight with it:

# Flush existing rules
iptables -F

# Set default policies to DROP
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow established connections
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow SSH (Limit this to your VPN IP in production!)
iptables -A INPUT -p tcp --dport 22 -j ACCEPT

# Allow HTTP/HTTPS
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
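
One operational note: these rules live in memory only. On Debian or Ubuntu hosts (adjust for your distro), persist them so a reboot does not silently reopen everything:

apt-get install -y iptables-persistent
iptables-save > /etc/iptables/rules.v4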

5. Supply Chain Security

It is 2020. You cannot trust public registries blindly. We are seeing "typosquatting" attacks on Docker Hub where malicious images are named similarly to popular official ones (e.g., ngnix instead of nginx).
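
One mitigation that costs nothing is Docker Content Trust, which makes the client refuse tags that are not signed by the publisher. Official images like nginx are signed; typosquatted clones generally are not:

export DOCKER_CONTENT_TRUST=1
docker pull nginx:1.19     # only succeeds if the tag carries a valid signature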

Use a scanner like Trivy. It is faster than Clair and easier to integrate into CI/CD pipelines. Before you push to your CoolVDS registry, scan the image.

trivy image --exit-code 1 --severity HIGH,CRITICAL my-app:v1.0

The --exit-code 1 flag makes Trivy return a non-zero status when it finds matching vulnerabilities (by default it exits 0 regardless). If the scan fails, break the build. Do not deploy. It is that simple.
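
Here is a minimal sketch of that gate as a shell step in CI, assuming Trivy is installed on the runner and registry.example.com stands in for your own registry:

#!/bin/sh
set -e
# A non-zero exit from Trivy aborts the script before anything is pushed
trivy image --exit-code 1 --severity HIGH,CRITICAL my-app:v1.0
docker tag my-app:v1.0 registry.example.com/my-app:v1.0
docker push registry.example.com/my-app:v1.0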

Conclusion

Security is a trade-off between convenience and paranoia. Running as root is convenient. Using latest tags is convenient. But when you are dealing with PII under Norwegian jurisdiction, convenience gets you fined.

By shifting to minimal base images, dropping capabilities, and running on isolated KVM infrastructure like CoolVDS, you cut off the vast majority of the attack vectors that matter today. Do not wait for a breach to audit your manifests.

Is your current staging environment compliant? Spin up a secured, NVMe-backed KVM instance on CoolVDS today and test your hardened containers in a proper isolated environment.