The Container "Isolation" Lie
Let's rip the band-aid off: Containers are not real virtualization. They are processes masquerading as isolated units, sharing the same kernel as the host. If you are deploying Docker containers in production with default settings in 2019, you aren't just opening a door for attackers; you are holding it open and serving them coffee.
I recently audited a setup for a client in Oslo, a fintech startup moving from bare metal to Kubernetes. They were proud of their CI/CD velocity. Then I looked at their pod definitions. Every single pod was running as root. They had mounted the host's Docker socket, /var/run/docker.sock, into their pods to "make building easier." I demonstrated how a simple shell injection in their Node.js app could allow me to wipe their entire cluster, including the persistent volumes storing customer data. The silence in the room was deafening.
Speed means nothing if your infrastructure is compromised. In Norway, where the Datatilsynet (Data Protection Authority) does not mess around with GDPR breaches, security is not optional. It is survival.
1. The Root Problem (Literally)
By default, a process inside a Docker container runs as PID 1 with UID 0 (root). If an attacker breaks out of the application, they are root inside the container. If they then exploit a kernel vulnerability (like Dirty COW, CVE-2016-5195, from a couple of years ago), they are root on the host node.
You must enforce non-root execution at the image build level. Do not rely on runtime flags alone.
Correct Dockerfile Pattern
    FROM alpine:3.9

    # Create an unprivileged system group and user
    RUN addgroup -S appgroup && adduser -S appuser -G appgroup

    # Install dependencies (still running as root at this point)
    RUN apk add --no-cache python3

    # Run all later instructions, and the container itself, as appuser
    USER appuser
    WORKDIR /home/appuser
    COPY . .

    CMD ["python3", "app.py"]

This is basic, yet 80% of the images I see on Docker Hub ignore it. When you run this, even if the application is compromised, the attacker finds themselves trapped as appuser with limited permissions.
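As a second line of defense, the application can refuse to start when it finds itself running as root. The following is a minimal sketch, not part of any library: `ensure_not_root` is a hypothetical helper name, and the `euid` parameter exists only so the check can be exercised without actually being root.

```python
import os


def ensure_not_root(euid=None):
    """Raise RuntimeError if the effective UID is 0 (root).

    `euid` is a hypothetical, injectable parameter for testing; when it is
    omitted the real effective UID of the current process is used.
    """
    if euid is None:
        euid = os.geteuid()
    if euid == 0:
        raise RuntimeError("refusing to run as root; set USER in the Dockerfile")
    return euid
```

Calling `ensure_not_root()` as the first statement in `app.py` makes a misconfigured deployment fail fast instead of silently running with full privileges.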
2. Immutable Infrastructure & Read-Only Filesystems
Containers should be ephemeral. If you are patching a running container, you are doing it wrong. Rebuild the image. To enforce this, mount the root filesystem as read-only. This stops attackers from modifying binaries or dropping malicious scripts anywhere outside the few writable mounts you explicitly allow.
Here is how you enforce that via `docker run`:
    docker run --read-only \
      --tmpfs /run \
      --tmpfs /tmp \
      -v /my/data:/data:rw \
      my-secure-image

We use `tmpfs` for temporary directories because the application might crash if it can't write to `/tmp`, but the rest of the filesystem remains frozen.
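To confirm from inside a container that the flag actually took effect, one option is to inspect the mount options of `/`. This is a rough sketch that assumes a Linux container exposing `/proc/mounts`; both function names are hypothetical.

```python
def root_is_read_only(mounts_text):
    """Return True if the mount entry for "/" carries the `ro` option.

    `mounts_text` is the content of /proc/mounts: one mount per line with
    whitespace-separated fields (device, mount point, fs type, options, ...).
    """
    read_only = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != "/":
            continue
        # If "/" appears more than once, the last entry is the one on top.
        read_only = "ro" in fields[3].split(",")
    return read_only


def read_proc_mounts(path="/proc/mounts"):
    """Read the mount table of the current process (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

A health check could call `root_is_read_only(read_proc_mounts())` at startup and log a warning when it returns `False`.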
3. Kubernetes PodSecurityPolicies (PSP)
If you are orchestrating with Kubernetes (and by now, in mid-2019, most serious shops are moving to v1.14+), you cannot trust developers to write secure YAML manifests. You need to enforce security at the cluster level.
PodSecurityPolicies are currently the gold standard for admission control. They prevent pods from starting if they violate your security profile.
Pro Tip: Do not just apply a restrictive PSP blindly. You will break your CNI plugins (like Calico or Flannel) and system controllers. Create a specific PSP for business logic workloads.
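Before switching on a restrictive policy, it helps to know which existing workloads would be rejected. The sketch below is only an approximation of what a restrictive policy refuses, not a reimplementation of the admission controller; `find_policy_violations` is a hypothetical name, and the input is assumed to be a pod manifest already parsed into a dict.

```python
def find_policy_violations(pod):
    """Return human-readable findings for a pod manifest given as a dict.

    The checks cover host namespaces, hostPath volumes, privileged
    containers, explicit privilege escalation and explicit UID 0. The real
    admission controller applies more rules and may also mutate the pod.
    """
    findings = []
    spec = pod.get("spec", {})

    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if spec.get(flag) is True:
            findings.append(f"pod sets {flag}: true")

    for volume in spec.get("volumes", []):
        if "hostPath" in volume:
            path = (volume["hostPath"] or {}).get("path", "?")
            findings.append(f"volume {volume.get('name', '?')} mounts hostPath {path}")

    pod_ctx = spec.get("securityContext") or {}
    if pod_ctx.get("runAsUser") == 0:
        findings.append("pod securityContext sets runAsUser: 0")

    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for container in containers:
        name = container.get("name", "?")
        ctx = container.get("securityContext") or {}
        if ctx.get("privileged") is True:
            findings.append(f"container {name} is privileged")
        if ctx.get("allowPrivilegeEscalation") is True:
            findings.append(f"container {name} allows privilege escalation")
        if ctx.get("runAsUser") == 0:
            findings.append(f"container {name} sets runAsUser: 0")
    return findings
```

One way to use it is to parse the output of `kubectl get pods --all-namespaces -o json` with `json.loads` and run the function over each entry in `items`. Pods in system namespaces will show up in the results, which is exactly why they need their own policy.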
Here is a strict PSP that denies privilege escalation:
    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: restricted-psp
    spec:
      privileged: false
      # Prevent processes from gaining more privileges than their parent (e.g. via setuid binaries)
      allowPrivilegeEscalation: false
      requiredDropCapabilities:
        - ALL
      volumes:
        - 'configMap'
        - 'emptyDir'
        - 'persistentVolumeClaim'
        - 'secret'
      hostNetwork: false
      hostIPC: false
      hostPID: false
      runAsUser:
        rule: 'MustRunAsNonRoot'
      seLinux:
        rule: 'RunAsAny'
      supplementalGroups:
        rule: 'MustRunAs'
        ranges:
          - min: 1
            max: 65535
      fsGroup:
        rule: 'MustRunAs'
        ranges:
          - min: 1
            max: 65535

4. Limiting Kernel Capabilities
Linux capabilities break down the "all-or-nothing" power of root into smaller privileges. A web server does not need `NET_ADMIN` (network configuration) or `SYS_MODULE` (loading kernel modules). Drop everything and add back only what is strictly necessary.
In your docker-compose service definition (the `docker run` equivalents are `--cap-drop` and `--cap-add`):

    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE

This ensures that even if an escalation occurs, the attacker cannot reconfigure the network or load a kernel-module rootkit.
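To verify which capabilities a container really ended up with, the bitmask the kernel reports in `/proc/<pid>/status` can be decoded. This is a small sketch with hypothetical function names; the bit positions follow the order in the kernel's `linux/capability.h`, and capabilities added in newer kernels than this list covers would simply be ignored.

```python
# Capability names indexed by bit position, as defined in linux/capability.h.
CAPABILITY_NAMES = [
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID",
    "KILL", "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE",
    "NET_BIND_SERVICE", "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK",
    "IPC_OWNER", "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE",
    "SYS_PACCT", "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE",
    "SYS_TIME", "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE",
    "AUDIT_CONTROL", "SETFCAP", "MAC_OVERRIDE", "MAC_ADMIN", "SYSLOG",
    "WAKE_ALARM", "BLOCK_SUSPEND", "AUDIT_READ",
]


def decode_capabilities(mask_hex):
    """Turn a hex capability mask into a list of capability names."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(CAPABILITY_NAMES) if mask & (1 << bit)]


def capabilities_from_status(status_text, field="CapEff"):
    """Decode one capability line (CapEff, CapBnd, ...) from a status file."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return decode_capabilities(line.split()[1])
    return []
```

Feeding it the contents of `/proc/self/status` from inside the container shows what is left after the drop. For a process running as a non-root user the effective set is usually empty, so the bounding set (`field="CapBnd"`) is the more informative line to check.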
5. The Infrastructure Layer: Why CoolVDS Matters
Software limits like cgroups and namespaces are robust, but bugs happen. Hardware isolation is the final backstop. This is where your choice of hosting becomes a security architecture decision.
Many budget providers use container-based virtualization (like OpenVZ/LXC) for their VPS offerings. This is dangerous for high-security workloads because you are sharing the kernel with the provider's other customers. A kernel panic triggered from a neighbor's container takes your instance down too. A kernel exploit in your neighbor's instance could theoretically expose your memory.
| Feature | Container VPS (OpenVZ) | CoolVDS (KVM) |
|---|---|---|
| Kernel Isolation | Shared | Dedicated |
| Memory Privacy | Software Restricted | Hardware Virtualized |
| Custom Kernel | No | Yes (Install SELinux/Grsecurity) |
| Disk I/O | Often Shared/Noisy | Dedicated NVMe |
At CoolVDS, we exclusively use KVM (Kernel-based Virtual Machine) virtualization. Each VPS Norway instance runs its own independent kernel. Even if your container runtime is compromised, the attacker is trapped inside a VM sandbox, not on the bare metal host. Combined with our local NVMe storage, you get the I/O throughput needed for database-heavy microservices without the "noisy neighbor" security risks.
Network & Latency Considerations
Security is also about availability (the 'A' in the CIA triad). DDoS attacks are rampant in Europe right now. Running your cluster on a provider with weak upstream connectivity is a risk. We peer directly at NIX (Norwegian Internet Exchange), so local traffic stays local: low latency for your Oslo users, and data residency requirements become easier to meet.
6. Continuous Scanning
Finally, static analysis is mandatory. You cannot deploy black boxes. Use tools like Clair or Anchore Engine to scan your images for CVEs before they hit the registry. Integrate this into your Jenkins or GitLab CI pipelines.
    # Example Anchore CLI check
    # Submit the image for analysis
    anchore-cli image add myapp:latest
    # Block until the analysis has finished
    anchore-cli image wait myapp:latest
    # List all known vulnerabilities (OS and non-OS packages)
    anchore-cli image vulns myapp:latest all

If the scan returns High severity vulnerabilities (like the recent runc vulnerability CVE-2019-5736), the build must fail. No exceptions.
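Listing vulnerabilities does not fail a pipeline on its own; something has to turn the report into an exit code. The sketch below shows one way to do that, under a loudly stated assumption: the scanner can emit a JSON report containing a `vulnerabilities` list whose entries have `vuln`, `severity` and `package` keys. Check that shape against your scanner's real output. The function names are hypothetical.

```python
import json

# Severities that should stop a build; adjust to your own policy.
BLOCKING_SEVERITIES = {"High", "Critical"}


def blocking_vulnerabilities(report, blocking=BLOCKING_SEVERITIES):
    """Return the report entries whose severity should fail the build.

    ASSUMPTION: `report` is a dict with a "vulnerabilities" list whose items
    carry "vuln", "severity" and "package" keys.
    """
    return [
        item for item in report.get("vulnerabilities", [])
        if item.get("severity") in blocking
    ]


def gate(report_json):
    """Return a process exit code: 0 if the image may ship, 1 otherwise."""
    findings = blocking_vulnerabilities(json.loads(report_json))
    for item in findings:
        print(f"{item.get('severity')}: {item.get('vuln')} in {item.get('package')}")
    return 1 if findings else 0
```

In a Jenkins or GitLab CI job, a small wrapper script could read the report from standard input and pass the result of `gate(...)` to `sys.exit`, so that any blocking finding stops the pipeline.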
Summary
Container security in 2019 requires a defense-in-depth strategy:
- Build Secure: Non-root users, minimal base images (Alpine).
- Run Secure: Read-only filesystems, dropped capabilities, PSPs.
- Host Secure: Strong isolation via KVM on CoolVDS.
Don't let a misconfigured YAML file be the reason you have to explain a data breach to the Datatilsynet. Secure your infrastructure from the bottom up.
Need a hardened environment for your Kubernetes cluster? Deploy a KVM-based, NVMe-powered instance on CoolVDS today and sleep better tonight.