Container Security in 2019: Surviving the Breakout
If the recent runc vulnerability (CVE-2019-5736) taught us anything this February, it is that "isolation" in containers is often just a polite fiction. We tell ourselves that a Docker container is a lightweight VM, but it isn't. It's a process lying to itself about who its neighbors are. As a System Administrator who has spent the last decade fighting fires in data centers from Bergen to Oslo, I can tell you: defaults will get you hacked.
Most dev teams I meet in Oslo are rushing to Kubernetes or Docker Swarm without understanding the underlying kernel risks. They deploy a `node:latest` image running as root, expose port 8080, and call it a day. In a post-GDPR world, specifically under the watchful eye of Datatilsynet (The Norwegian Data Protection Authority), that negligence is a liability. Here is how we lock down container infrastructure effectively, using tools and practices available right now.
1. The Root Problem (Literally)
The biggest lie in containerization is that `root` inside the container is safe. Docker does not enable user namespace remapping by default, so the root user inside a container (UID 0) is the very same UID 0 the host kernel sees. If an attacker breaks out, say via the recent runc flaw, they own your entire server. They own the hardware. They own the data.
The first rule of fight club: Never run as root.
In your `Dockerfile`, you must explicitly switch users. Don't rely on the base image to do it for you. Create a specific user for your application.
```dockerfile
FROM alpine:3.9

# Create an unprivileged group and user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

# Install dependencies
RUN apk add --no-cache nginx

# Copy in the entrypoint, then switch to the non-root user
COPY entrypoint.sh /entrypoint.sh

USER appuser
ENTRYPOINT ["/entrypoint.sh"]
```
When you enforce this, you limit the blast radius. Even if code execution occurs, the attacker finds themselves trapped with limited permissions, unable to mount drives or modify iptables.
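Trust, but verify. A quick way to confirm the policy took effect is to ask the container who it is; a minimal sketch, assuming the image above was built and tagged as `my-secure-app` (a hypothetical name):

```shell
# Should print the unprivileged UID, not 0
docker run --rm my-secure-app id -u

# Belt and braces: force a non-root UID at run time too,
# so even images that forget USER cannot start as root
docker run --rm --user 1000:1000 alpine:3.9 id -u
```

The `--user` flag overrides whatever the Dockerfile declares, which makes it a useful safety net in orchestration templates where you don't control every upstream image.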
2. Immutable Infrastructure: Read-Only Filesystems
Containers should be ephemeral. If you are patching a running container, you are doing it wrong. Kill it and deploy a new image. To enforce this, I run containers with a read-only root filesystem whenever possible. This prevents an attacker from writing malicious executables or scripts to the disk.
This breaks many applications that expect to write logs or temporary files to local disk. The solution is to mount `tmpfs` for those specific paths.
```shell
docker run --read-only \
  --tmpfs /run \
  --tmpfs /tmp \
  -v /var/log/app:/var/log/app:rw \
  my-secure-app:1.2
```
This configuration forces your application to be stateless. It effectively neutralizes a whole class of malware droppers that try to `wget` a payload into `/bin` or `/usr`.
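If you deploy with Docker Compose rather than raw `docker run`, the same hardening translates directly. A sketch, assuming a service named `app` and Compose file format 3.x:

```yaml
version: '3.7'
services:
  app:
    image: my-secure-app:1.2
    read_only: true        # root filesystem mounted read-only
    tmpfs:
      - /run
      - /tmp
    volumes:
      - /var/log/app:/var/log/app:rw
```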
3. Kernel Capabilities: Drop 'Em All
The Linux kernel breaks down root privileges into distinct units called "capabilities." Does your Node.js API need to modify network interfaces (`NET_ADMIN`)? Does it need to audit system logs (`AUDIT_WRITE`)? Absolutely not.
Docker grants a broad set of capabilities by default. A hardened posture starts by dropping all capabilities and adding back only what is strictly necessary.
```shell
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE ...
```
This is aggressive. A more pragmatic approach for most web servers in 2019 is simply preventing privilege escalation. We use the `no-new-privileges` security option. This kernel feature prevents a process from gaining more privileges than its parent, even if `setuid` binaries are executed.
Pro Tip: `--security-opt=no-new-privileges:true` is the single highest-value flag you can add to your deployment scripts today. It carries almost no performance penalty but significantly increases the difficulty of a breakout.
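Putting the pieces from sections 1 through 3 together, a hardened invocation might look like the sketch below. The image name and port are placeholders; note that listening on an unprivileged port (8443 here) and mapping it at the Docker level sidesteps the need for `NET_BIND_SERVICE` entirely:

```shell
# Read-only filesystem, no capabilities, no privilege escalation
docker run \
  --read-only --tmpfs /tmp \
  --cap-drop=ALL \
  --security-opt=no-new-privileges:true \
  -p 443:8443 \
  my-secure-app:1.2
```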
4. The Host Matters: Why We Use KVM
You can harden your Docker config all day, but if your neighbor on the physical host compromises the shared kernel, it is game over. This is the danger of "Container-as-a-Service" or budget VPS providers using OpenVZ or LXC.
At CoolVDS, we don't play dice with kernel sharing. We use KVM (Kernel-based Virtual Machine) hardware virtualization. Each VPS has its own isolated kernel. If another customer on the same physical rack causes a kernel panic or gets hit by a dirty exploit, your instance remains unaffected.
When you are running a database like MySQL 5.7 or MariaDB 10.3, you also need guaranteed I/O. Shared containers often suffer from "noisy neighbor" syndrome, where disk throughput fluctuates wildly. We utilize pure NVMe storage arrays to ensure that when your container asks for IOPS, it gets them immediately. Low latency isn't a luxury; it's a requirement for decent SEO and user experience.
Comparison: Isolation Levels
| Technology | Kernel Isolation | Security Risk | Performance Consistency |
|---|---|---|---|
| Shared Hosting / OpenVZ | Shared | High | Unpredictable |
| Standard Container Cloud | Shared (usually) | Medium | Variable |
| CoolVDS (KVM) | Dedicated | Low | High (NVMe) |
5. The Supply Chain: Trust No One
In 2018, we saw malicious code injected into popular repositories. If you pull `FROM node`, you are trusting an upstream maintainer blindly.
Always pin your images to a specific hash (digest), not a tag. Tags like `latest` or even `12.04` are mutable. They can be overwritten. A SHA256 digest is immutable.
```dockerfile
# Don't do this
FROM ubuntu:18.04

# Do this
FROM ubuntu@sha256:6d0e0c26489e33f5a6500236f..
```
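To find the digest for an image you already trust, pull the tag once and ask Docker for it:

```shell
# The pull output prints a "Digest:" line; docker inspect
# exposes the same value under RepoDigests
docker pull ubuntu:18.04
docker inspect --format='{{index .RepoDigests 0}}' ubuntu:18.04
```

Pin that value in your Dockerfile and the build will fail loudly if the upstream content ever changes out from under you, which is exactly the behavior you want.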
Furthermore, use a scanner like Clair or integrate basic checks into your CI pipeline. We are seeing more clients in Norway requiring this for compliance. If you are handling personal data of Norwegian citizens, demonstrating that you scan for CVEs in your artifacts is a strong defense during a Datatilsynet audit.
6. Network Segmentation
Don't let your frontend talk to your analytics worker if it doesn't need to. In a standard Docker bridge network, inter-container communication is often wide open.
If you are using Docker Compose, define custom networks:
```yaml
version: '3.7'

services:
  web:
    image: my-web
    networks:
      - frontend

  db:
    image: postgres:11
    networks:
      - backend

networks:
  frontend:
  backend:
```
By keeping the database strictly on the backend network, you reduce the attack surface. If the web container is breached, the attacker still has to figure out how to pivot to a network they can't natively route to.
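One refinement worth knowing about, assuming Compose file format 3.x: marking the backend network as `internal` removes its route to the outside world entirely, which stops a compromised database container from phoning home:

```yaml
networks:
  frontend:
  backend:
    internal: true   # containers on this network can reach each
                     # other, but have no external connectivity
```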
Conclusion
Security is not a product; it is a process of layer-by-layer hardening. We start with the container configuration, move to the runtime, and ultimately rely on the isolation of the infrastructure itself.
Don't build your house on sand. Deploy your hardened containers on infrastructure that respects isolation and data sovereignty. With CoolVDS, you get the raw power of NVMe and the strict isolation of KVM, hosted right here in the region. Low latency to Oslo, high barriers to attackers.
Ready to lock down your stack? Spin up a secure KVM instance on CoolVDS today and stop worrying about noisy neighbors.