Container Security in 2014: Why "Root" Inside Docker Is Scarier Than You Think
It's been two months since Docker hit version 1.0, and my Twitter feed is exploding. Everyone from startups in Oslo to enterprise shops in Trondheim is trying to "containerize all the things." I get it. The ability to ship an entire environment in a tarball is seductive. But as someone who has spent the last decade waking up at 3 AM to fix broken servers, I look at the current state of container security and I don't see a revolution. I see a minefield.
Let's cut through the hype. If you are running Docker containers on bare metal or, god forbid, inside an OpenVZ slice, you are playing Russian roulette with your kernel. The mantra of "it works on my laptop" doesn't account for the hostile environment of the public internet. Today, we are going to look at why standard container isolation is currently insufficient for multi-tenant environments and how you can actually secure this beast without retreating back to 2005.
The "Shared Kernel" Lie
The fundamental misunderstanding most developers have right now is assuming a container is a Virtual Machine (VM). It is not. In a VM (like the KVM instances we provision at CoolVDS), the hypervisor provides hardware emulation. The Guest OS has its own kernel. If the Guest kernel panics, the host doesn't care.
In Docker (and LXC), you are sharing the host kernel. You are relying on Linux namespaces and cgroups to lie to the process and tell it that it's alone. But the kernel knows the truth. If a containerized process manages to crash the kernel, the whole physical server goes down. Worse, Docker does not yet map container users to unprivileged host users, so root inside the container is UID 0 on the host: a malicious user who breaks out of the namespaces is root on your machine.
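You can see the shared kernel for yourself by comparing the kernel release and the namespace identifiers on the host with those inside a container. The following is a minimal sketch, not a Docker feature: the helper name list_namespaces is my own, and it assumes a Linux /proc that exposes /proc/<pid>/ns.

```shell
#!/bin/sh
# Hypothetical helper: print the namespace identifiers of a process.
# Assumes Linux with /proc/<pid>/ns available; defaults to the current process.
list_namespaces() {
    pid="${1:-self}"
    for ns in /proc/"$pid"/ns/*; do
        # Each entry is a symlink whose target looks like "pid:[4026531836]"
        printf '%s -> %s\n' "${ns##*/}" "$(readlink "$ns" 2>/dev/null)"
    done
}

# The kernel release is the same on the host and in every container on it.
uname -r
list_namespaces
```

Run it once on the host and once inside a container. The namespace identifiers should differ, while uname -r reports the same kernel in both places.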
Pro Tip: Never, ever run a container with the --privileged flag unless you absolutely know what you are doing. It grants the container every capability and access to all host devices, which strips away most of the isolation described here. It's not a "fix" for permission errors; it's a surrender flag.
Hardening the Runtime: Beyond Defaults
Out of the box, Docker 1.1 is decent, but not bulletproof. Since Docker switched from LXC to libcontainer earlier this year, we have more control, but we also have new attack surfaces. Here is how I lock down containers before they touch a production network.
1. Drop Capabilities
The root user inside a container does not need the same power as the root user on the host. The Linux kernel breaks down root privileges into "capabilities." Most web apps don't need to change system time or load kernel modules. Drop them.
Here is how you start a container with a reduced capability set (the --cap-drop and --cap-add flags need Docker 1.2 or later):
docker run -d --cap-drop=ALL --cap-add=NET_BIND_SERVICE --cap-add=SETUID --cap-add=SETGID nginx
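This command drops everything and adds back only what Nginx needs to bind to port 80 and switch its worker processes to an unprivileged user. Depending on the image, Nginx may also need CHOWN for its cache directories, so add capabilities back one at a time until it starts cleanly. If an attacker manages to inject code into your Nginx process, they will find themselves in a straitjacket, unable to reconfigure the network stack or mount drives.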
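To confirm that the drop actually took effect, you can read the CapEff bitmask from /proc/<pid>/status and test individual bits. The sketch below uses two hypothetical helpers of my own, has_cap and cap_eff. The bit numbers are the ones defined in linux/capability.h, for example SETGID=6, SETUID=7, NET_BIND_SERVICE=10, and SYS_ADMIN=21.

```shell
#!/bin/sh
# Hypothetical helper: test one capability bit in a hex mask such as the
# CapEff value from /proc/<pid>/status.
has_cap() {
    mask="$1"   # hex string without the 0x prefix
    bit="$2"    # capability number from linux/capability.h
    [ $(( (0x$mask >> bit) & 1 )) -eq 1 ]
}

# Print the effective capability mask of a process (defaults to the caller).
cap_eff() {
    awk '/^CapEff:/ { print $2 }' "/proc/${1:-self}/status"
}

# Example: a mask holding only SETGID (6), SETUID (7) and NET_BIND_SERVICE (10)
mask=00000000000004c0
has_cap "$mask" 10 && echo "NET_BIND_SERVICE: present"
has_cap "$mask" 21 || echo "SYS_ADMIN: dropped"
```

On the host, pass the container's PID to cap_eff and feed the result to has_cap for each capability you expect to be gone.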
2. Leveraging AppArmor (Ubuntu 14.04 LTS)
Since most of us are deploying on Ubuntu 14.04 LTS (Trusty Tahr), we should use AppArmor. Docker ships with a default profile, but it is often too permissive. You can verify if the profile is loaded on your host:
$ sudo apparmor_status | grep docker
docker-default
If you have high-security requirements (perhaps you are handling personal data subject to Norway's Personopplysningsloven), you need to write a custom profile. This ensures that even if the container is compromised, the process cannot write to specific parts of the filesystem.
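A loaded profile only helps if the container's processes are actually running under it. One way to check, sketched below, is to read the AppArmor label the kernel exposes in /proc/<pid>/attr/current. The helper names is_confined and apparmor_label are hypothetical, and the sketch assumes an AppArmor host such as Ubuntu 14.04; on an SELinux host the same file holds a different kind of label.

```shell
#!/bin/sh
# Hypothetical helper: decide whether an AppArmor label means "confined".
# The label is the text found in /proc/<pid>/attr/current,
# e.g. "docker-default (enforce)" or "unconfined".
is_confined() {
    case "$1" in
        ''|unconfined) return 1 ;;
        *'(enforce)')  return 0 ;;
        *)             return 1 ;;   # complain mode only logs, it does not block
    esac
}

# Read the label of a process (defaults to the caller); prints nothing
# if the kernel does not expose an AppArmor label.
apparmor_label() {
    cat "/proc/${1:-self}/attr/current" 2>/dev/null
}

label=$(apparmor_label)
if is_confined "$label"; then
    echo "confined by: $label"
else
    echo "not enforced: ${label:-no AppArmor label}"
fi
```

Run from the host with the container's PID as the argument to apparmor_label, this tells you which profile, if any, is enforcing on that process.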
The "NSEnter" Debate
Debugging running containers is a pain right now. Since we don't have a native "exec" command yet (hopefully coming in 1.3?), everyone is installing SSH inside containers. Stop doing this. Running sshd inside a container adds overhead and another attack vector to patch.
Use Jérôme Petazzoni's nsenter. It allows you to enter the namespace of a running container directly from the host.
# First, install nsenter into /usr/local/bin on the host
docker run --rm -v /usr/local/bin:/target jpetazzo/nsenter
# Then, get the PID of the container's init process
PID=$(docker inspect --format '{{.State.Pid}}' <container_id>)
# Enter the container's namespaces (requires root on the host)
sudo nsenter --target "$PID" --mount --uts --ipc --net --pid
This method leaves no permanent door open. You get in, fix the config, and get out.
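If you do this often, it is worth wrapping the steps in a function that refuses to call nsenter when the container is not running, since docker inspect reports a PID of 0 for a stopped container. This is a rough sketch: valid_pid and enter_container are hypothetical names of my own, and it assumes sudo and nsenter are available on the host.

```shell
#!/bin/sh
# Hypothetical helper: accept only a positive integer PID.
valid_pid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric
    esac
    [ "$1" -gt 0 ]                 # 0 means the container is not running
}

# Hypothetical wrapper: look up the container's PID, validate it, then enter.
enter_container() {
    pid=$(docker inspect --format '{{.State.Pid}}' "$1") || return 1
    if ! valid_pid "$pid"; then
        echo "container $1 is not running" >&2
        return 1
    fi
    sudo nsenter --target "$pid" --mount --uts --ipc --net --pid
}
```

Call it as enter_container followed by a container name or ID.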
The Architecture: Docker inside KVM
This brings us to the architecture I'm currently deploying for a large e-commerce client in Oslo. They wanted the speed of Docker deployment but required strict isolation between their development and production environments.
We did not put their containers on bare metal. We provisioned CoolVDS KVM instances, backed by high-performance SSDs, and installed Docker inside those VMs.
| Feature | Docker on Bare Metal | Docker on KVM (CoolVDS) |
|---|---|---|
| Isolation | Process Level (Shared Kernel) | Hardware Level (Dedicated Kernel) |
| Risk | Host Compromise | VM Compromise only |
| Performance | Native | Near-Native (VirtIO drivers) |
| Compliance | Hard to audit | Clear boundary for Datatilsynet |
By nesting Docker inside a KVM VPS, we get the best of both worlds. The developers get their rapid deployment workflow. The operations team gets a hard security boundary. If a container breaks out, it is trapped inside the VM, not roaming free on the physical node accessing other tenants' data.
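If you want a deploy script to notice when it has landed on bare metal, one hedged approach on x86 Linux is to look for the hypervisor CPU flag that guests see in /proc/cpuinfo. The function name running_in_vm is hypothetical, and the check is a heuristic rather than proof of isolation.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the CPU flags advertise a hypervisor.
# Assumes x86 Linux, where guests see a "hypervisor" entry in /proc/cpuinfo.
running_in_vm() {
    cpuinfo="${1:-/proc/cpuinfo}"
    grep -Eq '^flags.*[[:space:]]hypervisor([[:space:]]|$)' "$cpuinfo" 2>/dev/null
}

if running_in_vm; then
    echo "hypervisor detected: containers sit behind a VM boundary"
else
    echo "no hypervisor flag: this looks like bare metal" >&2
fi
```

The optional argument lets you point the check at a saved copy of cpuinfo from another machine.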
Data Persistence and I/O Performance
Another issue we are seeing in 2014 is the I/O penalty of Docker's copy-on-write storage drivers (AUFS, DeviceMapper). Writing to the container's writable layer is slow. For databases like MySQL or MongoDB, you must use Data Volumes to bypass the storage driver.
docker run -d -v /var/lib/mysql:/var/lib/mysql -p 3306:3306 mysql
However, even with volumes, the underlying disk speed matters. Standard SATA drives struggle when you have 50 containers fighting for IOPS. This is where hardware choice becomes critical. In our benchmarks, CoolVDS instances running on pure SSD arrays showed a 400% improvement in MySQL transaction times compared to standard HDD VPS hosting. If you are building for the future, spinning rust isn't going to cut it.
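Before trusting anyone's numbers, including ours, measure your own setup. The sketch below is a crude sequential write test, not a database benchmark: write_test is a hypothetical helper, and it assumes GNU or BusyBox dd (for bs=1M and conv=fsync). Zeros can flatter storage that compresses data, so treat the result as a rough comparison only.

```shell
#!/bin/sh
# Hypothetical helper: write N MiB of zeros into a directory, flush to disk,
# and print dd's summary line. Assumes GNU or BusyBox dd.
write_test() {
    dir="$1"
    mb="${2:-64}"
    file="$dir/ddtest.$$"
    dd if=/dev/zero of="$file" bs=1M count="$mb" conv=fsync 2>&1 | tail -n 1
    rm -f "$file"
}

# Example, run inside a container: compare the writable layer with a volume.
# write_test /tmp
# write_test /var/lib/mysql
```

Running it against a path in the container's writable layer and then against a mounted volume gives a first impression of what the storage driver costs you on that host.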
Conclusion
We are in the "Wild West" era of containerization. The tools are evolving weekly, and best practices are being written in real-time. But security principles don't change. Principle of Least Privilege. Defense in Depth. Isolation.
Don't let the shiny new toy distract you from the fact that you are responsible for your users' data. If you are experimenting with Docker, do it inside a secure perimeter. Deploy a KVM instance on CoolVDS, lock down your capabilities, and keep your kernel to yourself.
Ready to build a secure Docker fleet? Spin up a CoolVDS SSD instance in Oslo today and get root access in under 60 seconds.