Container Security in 2014: Why Shared Kernels Keep Me Awake at Night
It’s 2014. Everyone is talking about Docker. Since dotCloud open-sourced it last year, I’ve seen more developers trying to push “works on my machine” into production than I care to count. It is lightweight, it is fast, and the start-up time is practically zero. But let’s be real for a second: security on Linux containers (LXC) is terrifying if you don't know what you are doing.
I recently audited a setup for a client in Oslo—a media firm trying to scale their transcoding pipeline. They were running raw LXC containers on a bare-metal server, giving root access to contractors. I nearly pulled the plug on the server right there. Why? Because in the current state of containerization, root inside the container is effectively root on the host. If a process breaks out, they own your hardware, your network, and your data.
If you are deploying containers in Norway, where the Datatilsynet (Data Protection Authority) watches data leaks like a hawk under the Personal Data Act, you cannot afford a kernel panic or a privilege escalation exploit. Here is how we lock things down.
1. The "Root is Root" Problem
The biggest misconception I hear is that a container is a VM. It is not. It is a fancy way of using namespaces and cgroups to lie to a process about what it can see. But you are sharing the kernel. If you are running a standard Docker 0.8 install or vanilla LXC, UID 0 inside the container maps directly to UID 0 on the host kernel.
Until user namespaces mature (they are in the upstream kernel but barely usable in most distros like Ubuntu 12.04 LTS), you have to drop capabilities. Do not give a container power it doesn't need.
If you are using raw LXC, check your config. If you are experimenting with Docker, stop running with the --privileged flag unless you want a security breach.
# The Wrong Way (Don't do this)
docker run -i -t --privileged ubuntu /bin/bash
# The Better Way: Drop capabilities explicitly
docker run -i -t --cap-drop=ALL --cap-add=NET_BIND_SERVICE --cap-add=SETUID ubuntu /bin/bash
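To confirm that the locked-down container really lost those privileges, a quick sanity check that works on a stock image is to read the capability bitmask straight from /proc (no extra tooling assumed):
# Print the capability masks as seen by the container's own process
docker run -i -t --cap-drop=ALL --cap-add=NET_BIND_SERVICE --cap-add=SETUID ubuntu grep Cap /proc/self/status
# Decode the CapEff hex value into names on the host with: capsh --decode=<value>  (capsh ships in libcap2-bin)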
2. Cgroups are your Safety Net
Noisy neighbors are the bane of shared hosting. This is why I generally despise cheap OpenVZ VPS providers. They oversell resources, and when one user compiles a kernel, everyone else suffers. If you are managing your own containers, you must enforce Control Groups (cgroups).
Without cgroups, a single container can consume all available RAM, triggering the host's OOM (Out of Memory) killer. In a worst-case scenario, the host kills the SSH daemon or the database instead of the rogue container.
Here is how we limit a container to 512MB of RAM in an LXC configuration file. This ensures that even if the application leaks memory, it only crashes itself, not the host.
# /var/lib/lxc/my-container/config
# Network configuration
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxcbr0
# Resource Limits (Crucial)
lxc.cgroup.memory.limit_in_bytes = 512M
lxc.cgroup.memory.memsw.limit_in_bytes = 1G
lxc.cgroup.cpu.shares = 512
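Once the container is up, you can read or adjust these limits live with lxc-cgroup; a quick sketch, assuming the LXC 1.x toolchain:
# Read the active memory limit of the running container
lxc-cgroup -n my-container memory.limit_in_bytes
# Tighten it on the fly, no restart needed
lxc-cgroup -n my-container memory.limit_in_bytes 256M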
Pro Tip: Never rely on default swap settings. On a high-I/O database server, swapping kills performance faster than a DDoS attack. Set swappiness to 0 or 10 on the host nodes.
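A minimal sketch of that host-side tuning (10 is just my usual starting point, tune it for your workload):
# Tell the host kernel to avoid swapping unless it really has to
sysctl -w vm.swappiness=10
# Persist the setting across reboots
echo "vm.swappiness = 10" >> /etc/sysctl.conf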
3. Network Isolation with iptables
By default, containers often sit on a bridge (like docker0 or lxcbr0) and can talk to each other. If one container gets compromised via a SQL injection, the attacker can port scan every other container on that internal subnet.
You need to use iptables to segregate traffic. We script this using Chef at CoolVDS, but here is the logic if you are doing it manually. You want to drop forwarding between containers unless explicitly allowed.
# Flush existing forward rules, then default-deny forwarding
iptables -F FORWARD
iptables -P FORWARD DROP
# Allow container traffic out to the world (masquerading itself is handled in the nat table)
iptables -A FORWARD -i docker0 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o docker0 -m state --state RELATED,ESTABLISHED -j ACCEPT
# BLOCK inter-container communication
iptables -A FORWARD -i docker0 -o docker0 -j DROP
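Two practical notes. Bridged traffic only traverses the FORWARD chain when bridge-netfilter is enabled (net.bridge.bridge-nf-call-iptables = 1, which the Docker daemon normally sets), and these rules vanish on reboot, so verify and persist them:
# Confirm bridged frames are actually handed to iptables
sysctl net.bridge.bridge-nf-call-iptables
# Inspect the chain with packet counters to see the rules matching
iptables -L FORWARD -v -n
# Persist across reboots (path assumes the Debian/Ubuntu iptables-persistent package)
iptables-save > /etc/iptables/rules.v4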
The Architecture Decision: KVM vs. Containers
This brings us to the hard truth. Containers are excellent for application packaging and deployment velocity. They are terrible for multi-tenant security isolation in 2014.
If you are building a hosting environment or handling sensitive Norwegian consumer data, you cannot rely on software-level namespaces alone. You need hardware virtualization.
This is why at CoolVDS, we do not sell container-based VPS (like OpenVZ). We sell KVM (Kernel-based Virtual Machine) instances. With KVM, your OS has its own kernel. If you want to run Docker, you run it inside your KVM instance.
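For a feel of that workflow, here is a rough sketch on an Ubuntu 14.04 guest (my assumption; the trusty package installs the client binary as docker.io):
# Inside the KVM guest, not on the host
apt-get update && apt-get install -y docker.io
# Optional convenience alias, since the trusty package names the binary docker.io
ln -s /usr/bin/docker.io /usr/local/bin/docker
docker run -i -t --cap-drop=ALL --cap-add=NET_BIND_SERVICE ubuntu /bin/bash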
Comparison: Where to run your code?
| Feature | OpenVZ / Native Container | Docker inside CoolVDS (KVM) |
|---|---|---|
| Isolation | Process level (Weak) | Hardware level (Strong) |
| Kernel Access | Shared with Host | Dedicated Kernel |
| Performance | Near Native | Near Native (VirtIO drivers) |
| Security Risk | Kernel panic kills everyone | Crash stays in your VM |
When you use KVM, you get the stability of dedicated hardware with the flexibility of the cloud. You can install your own kernel modules, tweak sysctl.conf parameters for high-load networking, and enable IP forwarding without asking support for permission.
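As an illustration, here is the kind of thing you can drop into /etc/sysctl.conf on your own instance; the values are starting points I use for busy web frontends, not gospel:
# /etc/sysctl.conf inside your KVM instance
net.ipv4.ip_forward = 1
net.core.somaxconn = 1024
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.ip_local_port_range = 10240 65535
# Apply immediately with: sysctl -p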
Final Thoughts: Don't Skimp on the Foundation
The tech landscape is moving fast. Docker is changing version numbers every few weeks, and tools like Puppet and SaltStack are fighting for dominance in configuration management. But the fundamentals of Unix security haven't changed.
If you are serious about uptime and protecting your users, layer your defenses. Use containers for deployment, but wrap them in the ironclad isolation of a KVM hypervisor.
Need a sandbox to test your new Docker workflows? Deploy a high-performance SSD KVM instance on CoolVDS today. We offer low latency to Nordic ISPs and full root access—real root, not that fake container root.