Surviving the Shared Kernel: Hardening LXC & OpenVZ for Production
Let’s be honest: we all love the density of container-based virtualization. Whether you are running OpenVZ or experimenting with the rising LXC (Linux Containers) project on Ubuntu 12.04, the ability to spin up fifty instances on a single physical box without the overhead of fifty kernels is addictive. But there is a dark side to this efficiency that most hosting providers gloss over: the shared kernel.
I have spent the last three weeks debugging a massive outage for a client in Stavanger who thought standard chroot was a security feature. It is not. If an attacker gains root access inside a poorly configured container, they are potentially one syscall away from owning the host node. In Norway, where Datatilsynet (The Data Inspectorate) watches data integrity like a hawk under the Personal Data Act, a breakout like that isn't just an admin headache; it is a legal disaster.
The Reality of "Root" in 2013
The fundamental flaw in most current container deployments is that root (UID 0) inside the container often maps directly to root on the host. If I can exploit a kernel vulnerability from inside the container, I have root on your metal. With the recent ptrace race-condition exploits (CVE-2013-0871) and various symlink race conditions, relying solely on standard permissions is suicide.
Here is how we lock this down. If you are managing your own nodes—or if you are tired of bargain-bin VPS hosts that don't understand isolation—implement these controls immediately.
1. Drop Capabilities or Die
By default, containers often start with far too many capabilities. Strip these down. In your LXC configuration, explicitly deny the capabilities that allow kernel module loading, raw device I/O, or clock manipulation. Do you really need CAP_SYS_MODULE inside a web server container? No.
# /var/lib/lxc/my-container/config
# Drop dangerous capabilities
lxc.cap.drop = sys_module
lxc.cap.drop = sys_rawio
lxc.cap.drop = mac_admin
lxc.cap.drop = sys_time
# Mount /sys read-only; proc:mixed leaves /proc writable but protects /proc/sys and sysrq-trigger
lxc.mount.auto = proc:mixed sys:ro
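To verify the capabilities are actually gone, check from inside the running container with capsh (shipped with libcap). A quick sanity check, assuming the container config above:

# Run INSIDE the container: print the capability bounding set
capsh --print | grep Bounding
# sys_module, sys_rawio, mac_admin and sys_time should not appear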
2. The New Frontier: User Namespaces (Kernel 3.8+)
This is bleeding edge, but if you are compiling your own kernels (and you should be), the newly released Linux Kernel 3.8 (Feb 2013) has finally stabilized User Namespaces. This is the holy grail we have been waiting for.
It allows us to map UID 0 inside the container to an unprivileged UID (like 100000) on the host. Even if an attacker breaks out, they find themselves as a nobody user on the host system.
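Userspace tooling is still catching up, but the mapping itself is just a couple of config lines. A sketch of what this looks like in current LXC development trees; the lxc.id_map key and the 100000 offset here are illustrative, so check your LXC version before trusting it:

# /var/lib/lxc/my-container/config
# Map container UIDs and GIDs 0-65535 onto host IDs 100000-165535
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536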
Pro Tip: Most distributions, like CentOS 6.3, stick to older kernels (2.6.32). You will need to pull from mainline or use the Ubuntu Raring Ringtail (13.04) kernel to test this. At CoolVDS, our R&D labs are already benchmarking kernel 3.8 for our next-gen KVM nodes.
3. Network Isolation with Iptables
Don't just bridge everything to br0 and hope for the best. ARP spoofing between containers is real, and iptables alone won't stop it (that is a job for ebtables; see the sketch after the rules below). I enforce strict iptables rules on the host node so containers cannot talk to each other unless explicitly allowed. This is crucial for multi-tenant environments.
# On the HOST node
# Bridged traffic only traverses iptables when br_netfilter allows it
sysctl -w net.bridge.bridge-nf-call-iptables=1
# Create a chain for container traffic
iptables -N CONTAINER_ISOLATION
# Allow gateway access (internet) -- must precede the DROP rule below
iptables -A CONTAINER_ISOLATION -d 192.168.1.1 -j ACCEPT
# Drop traffic between containers on the same subnet
iptables -A CONTAINER_ISOLATION -s 192.168.1.0/24 -d 192.168.1.0/24 -j DROP
# Apply to the bridge
iptables -I FORWARD -i br0 -j CONTAINER_ISOLATION
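iptables only sees IP; the ARP frames themselves need ebtables. A minimal sketch that pins one container's veth to its assigned MAC and IP; vethweb0, the MAC, and the address are placeholders for your own values:

# Drop frames from the web container's veth that forge another MAC
ebtables -A FORWARD -i vethweb0 -s ! 00:16:3e:00:00:01 -j DROP
# Drop ARP packets claiming an IP the container doesn't own
ebtables -A FORWARD -p ARP -i vethweb0 --arp-ip-src ! 192.168.1.10 -j DROP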
Resource Starvation and the "Noisy Neighbor"
Security isn't just about hackers; it's about availability. In a shared environment, one runaway PHP script can eat every available I/O operation and bring your MySQL database to its knees. This is why standard "Shared Hosting" is dead for serious business.
We use Control Groups (cgroups) to put hard shackles on what a container can consume. Don't rely on the "fair scheduler." Be unfair. Prioritize your critical workloads.
# On the HOST: set blkio weights (valid range 100-1000)
# Give the database container 10x the I/O priority of the web container
echo 1000 > /cgroup/blkio/lxc/db-container/blkio.weight
echo 100 > /cgroup/blkio/lxc/web-container/blkio.weight
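The same trick works for memory and CPU, and you can bake it straight into the container config instead of poking the cgroup filesystem by hand. A sketch; the limits are examples to tune per workload, not recommendations:

# /var/lib/lxc/web-container/config
# Hard-cap memory and give the scheduler a relative CPU weight
lxc.cgroup.memory.limit_in_bytes = 512M
lxc.cgroup.cpu.shares = 512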
At CoolVDS, we automate this. Our platform uses enterprise-grade SSDs (no spinning rust for primary storage), and we hard-limit IOPS per instance so that the guy next door mining Bitcoins doesn't crash your Magento store.
The KVM Advantage
Look, hardening containers is fun, but it's a constant arms race against kernel exploits. If your data falls under strict compliance—like medical records or financial data governed by Norwegian law—you shouldn't be gambling on shared kernels.
This is why CoolVDS defaults to KVM (Kernel-based Virtual Machine). With KVM, you get your own kernel. The isolation is hardware-assisted. If your kernel panics, my node keeps humming. It's heavier than LXC, sure, but with modern virtualization extensions (Intel VT-x), the overhead is negligible compared to the peace of mind.
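Before you write KVM off as heavy, confirm the extensions are actually enabled; they ship disabled in plenty of BIOSes. On the host:

# Non-zero output means VT-x/AMD-V is advertised by the CPU
egrep -c '(vmx|svm)' /proc/cpuinfo
# On Ubuntu, the cpu-checker package gives a friendlier verdict
kvm-ok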
| Feature | OpenVZ / LXC | CoolVDS (KVM) |
|---|---|---|
| Kernel | Shared (Riskier) | Dedicated (Isolated) |
| Performance | Native Speed | Near-Native (VirtIO drivers) |
| Security | Depends on Host Config | Hardware Virtualization |
| Custom Modules | Impossible | Full Control (VPNs, Fuse, etc.) |
Final Thoughts
If you are deploying containers in 2013, you are on the cutting edge, and the cutting edge bleeds. Update your kernels, apply grsecurity patches if you can, and audit every container config on the box (a one-liner for that below).
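A crude but effective audit, assuming the standard /var/lib/lxc layout: find every container on the box that never drops a single capability.

# List container configs containing no lxc.cap.drop line at all
grep -L 'lxc.cap.drop' /var/lib/lxc/*/config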
Or stop fighting the config files and deploy on a platform that handles this for you. CoolVDS offers pure KVM instances with SSD RAID-10 storage in our Oslo datacenter, ensuring low latency and full compliance with local regulations.
Don't let a kernel panic take down your business. Spin up a secure KVM instance on CoolVDS today.