Stop Trusting the Default Docker Configuration
Let’s be honest with ourselves. The hype around Docker 1.12 and the new Swarm mode has every developer in Oslo deploying containers like they are invincible lightweight Virtual Machines. They are not. If you are running docker run in production with default settings, you aren't deploying an application; you are handing out root access to your host kernel.
I have spent the last week auditing a client's infrastructure—a startup trying to disrupt the fintech space here in the Nordics. They had a beautiful microservices architecture, but their security posture was non-existent. A single container triggered a kernel panic and took down the entire node. Why? Because they bought into the lie that cgroups and namespaces are a silver bullet. They are flimsy walls.
In this analysis, we are going to look at how to actually lock down a container runtime environment on Ubuntu 16.04 LTS, focusing on the realities of the Linux kernel capabilities, and why the underlying hardware virtualization—specifically KVM—matters more than your Dockerfile.
The Root Problem: It's the Same Kernel
When you spin up a KVM instance on CoolVDS, you get your own kernel. If you crash it, you crash your own server. But inside a container, you share the host's kernel. If a process inside the container can trigger a kernel bug, it affects the host and every other container running on it.
By default, the user inside the container is root (uid 0). If a vulnerability allows a breakout (and they happen), the attacker is root on your host. This is a nightmare for data integrity, especially if you are handling data sensitive to Norwegian privacy laws or preparing for the upcoming GDPR regulations.
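You can see this for yourself in ten seconds (assuming a stock Docker install and any small image such as alpine):
# Default container: you are root
docker run --rm alpine id
# prints uid=0(root) gid=0(root) ...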
1. Drop Capabilities, Don't Just Add Them
The Linux kernel divides the privileges of the root user into distinct units, known as capabilities. By default, Docker grants a container over a dozen of them. You likely don't need most.
Does your Nginx web server need to create device nodes (CAP_MKNOD) or write to the kernel audit log (CAP_AUDIT_WRITE)? Absolutely not. Use a whitelist approach: drop everything, then add back only what is strictly necessary.
# The wrong way (Standard)
docker run -d nginx
# The right way (Paranoid DevOps)
docker run -d \
--cap-drop=ALL \
--cap-add=NET_BIND_SERVICE \
--cap-add=SETGID \
--cap-add=SETUID \
--read-only \
--tmpfs /run \
--tmpfs /tmp \
--tmpfs /var/cache/nginx \
nginx:alpine
In the example above, we drop ALL capabilities first, then add back only what Nginx needs: NET_BIND_SERVICE to bind port 80, and SETUID/SETGID so the master process can drop privileges to its worker user. We also mount the root filesystem read-only, with tmpfs mounts covering the few paths Nginx must write (its PID file, /tmp, and its cache directory). If an attacker manages to exploit a vulnerability in Nginx, they cannot write a persistent backdoor to disk.
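Want proof the container really ended up with a minimal set? One rough check (assuming the libcap tools, capsh in particular, are installed on the host) is to decode the bounding set straight from /proc:
# Find the container's init PID on the host
CONTAINER_PID=$(docker inspect --format '{{.State.Pid}}' <container_id>)
# CapBnd is the capability bounding set, as a hex bitmask
grep CapBnd /proc/$CONTAINER_PID/status
# Decode the bitmask into human-readable capability names
capsh --decode=<hex value from the line above>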
2. User Namespaces (userns) are Mandatory
If you aren't using User Namespaces in late 2016, you are negligent. This feature maps the root user inside the container to an unprivileged user on the host. With the default remapping, container root becomes a high-numbered UID from the dockremap subordinate range, so even if an attacker breaks out, they land on the host with no real privileges.
Configure this in your Docker daemon settings. On Ubuntu 16.04 with systemd, you'll need to edit the daemon startup options.
# /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --userns-remap=default
After a systemctl daemon-reload and systemctl restart docker, check your subuid and subgid files. This adds a layer of complexity to volume management, but the security payoff is non-negotiable for production environments.
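The activation sequence looks roughly like this (commands assume the systemd drop-in above is already in place; the dockremap ranges only appear once the daemon restarts with the flag):
sudo systemctl daemon-reload
sudo systemctl restart docker
# The remapped UID/GID ranges should now be registered
grep dockremap /etc/subuid /etc/subgid
# Images and volumes now live under a new, namespaced root directory
docker info | grep -i "root dir"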
3. Filesystem Isolation and Storage Drivers
Storage is a frequent bottleneck in containerized systems. We often see developers using the default aufs storage driver on older kernels, which can lead to latency spikes. On modern systems you should be looking at overlay2 if your kernel supports it (4.0 or newer; Ubuntu 16.04 ships 4.4), or devicemapper in direct-lvm mode for serious workloads.
Pro Tip: Never use the loop-lvm mode for Device Mapper in production. It is slow and prone to corruption. Always configure direct-lvm using a separate block device. At CoolVDS, our NVMe-backed block storage is optimized for high IOPS, making the I/O penalty of direct-lvm negligible.
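If you want to pin the driver explicitly rather than rely on auto-detection, one way (a sketch, assuming you reuse the systemd drop-in from the userns section and merge the flags into a single ExecStart line) looks like this:
# /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --userns-remap=default --storage-driver=overlay2
Note that switching storage drivers hides your existing images until you re-pull them, so make the change before you load the host with workloads.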
Limiting Resources to Prevent DoS
A compromised container can try to consume all CPU cycles or memory, starving the host. This is a classic Denial of Service. While CoolVDS protects your VPS from "noisy neighbors" at the hypervisor level, you must protect your VPS from its own containers.
# cpu-quota / cpu-period = the equivalent of 1.5 cores
docker run -d -p 80:80 \
--memory="512m" \
--memory-swap="1g" \
--cpu-period=100000 \
--cpu-quota=150000 \
--pids-limit=100 \
my-web-app
The --pids-limit flag is particularly valuable. It prevents a fork bomb inside the container from exhausting the host's process table. This was a common attack vector we saw earlier this year. The CPU quota and period pair, meanwhile, caps the container at one and a half cores' worth of CPU time.
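To confirm the limits actually stuck to a running container, docker inspect exposes them under HostConfig (field names below are the standard API ones; substitute your own container ID):
docker inspect --format 'mem={{.HostConfig.Memory}} swap={{.HostConfig.MemorySwap}}' <container_id>
docker inspect --format 'pids={{.HostConfig.PidsLimit}} cpu-quota={{.HostConfig.CpuQuota}}' <container_id>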
The Infrastructure Reality Check
No matter how many flags you pass to Docker, software containers still share the kernel. There is always a non-zero risk of a kernel 0-day exploit. This is where your choice of hosting provider becomes a security control.
Many "Cloud" providers oversell resources and use lightweight container-based virtualization (like OpenVZ) to host your Docker containers. This is nesting containers inside containers. It is a performance disaster and a security minefield.
The CoolVDS Architecture:
| Feature | Budget Container Hosting | CoolVDS (KVM) |
|---|---|---|
| Kernel Isolation | Shared Kernel (High Risk) | Dedicated Kernel (High Security) |
| Virtualization | OpenVZ / LXC | Hardware Virtualization (KVM) |
| Swap Usage | Often unavailable/Shared | Dedicated NVMe Swap |
| Latency | Variable (Noisy Neighbors) | Consistent |
We strictly use KVM (Kernel-based Virtual Machine). This means your Docker host is running on its own virtualized hardware. Even if your neighbor on the physical rack melts down their system, your kernel remains untouched. For the Norwegian market, where uptime and data integrity are paramount for compliance with the Data Protection Authority (Datatilsynet), this isolation is required, not optional.
Auditing Your State
You cannot fix what you do not measure. I recommend running the Docker Bench for Security script. It checks for dozens of common best practices based on the CIS Docker Benchmark.
git clone https://github.com/docker/docker-bench-security.git
cd docker-bench-security
sudo sh docker-bench-security.sh
Expect to fail half of these checks on a fresh install. Your goal is to remediate the critical failures, specifically those regarding the Docker daemon configuration and container runtime privileges.
Conclusion
Containerization in 2016 allows us to ship code faster than ever, but it shifts the security burden from the network team to the systems architect. You must treat the container boundary as porous.
Don't gamble your production data on shared kernels and default configurations. Hardening the runtime is step one. Step two is ensuring the ground you stand on is solid.
Ready to run Docker on true Hardware Virtualization? Deploy a CoolVDS KVM instance in Oslo today and experience the stability of dedicated resources.