GitOps in Production: Stop Manual Kubectl Deployments Before You Break Prod
If you are still SSHing into your production servers to run `docker pull` or, worse, running `kubectl apply -f .` from your laptop, you are a ticking time bomb. I’ve seen it happen too many times. A developer hot-patches a ConfigMap to fix a "critical bug" at 2 AM. Two weeks later, the cluster autoscales, the pod restarts, and the patch vanishes. The site goes down. The logs are useless because the change was never committed.
It is 2018. We have better ways to handle this. The industry is coalescing around a concept Weaveworks calls GitOps. It is not just a buzzword; it is the only sane way to manage distributed systems at scale.
At its core, GitOps forces a simple rule: Git is the single source of truth. If it is not in the repo, it does not exist in the cluster. Period.
The Architecture of Truth
In a traditional push-based pipeline (like Jenkins jobs of yore), the CI server has the "keys to the kingdom." It builds the artifact and pushes it to the server. If the CI server is compromised, your production environment is toast. Furthermore, the CI server doesn't know if the actual state of the cluster matches the desired state. It only knows it ran a script.
GitOps flips this. You use an operator inside the cluster (like Weave Flux) that pulls changes.
- Code: Developer pushes code to Git.
- Build: CI (GitLab CI, CircleCI) builds the Docker image and pushes it to a private registry.
- Config Update: The CI (or a developer) updates the deployment manifest in a separate config repository with the new image tag.
- Sync: The cluster operator detects the change in the config repo and applies it.
This separates the build process from the deployment process. It is safer. It is cleaner. And it provides an audit trail that makes compliance auditors in the EU very happy.
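To make the pull side concrete, here is a minimal sketch of what the in-cluster operator can look like as a Deployment, assuming Flux v1 and a hypothetical config repository URL. The SSH deploy key Secret and the RBAC objects are omitted, and the flag names and image tag should be checked against the Flux release you actually run.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flux
  namespace: flux
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flux
  template:
    metadata:
      labels:
        app: flux
    spec:
      serviceAccountName: flux   # needs RBAC permission to apply manifests (not shown)
      containers:
      - name: flux
        image: quay.io/weaveworks/flux:1.8.1
        args:
        # Hypothetical config repo -- the only Git credential the cluster
        # needs is a deploy key for this repository.
        - --git-url=git@git.example.com:acme/k8s-config.git
        - --git-branch=master
        - --git-path=production
        - --git-poll-interval=1m   # how often to check Git for new commits
        - --sync-interval=5m       # how often to re-apply the desired state
```

The important part is the trust direction: the CI system never holds cluster credentials; the cluster only holds a key for the config repo.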
Implementing the Workflow
Let's look at how this works in practice. We assume you are running a Kubernetes cluster (version 1.10+ recommended). For the underlying infrastructure, you need raw compute stability. Containers add abstraction overhead; running them on oversold shared VPS infrastructure is asking for I/O waits. We use CoolVDS KVM instances for our worker nodes because the dedicated resources ensure the kubelet doesn't time out due to noisy neighbors.
1. The Container Build
First, optimize your build. We are seeing a lot of bloat in images lately. Use multi-stage builds (available since Docker 17.05) to keep your runtime artifacts small.
```dockerfile
# Build stage: compile a static binary
FROM golang:1.11-alpine AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o main .

# Runtime stage: ship only the binary on a minimal base
FROM alpine:3.8
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/main .
CMD ["./main"]
```
2. The CI Pipeline
In your `.gitlab-ci.yml`, do not deploy. Just build and tag. Here is a stripped-down example of what a 2018-era pipeline stage looks like:
```yaml
stages:
  - build
  - release

docker_build:
  stage: build
  image: docker:stable
  services:
    - docker:dind
  script:
    - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN $CI_REGISTRY
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
```
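The `release` stage declared above is where the config repo gets bumped. Here is a sketch, assuming a hypothetical `k8s-config` repository and a `CONFIG_REPO_TOKEN` CI variable with push rights; Flux's own image-update automation can also handle this step for you.

```yaml
update_manifest:
  stage: release
  image: alpine:3.8
  only:
    - master
  script:
    - apk add --no-cache git
    - git clone https://oauth2:${CONFIG_REPO_TOKEN}@git.example.com/acme/k8s-config.git
    - cd k8s-config
    # Point the Deployment at the image we just pushed
    - sed -i "s|payment-service:.*|payment-service:${CI_COMMIT_SHA}|" production/payment-deployment.yaml
    - git config user.email "ci@example.com"
    - git config user.name "GitLab CI"
    - git commit -am "Deploy payment-service ${CI_COMMIT_SHA}"
    - git push origin master
```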
3. The Deployment Manifest
This is where the magic happens. In a separate repository (infrastructure-as-code), you define your state. This is what the cluster monitors.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment
  template:
    metadata:
      labels:
        app: payment
    spec:
      containers:
      - name: payment
        # The tag below is updated automatically by your CI or Flux
        image: registry.coolvds.com/payment-service:a1b2c3d
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
```
Pro Tip: Always define resource limits. If you don't, a memory leak in one pod can trigger the kernel's OOM (Out of Memory) killer and take down the entire node. On CoolVDS, the VM's resources are enforced strictly at the hypervisor level, but Kubernetes needs to know about per-pod limits to schedule effectively.
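If you cannot guarantee that every manifest sets limits, a LimitRange in the namespace supplies defaults for containers that omit them. A minimal sketch; the numbers are placeholders, not recommendations:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:      # applied when a container omits resources.requests
      cpu: 100m
      memory: 64Mi
    default:             # applied when a container omits resources.limits
      cpu: 500m
      memory: 256Mi
```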
The Norwegian Context: Latency and Law
Why does infrastructure location matter in a GitOps workflow? Two reasons: Sync Latency and GDPR.
When you update your Git repository, your cluster pulls the new image. If your registry is in the US and your nodes are in Oslo, you are dragging every image layer across the Atlantic. That inflates your Mean Time to Recovery (MTTR). By keeping your container registry and your worker nodes in the same datacenter, preferably peered directly at NIX (Norwegian Internet Exchange), you cut pull times drastically.
Furthermore, GDPR is in full effect as of May this year. Datatilsynet is not to be trifled with. If you are mounting PersistentVolumes (PVs) to your pods, that data must reside within the legal framework you promised your customers. Using a US-based cloud provider introduces complexities regarding the Cloud Act. Hosting on CoolVDS in Norway simplifies this compliance architecture significantly. Your data stays here.
Handling "Drift"
The beauty of GitOps is drift detection. If someone manually changes the replica count to 5 using `kubectl scale`, the operator (Weave Flux, for example) notices on its next sync that the Git repo says 3 and scales it back down.
This enforces discipline. No more "cowboy engineering." If you want to scale, you make a Pull Request. The team reviews it. You merge it. The cluster updates.
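You can watch this happen yourself, assuming the payment-service Deployment above and a sync interval of a few minutes:

```bash
# Drift the cluster away from Git by hand (illustration only)
kubectl -n production scale deployment payment-service --replicas=5

# The change sticks only until the operator's next sync, at which point the
# manifest in Git (replicas: 3) is re-applied and the count drops back to 3.
kubectl -n production get deployment payment-service --watch
```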
Comparison: Push vs. Pull
| Feature | CIOps (Jenkins Push) | GitOps (Cluster Pull) |
|---|---|---|
| Security Credentials | CI Server has Root Cluster Access | Cluster has Read-Only Git Access |
| Drift Detection | None (Fire and Forget) | Constant (Self-Healing) |
| Disaster Recovery | Complex Re-run of jobs | `kubectl apply -f git-repo` |
Conclusion
Automation is not about being lazy. It is about being consistent. In 2018, manual server management is a professional negligence risk. By adopting GitOps, you gain an audit trail, automated recovery, and a stronger security posture.
However, your orchestration is only as good as the metal it runs on. A Kubernetes node in a NotReady state because the underlying storage is thrashing will break your pipeline regardless of how clean your Git history is.
For your next cluster, ensure you are building on high-performance NVMe infrastructure that guarantees the IOPS your etcd and workloads demand. Deploy a CoolVDS instance today and stop fighting your infrastructure.