GitOps in 2019: Stop SSH-ing into Production and Start Sleeping at Night
I recently watched a junior developer accidentally delete a production Ingress controller because he thought he was in the staging namespace. It took us 45 minutes to restore traffic. The post-mortem was brutal, but the conclusion was simple: It wasn't his fault. It was the process.
If you are still manually running kubectl apply -f from your laptop, or letting Jenkins SSH into your cluster to patch deployments, you are operating on borrowed time. In the last year, the term "GitOps" (coined by Weaveworks) has moved from buzzword to necessity for anyone managing more than three microservices.
This isn't about jumping on a trend. It's about stability, security, and knowing exactly who changed what and when: a requirement that Datatilsynet (the Norwegian Data Protection Authority) is becoming increasingly aggressive about enforcing post-GDPR.
The Core Problem: Configuration Drift
In a traditional push-based pipeline, your CI server builds a container and then runs a script to update the cluster. It works, until it doesn't. What happens when a sysadmin logs into the server and tweaks a memory limit to fix an OOM error at 3 AM? That change is not in Git. Next time the pipeline runs, it might overwrite the fix, or worse, the pipeline passes but the infrastructure state is now different from the code.
This is Configuration Drift. It creates "snowflake" servers that cannot be reproduced.
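A quick way to catch drift before it bites: newer kubectl releases (1.13 and up) ship a diff subcommand that compares live cluster state against your local manifests. A sketch, assuming the repo layout we'll set up in Step 2:

# Show how the live cluster differs from the manifests in Git.
# Exits non-zero when drift is detected, so it can gate a cron job or CI check.
kubectl diff -f workloads/production/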
The GitOps Solution (The "Pull" Model)
GitOps inverts this. Your cluster contains an operator (like Flux) that watches a Git repository. When you change the manifest in Git, the operator sees the divergence and synchronizes the cluster to match the repo.
Pro Tip: This satisfies the GDPR requirement for audit trails automatically. The git log is your audit log. You know exactly who merged the Pull Request that changed the production database variable.
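Concretely, that audit trail is one command away (the path here is illustrative):

# Who changed the production manifests, when, and in which commit?
git log --pretty=format:'%h %an %ad %s' -- workloads/production/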
Step 1: The Infrastructure Foundation
Before we talk tools, we need to talk iron. GitOps relies heavily on orchestration tools like Kubernetes. Kubernetes relies on etcd for state. And etcd is notoriously sensitive to disk latency.
If you run a K8s cluster on a cheap VPS with noisy neighbors and spinning rust (HDD), etcd will time out waiting for fsync, causing your API server to flap. I've seen this happen on major US cloud providers during peak hours.
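If you want to know whether your current host is already struggling, etcd publishes its WAL fsync latency as a Prometheus histogram. A quick look, assuming etcd serves metrics on the default client port without TLS:

# Sustained wal_fsync durations above ~10ms mean the disk is too slow for etcd.
curl -s http://127.0.0.1:2379/metrics | grep wal_fsync_duration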
This is why we architect CoolVDS on pure NVMe storage. We aren't just selling "fast loading times"; we are selling the ability to run distributed consensus algorithms without them crashing because of I/O wait. When your control plane is hosted in Oslo with sub-millisecond local latency, your cluster convergence time drops significantly.
Step 2: Structuring the Manifests
Stop putting your Kubernetes YAMLs in the same repo as your application source code. Keep them separate. Your infrastructure repo should look like this:
├── cluster-config/
│   ├── namespaces.yaml
│   └── rbac.yaml
└── workloads/
    ├── production/
    │   ├── nginx-deployment.yaml
    │   └── service.yaml
    └── staging/
Here is a standard deployment we might use for a high-traffic Nginx frontend. Note the resource limits: never deploy to a shared environment without them.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend-nginx
  namespace: production
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.nginx: glob:1.15.*
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15.8
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
Step 3: Implementing Flux (The Operator)
As of early 2019, Flux is the de facto standard for this. It runs inside your cluster. It pulls; it doesn't accept pushes. This means you don't need to give your Jenkins server cluster-admin credentials. If your CI server gets hacked, your cluster is still safe because the CI server has no keys to the kingdom.
To install Flux, you typically apply its deployment manifest. In that manifest, point the Flux daemon at your config repo through its container args:
args:
- --git-url=git@github.com:your-org/infra-repo.git
- --git-branch=master
- --git-path=workloads/production
- --git-poll-interval=1m
- --sync-garbage-collection=true
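With the poll interval above, merged changes land within a minute. If you can't wait, fluxctl (the Flux CLI) can force a reconciliation on demand; this assumes Flux runs in the flux namespace:

# Trigger an immediate sync against the Git repo instead of waiting for the next poll
fluxctl sync --k8s-fwd-ns flux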
Once running, Flux generates an SSH key. You add this key as a "Deploy Key" with write access to your GitHub/GitLab repository. Why write access? So Flux can update image tags automatically when a new container is built (if you enable that feature).
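There's no need to fish the key out of pod logs; fluxctl prints it for you (same flux-namespace assumption as above):

# Print the public SSH key to register as a Deploy Key on GitHub/GitLab
fluxctl identity --k8s-fwd-ns flux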
Performance: The Etcd Bottleneck
I mentioned etcd earlier. If you are setting up your own K8s cluster on CoolVDS (which many of our Norwegian clients do to keep data strictly under local jurisdiction), you need to verify your disk performance.
Create an empty test-data directory on the disk you want to test, then run this fio command. Its small fdatasync-ed writes mimic etcd's WAL pattern and will tell you whether your current VPS can handle the write load of a production GitOps environment:
fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=22m --bs=2300 --name=mytest
On a standard HDD VPS, watch the fdatasync percentiles in fio's output: the etcd documentation recommends a 99th percentile under roughly 10ms, and spinning disks routinely blow past that. On CoolVDS NVMe instances, this operation is nearly instantaneous. That difference is what prevents the "CrashLoopBackOff" nightmares when you try to sync 50 microservices at once.
Legal Nuances: Norway & GDPR
In the wake of 2018's GDPR rollout, data residency is critical. Using US-based managed Kubernetes services can introduce complex data transfer agreement requirements. By building your GitOps workflow on top of CoolVDS, you ensure that the physical underlying storage (the actual bits and bytes of your database and logs) resides in Norwegian data centers.
Furthermore, because GitOps forces every change to go through a Pull Request, you have a built-in mechanism for the "Four-Eyes Principle" (required by many financial compliance standards). No single developer can change production without a code review.
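On GitHub, you can enforce that review gate with branch protection plus a CODEOWNERS file (GitLab offers an equivalent approval rule); the team handle below is hypothetical:

# .github/CODEOWNERS
# Changes under workloads/production/ cannot merge without a platform-team review.
workloads/production/ @your-org/platform-team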
Summary
Transitioning to GitOps requires a mindset shift. You have to take away SSH access from your developers. It feels restrictive at first, but the freedom from 3 AM paging alerts is worth it.
But remember: automation amplifies whatever sits beneath it. If your underlying infrastructure is unstable, you are just crashing faster. You need low latency, high IOPS, and legal compliance.
Ready to build a robust platform? Don't let slow I/O kill your cluster. Deploy a high-performance NVMe instance on CoolVDS today and get your control plane running in under 55 seconds.