GitOps Workflow Best Practices: Stop 'ClickOps' Before It Kills Your Production

If you are still SSH-ing into servers to tweak nginx.conf or running kubectl apply -f from your laptop, you are not a sysadmin. You are a liability. I’ve seen production environments in Oslo melt down on a Friday afternoon because someone decided to make a "quick hotfix" manually and forgot to commit it. Two weeks later, the autoscaler spun up a new node, the manual fix vanished, and the service crashed. It’s embarrassing. It’s preventable.

By mid-2023, GitOps isn't just a fancy trend for Silicon Valley unicorns; it is the baseline for professional infrastructure management, especially here in Europe where GDPR and strict compliance standards like ISO 27001 demand audit trails.

The Core Philosophy: The Repo is the Source of Truth

The concept is simple but brutal to implement correctly: If it’s not in Git, it doesn’t exist.

Your cluster state must match your repository state. Always. This provides an automatic audit log for the Datatilsynet (Norwegian Data Protection Authority) without you lifting a finger. But implementation details matter.

1. The Pull Model vs. The Push Model

In the old CI/CD days (circa 2018), we used Jenkins to push changes to the cluster. This is dangerous. It requires giving your CI server god-mode access to your production credentials. If your CI gets hacked, your production is gone.

In 2023, we use the Pull Model. An agent inside the cluster (like ArgoCD or Flux) watches the Git repo and pulls changes. This inverts the security model: the cluster only needs read access to the repository, and nothing outside the cluster ever holds admin credentials for it.

Here is a standard, production-ready ArgoCD Application manifest we use for high-availability setups:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-gateway-norway
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'git@gitlab.com:coolvds-ops/payment-gateway.git'
    targetRevision: HEAD
    path: k8s/overlays/oslo-prod
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
Pro Tip: Always enable selfHeal: true. If a junior dev manually changes a resource limit via the CLI, ArgoCD should immediately revert it. This is "Drift Detection" in action. It trains your team to stop touching the cluster directly.
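
If you want to inspect the drift before (or instead of) letting the controller revert it, the argocd CLI can diff the live cluster state against Git. A minimal example, assuming the CLI is installed and logged in to your ArgoCD API server:

# Show the delta between the live cluster state and the desired state in Git
argocd app diff payment-gateway-norway

# Check the overall sync and health status
argocd app get payment-gateway-norway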

2. Managing Secrets Without Committing Suicide

You cannot check passwords into Git. We know this. Yet, I still see `.env` files in repositories. In 2023, the standard is Sealed Secrets or the External Secrets Operator.

For a purely Kubernetes-native approach on CoolVDS instances, I prefer Sealed Secrets. You encrypt the secret on your workstation using the controller's public certificate, fetched from the cluster. Only the sealed-secrets controller running inside the cluster holds the private key to decrypt it, so it's safe to commit the encrypted blob even to a public repo.

Here is how you seal a secret locally before pushing:

# 1. Create a raw secret (dry-run, so nothing hits the cluster yet)
kubectl create secret generic db-creds \
  --from-literal=password='SuperSecureNorwegianSalmon123!' \
  --dry-run=client -o yaml > secret.yaml

# 2. Seal it using the public key fetched from the controller
kubeseal --format=yaml < secret.yaml > sealed-secret.yaml

# 3. Delete the plain secret immediately
rm secret.yaml

The resulting sealed-secret.yaml is safe for Git.
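
For reference, the output looks roughly like this (the ciphertext is truncated and purely illustrative, and the namespace is baked in from your current kubectl context, or a -n flag, under kubeseal's default strict scope):

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-creds
  namespace: payments
spec:
  encryptedData:
    password: AgBy3i4OJSWK...   # useless without the controller's private key
  template:
    metadata:
      name: db-creds
      namespace: payments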

3. Infrastructure as Code (IaC) Integration

GitOps handles the software deployment (Kubernetes manifests), but what about the servers themselves? You need Terraform or OpenTofu. We don't click buttons in a control panel to buy servers. We define them.

Below is a snippet of how we define high-performance compute nodes suitable for Kubernetes workers. Note the count of three and the role/env tags: spreading workers across separate instances, and making them discoverable to downstream automation, is essential for availability.

resource "coolvds_instance" "k8s_worker" {
  count             = 3
  name              = "k8s-worker-oslo-${count.index}"
  region            = "no-oslo-1"
  image             = "ubuntu-22.04"
  flavor            = "nvme.2cpu.8gb" # High IOPS needed for container overlay FS
  ssh_keys          = [var.ssh_key_id]
  
  # Tagging for Ansible/automations
  tags = [
    "role:worker",
    "env:production"
  ]
}
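
So how does Terraform fit the GitOps loop? The same way as the manifests: changes land via merge request, plan runs on every MR, apply runs only from the main branch. Here is a minimal GitLab CI sketch; the stage names, image tag, and a pre-configured remote state backend are assumptions for illustration, not a drop-in pipeline:

# .gitlab-ci.yml (sketch): plan on merge requests, gated apply on main
stages:
  - plan
  - apply

terraform-plan:
  stage: plan
  image:
    name: hashicorp/terraform:1.5
    entrypoint: [""]   # the image's default entrypoint is terraform itself
  script:
    - terraform init -input=false   # assumes a remote state backend is already configured
    - terraform plan -input=false
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

terraform-apply:
  stage: apply
  image:
    name: hashicorp/terraform:1.5
    entrypoint: [""]
  script:
    - terraform init -input=false
    - terraform apply -input=false -auto-approve
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual   # a human approves the apply; the review happened in the MR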

The Hardware Reality: Why GitOps Needs Low Latency

Here is the truth nobody talks about in GitOps tutorials: Reconciliation loops are I/O intensive.

When you have 500 applications in ArgoCD, the controller is constantly hashing the state of the cluster against the state of Git. It queries the Kubernetes API server heavily. The API server stores everything in etcd.

If your VPS provider uses cheap, network-attached HDD storage (spinning rust) or throttled SSDs, your etcd latency spikes. When etcd writes take longer than 50ms, the Kubernetes API server starts queuing requests. Your GitOps syncs stall. You get timeouts. Your pipelines turn red.
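
Before you blame the GitOps tooling, measure the disk. A quick sanity check is the fio fdatasync benchmark commonly used to validate disks for etcd (the directory and sizes below are illustrative; run it on the node, not your laptop):

# Benchmark fdatasync latency on the disk that backs etcd
fio --name=etcd-fsync-test --directory=/var/lib/etcd-bench \
    --rw=write --bs=2300 --size=22m --ioengine=sync --fdatasync=1

# Or, if etcd metrics are already scraped into Prometheus, check the p99 WAL fsync latency:
# histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))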

Metric               | Standard VPS          | CoolVDS NVMe
Etcd fsync latency   | 15ms - 40ms           | < 2ms
Reconciliation Rate  | Sluggish (30s+ delay) | Instant
Noisy Neighbors      | High impact           | Isolated KVM

We built CoolVDS on local NVMe specifically to solve the "etcd problem." For a Norwegian DevOps team, hosting in Oslo also means your latency to the Git provider (if self-hosted) and the registry is negligible.

War Story: The