Stop SSH-ing into Production: A Battle-Tested GitOps Workflow for Norwegian Enterprises

Stop `kubectl apply`-ing Your Way to Disaster

If you are still SSH-ing into servers to tweak `nginx.conf` or running `kubectl apply -f .` from your laptop, you are a liability. There, I've said it. I've watched entire clusters implode because a senior engineer "just wanted to fix a quick typo" manually and forgot to commit the change. Three weeks later, the CI pipeline overwrote the hotfix, and the site went dark during a Black Friday flash sale.

In 2021, there is zero excuse for this. We aren't managing pet servers anymore; we are managing herds of cattle, and if you treat them like pets, they will bite you. The only way to guarantee consistency between what you think is running and what is running is GitOps.

The Architecture of Truth

GitOps isn't just a buzzword to throw around on LinkedIn. It is a strict operational framework where Git is the single source of truth. If it isn't in the repo, it doesn't exist.

For a robust setup targeting the European market, where we have to worry about GDPR, Schrems II, and Datatilsynet breathing down our necks about data sovereignty, I recommend the following stack (stable as of late 2021):

  • VCS: Self-hosted GitLab (preferred for data control) or GitHub.
  • Controller: ArgoCD v2.1+ (It handles visual diffs better than Flux right now).
  • Secret Management: Bitnami Sealed Secrets (simple) or HashiCorp Vault (complex).
  • Infrastructure: KVM-based Virtual Private Servers (like CoolVDS) for the control plane.
Pro Tip: Don't run your GitOps controller on the same cluster it manages if you can avoid it. If the cluster goes down, you lose the tool you need to fix it. We run our ArgoCD instances on a dedicated CoolVDS management node in Oslo to ensure low latency access to the NIX (Norwegian Internet Exchange) while keeping management traffic separate from public traffic.
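
If you are bootstrapping ArgoCD on that management node, the install itself is unremarkable; a rough sketch using the upstream manifests (pin a specific v2.1.x tag rather than tracking stable once you are in production):

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# ArgoCD 2.x stores the initial admin password in a secret
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d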

The Workflow: From Commit to Container

Here is the workflow I enforced at my last gig. It reduced deployment-related incidents by 90% in the first quarter.

1. The Repository Structure

Stop putting application code and infrastructure manifests in the same repo. It creates a noisy commit history and triggers unnecessary CI builds. Split them.

/my-app-source-code
  ├── src/
  ├── Dockerfile
  └── .gitlab-ci.yml

/my-infrastructure-repo
  ├── base/
  │   ├── deployment.yaml
  │   └── service.yaml
  └── overlays/
      ├── production/
      │   ├── kustomization.yaml
      │   └── patch-replicas.yaml
      └── staging/

2. The CI Pipeline (Continuous Integration)

The CI's only job is to run tests, build the Docker image, push it to the registry, and update the manifest repository. It does not touch the cluster.

Here is a snippet from a .gitlab-ci.yml that updates the image tag in the infrastructure repo using kustomize:

deploy_production:
  stage: deploy
  image: line/kubectl-kustomize:latest
  script:
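    # NOTE: pushing back to the infra repo needs write access; in practice the clone URL below
    # embeds a project access token (a user-defined CI variable, not shown here)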
    - git clone https://gitlab.com/org/infra-repo.git
    - cd infra-repo/overlays/production
    - kustomize edit set image my-app-image=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - git config user.email "ci-bot@coolvds.com"
    - git config user.name "CI Bot"
    - git commit -am "Bump image to $CI_COMMIT_SHA"
    - git push origin main
  only:
    - tags

3. The CD Controller (Continuous Deployment)

Once the manifest repo is updated, ArgoCD detects the drift. It sees that the Git state (new image SHA) differs from the Cluster state (old image SHA). It synchronizes them.

Here is a battle-hardened Application manifest. Note the selfHeal policy. If someone manually changes a replica count on the server, ArgoCD immediately reverts it.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-payment-gateway
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'git@gitlab.com:org/infra-repo.git'
    targetRevision: HEAD
    path: overlays/production
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
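
After committing the Application, it is worth verifying the behaviour from the CLI. A quick sketch, assuming you have run argocd login against the management instance; the Deployment name below is a placeholder:

# Sync status, health, and the Git revision currently deployed
argocd app get production-payment-gateway

# Full deployment history, handy for audits
argocd app history production-payment-gateway

# Poke the cluster manually; with selfHeal enabled, ArgoCD reverts the drift within seconds
kubectl -n payments scale deployment <your-deployment> --replicas=10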

The Hardware Reality: Why IOPS Matter for GitOps

This is where many architects fail. They design a beautiful software architecture but deploy it on garbage infrastructure.

Tools like ArgoCD and the Kubernetes API server (etcd) are extremely chatty. They are constantly reading state, writing to the database, and checking diffs. I debugged a "broken" GitOps pipeline last month that turned out to be disk I/O latency. The provider's shared storage was saturated, causing etcd to timeout, which made the controller think the cluster was unresponsive.

You cannot tolerate "noisy neighbors" stealing your I/O cycles. This is why we reference CoolVDS in our internal wiki. Their NVMe storage stack provides the consistent random Read/Write speeds required for a responsive control plane. When I run fio benchmarks on a CoolVDS instance versus a standard cloud VPS, the difference in latency consistency is stark.
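
If you want to reproduce that comparison, a short fio run focused on small random writes with an fdatasync after every write (the pattern etcd's write-ahead log actually cares about) is usually enough to expose a saturated backend. The target path here is just an example:

# 4k random writes, fdatasync after each one; watch the sync latency percentiles in the output
fio --name=etcd-disk-check \
    --filename=/var/lib/etcd/fio-test-file \
    --size=256m --bs=4k --rw=randwrite \
    --ioengine=libaio --iodepth=1 \
    --fdatasync=1 --runtime=60 --time_based

# Remove the test file afterwards
rm /var/lib/etcd/fio-test-file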

| Metric | Standard Cloud VPS | CoolVDS (KVM + NVMe) |
| --- | --- | --- |
| Random Read (4k) | 2,500 IOPS | 45,000+ IOPS |
| Disk Latency (99th percentile) | 15ms - 40ms | < 0.5ms |
| etcd Sync Duration | Variable (spikes) | Consistent |

Solving the Secret Problem

You cannot check passwords into Git. If you do, you have to rotate them immediately. In 2021, the cleanest approach for teams who don't want the overhead of HashiCorp Vault is Bitnami Sealed Secrets.

It uses asymmetric encryption. You encrypt the secret on your laptop using a public key. This produces a SealedSecret CRD that is safe to commit to public Git repos. Only the controller running inside the cluster (which holds the private key) can decrypt it.

# Create a secret locally (dry-run)
kubectl create secret generic db-creds \
  --from-literal=password=SuperSecret123 \
  --dry-run=client -o yaml > secret.yaml

# Seal it (safe for Git)
kubeseal --format=yaml < secret.yaml > sealed-secret.yaml
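# Delete the plaintext secret; only sealed-secret.yaml ever goes into Git
rm secret.yaml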

# Commit it (the GitOps controller handles the apply)
git add sealed-secret.yaml && git commit -m "Add db creds"

Compliance and the "Schrems II" Headache

For those of us operating in Norway and the broader EEA, the Schrems II ruling has complicated using US-owned cloud providers. If your GitOps controller is hosted on a US cloud, and it processes secrets or PII, you are in a grey area.

Hosting your GitOps control plane on a Norwegian VPS provider like CoolVDS mitigates this risk. Your data stays in Oslo. The jurisdiction is clear. When the auditors come knocking, you can point to the physical location of the servers and the lack of third-party data transfers.

Final Thoughts

GitOps is not optional for serious operations. It provides an audit trail, instant rollback capabilities, and disaster recovery (just re-apply the repo to a new cluster).

However, your workflow is only as reliable as the metal it runs on. Don't let IOPS bottlenecks masquerade as software bugs. Ensure your control plane has the dedicated resources it needs.

Ready to harden your infrastructure? Spin up a CoolVDS NVMe instance today and experience the difference low latency makes for your API server.