The End of SSH: Building Bulletproof GitOps Workflows with ArgoCD and Kubernetes 1.18

Stop Touching Production: The Case for GitOps in 2020

If you are still SSHing into your production servers to run docker-compose up -d or, god forbid, editing nginx configs with vim on a live node, you are a ticking time bomb. I’ve seen entire platforms evaporate because a "quick fix" wasn't committed to version control. In the Nordic market, where reliability is the currency we trade in, this cowboy engineering doesn't fly anymore.

The industry is shifting. We aren't just scripting deployments; we are moving to a model where the cluster state is a mirror of a Git repository. This is GitOps. It’s not just a buzzword; it’s the only way to manage Kubernetes at scale without losing your mind.

Today, I’m walking you through a battle-tested GitOps workflow using Kubernetes 1.18, ArgoCD, and GitLab CI. We will look at why the underlying hardware—specifically the disk I/O on your VPS—determines whether your GitOps controller syncs instantly or hangs in a loop.

The Architecture: Push vs. Pull

Traditional CI/CD is "Push-based." Jenkins builds an artifact and runs a script to push it to the server. The problem? If the server goes down or someone changes the config manually, Jenkins has no idea. The state has drifted.

GitOps is "Pull-based." You have an operator inside your cluster (we're using ArgoCD) that constantly asks: "Does my current reality match what is in Git?" If the answer is no, it fixes it. Automatically.

Pro Tip: In 2020, separating your application code repository from your configuration repository is mandatory. Do not mix your source code with your Helm charts or Kustomize manifests. It creates a circular dependency hell in your CI pipeline.
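
As a rough sketch, the split looks like this (repository names are just examples; the config repo is the only thing ArgoCD ever watches):

my-app/              # application repo: source code, Dockerfile, .gitlab-ci.yml
k8s-config/          # config repo: the manifests ArgoCD syncs
  envs/
    staging/
    production/
      deployment.yaml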

Step 1: The Infrastructure Foundation

Before we touch YAML, we need to talk about where this runs. Kubernetes is noisy. Components like etcd are incredibly sensitive to disk write latency (fsync). If your VPS provider is oversubscribing storage, your etcd cluster will struggle to elect leaders, and your GitOps controller will fail to reconcile state.

This is where standard cloud offerings often fail the "latency test." For my clusters in Norway, I deploy on CoolVDS NVMe instances. We need high IOPS and low latency to the Norwegian Internet Exchange (NIX) in Oslo. When ArgoCD detects a change, it fires off a large number of API calls. On a spinning disk or a throttled cloud volume, this feels sluggish. On CoolVDS NVMe, it's instantaneous.
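
If you want to sanity-check a node before trusting it with etcd, fio can measure fdatasync latency directly. A rough benchmark (the file size and block size are only illustrative of etcd's write pattern; etcd's own guidance is that the 99th percentile of fdatasync should stay below 10 ms):

apt-get install -y fio
mkdir -p /var/lib/etcd-bench && cd /var/lib/etcd-bench
fio --rw=write --ioengine=sync --fdatasync=1 --directory=. --size=22m --bs=2300 --name=etcd-disk-check
# Check the fdatasync latency percentiles in the output; p99 well under 10ms is what you want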

Provisioning the Cluster Base

Here is a snippet using Terraform 0.12 to provision a solid node ready for K8s. Note the emphasis on the disk type.

resource "coolvds_instance" "k8s_worker" {
  name      = "k8s-worker-01"
  region    = "no-oslo-1"
  image     = "ubuntu-20.04"
  plan      = "nvme-8gb" # Critical: NVMe required for etcd stability
  
  ssh_keys = [
    var.my_ssh_key
  ]

  connection {
    type        = "ssh"
    user        = "root"
    private_key = file(pathexpand("~/.ssh/id_rsa")) # pathexpand resolves the ~ explicitly
    host        = self.ipv4_address
  }

  provisioner "remote-exec" {
    inline = [
      "apt-get update",
      "apt-get install -y apt-transport-https curl",
      "curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -",
      "echo 'deb https://apt.kubernetes.io/ kubernetes-xenial main' | tee /etc/apt/sources.list.d/kubernetes.list",
      "apt-get update",
      "apt-get install -y kubelet=1.18.5-00 kubeadm=1.18.5-00 kubectl=1.18.5-00",
      "apt-mark hold kubelet kubeadm kubectl"
    ]
  }
}
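
The provisioner only installs the packages; the cluster itself still has to be bootstrapped. A minimal sketch with kubeadm, assuming a single control-plane node and a Flannel-compatible pod CIDR:

# On the control-plane node
kubeadm init --kubernetes-version v1.18.5 --pod-network-cidr=10.244.0.0/16

# Make kubectl work for your shell
mkdir -p $HOME/.kube && cp /etc/kubernetes/admin.conf $HOME/.kube/config

# On each worker, paste the join command that kubeadm init printed:
# kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>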

Step 2: The CI Pipeline (GitLab CI)

Your CI pipeline has one job: Build the Docker image, tag it with the commit SHA (never use :latest in production), and push it to the registry. It does not touch the Kubernetes cluster.

stages:
  - build
  - update-manifests

build_image:
  stage: build
  image: docker:19.03.12
  services:
    - docker:19.03.12-dind
  variables:
    # Required for Docker-in-Docker with the 19.03 images (TLS is enabled by default)
    DOCKER_TLS_CERTDIR: "/certs"
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA

update_gitops_repo:
  stage: update-manifests
  image: alpine:3.12
  before_script:
    - apk add --no-cache git
    # Setup SSH agent for Git write access
  script:
    - git clone git@gitlab.com:my-org/k8s-config.git
    - cd k8s-config
    - sed -i "s|image: .*|image: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA|" envs/production/deployment.yaml
    - git config user.email "ci-bot@coolvds.com"
    - git config user.name "CI Bot"
    - git commit -am "Update image tag to $CI_COMMIT_SHORT_SHA"
    - git push origin master
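
The sed one-liner works, but it is brittle: registry paths are full of slashes and colons, which is exactly why the substitution above uses | as the delimiter. If your config repo is built on Kustomize, a cleaner option is kustomize edit set image. A rough equivalent of the script step, assuming the kustomize binary is available in the job image and an images entry named my-app exists in envs/production/kustomization.yaml:

cd k8s-config/envs/production        # after the same git clone as above
kustomize edit set image my-app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
git commit -am "Update my-app to $CI_COMMIT_SHORT_SHA"
git push origin master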

Step 3: The GitOps Operator (ArgoCD)

With the config repo updated, ArgoCD takes over. We configure ArgoCD to watch the repository. When the commit lands, ArgoCD pulls the new manifest and applies it.

Here is the Application CRD you apply to your management cluster:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-api
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'git@gitlab.com:my-org/k8s-config.git'
    targetRevision: HEAD
    path: envs/production
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

Notice selfHeal: true. If a junior admin manually deletes a service, ArgoCD recreates it immediately. This is the immutability we strive for.
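
You can watch this happen from the argocd CLI, which is handy for a first smoke test (assuming you have already run argocd login against your ArgoCD API server):

argocd app get production-api      # sync status, health, and the deployed revision
argocd app sync production-api     # force a sync instead of waiting for the polling interval
argocd app history production-api  # audit trail of every revision ArgoCD has deployed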

Why Local Hosting Matters for GitOps

There is a legal aspect here often overlooked by pure techies. The Datatilsynet (Norwegian Data Protection Authority) is closely watching the situation with US-based cloud providers. While we wait for clarity on data transfers after the Schrems II ruling struck down Privacy Shield, hosting your Git repositories and your Kubernetes clusters on Norwegian soil is a massive compliance advantage.

CoolVDS offers that data residency. Your secrets, your code, and your customer data stay within the jurisdiction. Plus, the latency between your Git repository (if self-hosted) and your worker nodes is negligible.

Performance Tuning: The Hidden Configs

Out of the box, Linux isn't tuned for high-churn Kubernetes networking. On your CoolVDS nodes, you need to adjust sysctl settings to handle the connection tracking required by kube-proxy.

# /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
# Increase connection tracking for high load
net.netfilter.nf_conntrack_max = 131072

Apply this with sysctl --system. Without this, your pods will randomly lose connectivity under load, and you will blame the network when it's actually a kernel limit.
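
One catch: the net.bridge.* keys only exist once the br_netfilter kernel module is loaded, so load it and persist it across reboots before applying the settings:

modprobe br_netfilter
echo "br_netfilter" > /etc/modules-load.d/k8s.conf   # persist the module across reboots
sysctl --system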

Conclusion

GitOps is not the future; it is the present standard for high-reliability systems. It removes the human error factor from deployments and provides an audit trail that makes compliance auditors smile.

However, software automation is only as good as the hardware it runs on. A GitOps workflow on a sluggish, oversold VPS is an exercise in frustration. You need guaranteed CPU cycles and NVMe throughput to handle the reconciliation loops of Kubernetes.

If you are ready to build a pipeline that doesn't wake you up at 3 AM, verify your architecture on infrastructure built for this decade. Spin up a CoolVDS NVMe instance in Oslo today and time how fast your etcd cluster settles. Speed is a feature you can't patch in later.