Stop SSH-ing into Production: A Pragmatic GitOps Workflow for High-Compliance Environments
It is October 2018. If you are still manually copying artifacts to your server via SCP or, god forbid, editing nginx.conf directly on a production node using vim, you are not just inefficient—you are a security liability. With the GDPR enforcement that kicked in this past May, the "cowboy coding" era is officially dead. The Datatilsynet (Norwegian Data Protection Authority) does not care about your uptime; they care about your audit trails.
I have spent the last six months migrating a major Oslo-based e-commerce platform from a fragile Jenkins-scripted mess to a deterministic GitOps architecture. We learned the hard way that infrastructure drift is the silent killer of stability. When your staging environment works but production fails because someone tweaked a kernel parameter three months ago and forgot to document it, that is drift.
Here is how we solved it using the "Single Source of Truth" methodology, and why your underlying metal—specifically the I/O consistency—matters more than the fancy tools you run on top of it.
The Core Principle: Git is the Only Reality
The definition of GitOps is simple: Git is the source of truth for the entire system state. Not just the application code, but the infrastructure, the config maps, and the dashboard definitions.
In a traditional CI/CD push model (like Jenkins), the CI server has the keys to the castle: it runs kubectl apply, and if the CI server is compromised, your cluster is gone with it. In the GitOps pull model we are moving toward, a process inside the cluster (like Weave Flux) pulls changes from Git and reconciles them. It is more secure by design because the cluster's credentials never leave the cluster.
Pro Tip: In 2018, we are seeing a shift from imperative commands (kubectl run) to declarative manifests. If you cannot recreate your entire datacenter from your git repository in under an hour, you do not have a Disaster Recovery plan, you have a hope and a prayer.
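To make that concrete, a declarative rebuild looks roughly like this, assuming the repository keeps every manifest under a manifests/ directory (the repo URL and layout are illustrative):
# Clone the source of truth and apply everything it declares.
# No kubectl run, no kubectl expose, no undocumented steps.
git clone git@git.example.com:platform/cluster-state.git
cd cluster-state
kubectl apply -R -f manifests/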
The Stack: 2018 Edition
We are building this on Kubernetes 1.12. Why? Because it graduated Kubelet TLS bootstrapping and introduced volume snapshotting, both of which matter for stateful workloads. But K8s is resource-hungry. We run our control planes on CoolVDS NVMe instances. Why? Because etcd is incredibly sensitive to disk write latency: if your fsync latency spikes above 10ms, leader elections start to fail. Standard SSD VPS providers often share I/O queues, leading to "noisy neighbor" issues that crash K8s masters. Dedicated NVMe slices are not a luxury here; they are a requirement.
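Do not take a provider's word for it. A quick way to approximate etcd's write pattern is fio with an fdatasync after every small write; the target directory and sizes below are illustrative:
# Small sequential writes, each followed by fdatasync, like etcd's WAL.
fio --name=etcd-fsync-test --directory=/var/lib/etcd-bench \
    --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300
# Check the fsync/fdatasync percentiles in the output;
# the 99th percentile should stay comfortably under 10ms.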
Step 1: Containerizing with Multi-Stage Builds
Stop shipping build tools to production. Use Docker multi-stage builds (stable since 17.05) to keep images tiny. This reduces the attack surface and speeds up the pull time on your worker nodes.
# Stage 1: Build
FROM golang:1.11-alpine AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o main .
# Stage 2: Run
FROM alpine:3.8
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/main .
CMD ["./main"]
Step 2: The Manifests
Do not just write a Deployment. Define resource requests and limits for every container. Without `resources.limits`, a memory leak in one pod will invoke the OOM (Out Of Memory) killer and can take down critical system processes on the node. I have seen Java apps eat 64GB of RAM in minutes when left unconstrained.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-processor
  template:
    metadata:
      labels:
        app: payment-processor
    spec:
      containers:
        - name: payment-go
          image: registry.coolvds.com/payment:v1.4.2
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              memory: "128Mi"
              cpu: "250m"
            limits:
              memory: "256Mi"
              cpu: "500m"
Step 3: Infrastructure as Code with Terraform
Before we deploy apps, we provision the nodes. We use Terraform 0.11. Yes, the HCL syntax is a bit clunky compared to what HashiCorp is promising for 0.12, but it works. Note the interpolation syntax ${}, which is mandatory in this version.
variable "node_count" {
default = 3
}
resource "openstack_compute_instance_v2" "k8s_worker" {
count = "${var.node_count}"
name = "k8s-worker-${count.index}"
image_name = "Ubuntu 18.04"
flavor_name = "m1.large"
key_pair = "${var.key_pair}"
security_groups = ["default", "k8s-sg"]
network {
name = "private_network"
}
user_data = "${file("cloud-init.yaml")}"
}
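The plan itself should go through review like any other change to the repository. In practice that looks something like this:
# Initialize providers, then review the plan before anything touches the cloud.
terraform init
terraform plan -out=tfplan    # attach the plan output to the merge request
terraform apply tfplan        # run only after the plan has been reviewed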
The Automation Loop
We use Weave Flux (v1) running inside the cluster. It polls the Git repository every 5 minutes. If it detects a change in the YAML manifests, it applies them.
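The interesting part of the Flux deployment is just a handful of daemon arguments pointing it at the config repository. A trimmed sketch; the repo URL and path are placeholders:
# Container args for the fluxd Deployment (trimmed to the essentials).
args:
  - --git-url=git@git.example.com:platform/cluster-state.git
  - --git-branch=master
  - --git-path=manifests
  - --git-poll-interval=5m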
To check the status of a rollout without leaving your terminal, use this:
kubectl rollout status deployment/payment-processor --namespace=production
If you need to debug why a pod is failing, do not ssh into the node. Use:
kubectl describe pod -l app=payment-processor -n production
kubectl logs -f -l app=payment-processor -n production --tail=50
Why Infrastructure Matters for GitOps
GitOps relies on the assumption that the infrastructure will respond predictably to API calls. If you send a command to scale to 50 replicas, but your underlying storage array chokes on the image pulls, your automation fails.
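It also means that scaling is a commit, not a kubectl command. Under this model the change looks roughly like this (file path and commit message are illustrative):
# Edit the manifest, commit, push. Flux applies it on the next poll.
sed -i 's/replicas: 3/replicas: 50/' manifests/payment-processor.yaml
git commit -am "Scale payment-processor to 50 replicas"
git push origin master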
This is where the choice of hosting provider becomes technical, not just financial. In Norway, latency to the NIX (Norwegian Internet Exchange) is critical. If your traffic is routed through Frankfurt before hitting Oslo, you are adding roughly 30ms of latency to every round trip for your local customers.
At CoolVDS, we configure our KVM host nodes with hugepages enabled and CPU pinning where necessary. This prevents the "stolen CPU" metric from creeping up, which is common in oversold VPS environments. When you run a GitOps pipeline that triggers a massive rolling update, that CPU spike needs to be absorbed instantly.
| Feature | Standard VPS | CoolVDS Architecture |
|---|---|---|
| Storage Backend | SATA SSD (Shared Queue) | NVMe (High Queue Depth) |
| Virt Tech | OpenVZ (Kernel Shared) | KVM (Kernel Isolated) |
| Etcd Reliability | Low (fsync wait times) | High (Direct I/O) |
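You can check whether your current provider oversells CPU by watching steal time; if it sits above a few percent, you are competing with your neighbors for cycles:
# The "st" column is CPU time stolen by the hypervisor for other guests.
vmstat 1 5
# Or watch the %st field in top, or the steal metric in your monitoring.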
Handling Secrets (The 2018 Headache)
You cannot store plaintext passwords in Git. In 2018, the best practice is using Sealed Secrets by Bitnami or integrating HashiCorp Vault. Do not commit base64-encoded secrets to public or even private repos. Base64 is not encryption; it is obfuscation.
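Start with an ordinary Kubernetes Secret generated locally, so it never touches the API server; the secret name and key below are illustrative:
# --dry-run renders the Secret without creating it in the cluster.
kubectl create secret generic payment-db-credentials \
  --from-literal=password='S3cr3t!' \
  --dry-run -o json > mysecret.json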
Here is how you generate a Sealed Secret locally before committing:
# Encrypt the secret with the controller's public certificate
kubeseal < mysecret.json > mysealedsecret.json --format=json
The mysealedsecret.json is safe to commit. Only the controller running inside your CoolVDS cluster has the private key to decrypt it.
Conclusion
Transitioning to GitOps is painful. You will break things. You will spend days fighting YAML indentation errors. But once it is running, the peace of mind is absolute. You know exactly what is running in production because it matches the main branch in Git. No hidden manual changes. No "works on my machine."
However, automation is only as good as the foundation it runs on. A flaky network or slow disk I/O will make your Kubernetes cluster unstable, regardless of how clean your Git history is.
If you are ready to build a stack that can actually handle production loads, stop playing with toy servers. Deploy a KVM instance on CoolVDS today, benchmark the NVMe I/O against your current provider, and see why we are the choice for Norwegian systems engineers who value sleep.