Stop Watching Traffic Lights: Why Monitoring Fails and Observability Saves Production

It is 3:00 AM on a Tuesday. Your PagerDuty screams. The dashboard shows all red. You know the server is down. Great. That is monitoring. But why is it down? Is it a memory leak in the new microservice? Did the database lock up? Or did a cleaning crew in a datacenter trip over a fiber cable? If you cannot answer that in under five minutes, you do not have observability. You just have a dashboard of traffic lights.

In 2020, relying solely on "is it up?" checks is negligence. With distributed systems and microservices becoming the standard even here in the Nordics, the complexity has outpaced simple health checks. We need to talk about the shift from Monitoring to Observability, and why your choice of infrastructure—specifically where your data lives—matters more now than it did six months ago.

The "Known Unknowns" vs. The "Unknown Unknowns"

Let’s cut through the marketing noise. Monitoring is for known unknowns. You know disk space can run out, so you monitor disk usage. You know CPUs get overloaded, so you set an alert for 90% load. It is a reactive safety net for problems you have already anticipated.

Observability is for the unknown unknowns. It is a property of your system, not a tool. A system is observable if you can understand its internal state purely by inspecting its outputs. It allows you to ask arbitrary questions without deploying new code. "Why is latency spiking only for iOS users in Bergen checking out with Vipps?" Monitoring won't tell you that. Observability will.

The Three Pillars: Logs, Metrics, and Traces

To achieve this, we rely on three distinct types of data. If you are deploying on a CoolVDS instance, you have the raw I/O throughput to handle this ingestion, but you need to configure it correctly. Let's look at the stack.

1. Structured Logs (The Context)

The days of grepping raw text files in /var/log are over. If your logs aren't machine-parsable, they are useless at scale. You need structured logging, which in practice means JSON. Here is how you configure Nginx to stop shouting text and start whispering data:

http {
    log_format json_combined escape=json
      '{ "time_local": "$time_local", '
      '"remote_addr": "$remote_addr", '
      '"remote_user": "$remote_user", '
      '"request": "$request", '
      '"status": "$status", '
      '"body_bytes_sent": "$body_bytes_sent", '
      '"request_time": "$request_time", '
      '"http_referrer": "$http_referrer", '
      '"http_user_agent": "$http_user_agent" }';

    access_log /var/log/nginx/access.json json_combined;
}

Now you can ship this directly to Elasticsearch or Loki. Suddenly, you can query request_time > 1.0 and group by remote_addr instantly.
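
For instance, assuming you ship these logs to Loki (2.0 or newer) and your log shipper attaches a job="nginx" label (both label and threshold here are illustrative), a LogQL query like this pulls out every slow request:

{job="nginx"} | json | request_time > 1.0

The point is that the question gets written after the incident, not baked into an index beforehand.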

2. Metrics (The Trends)

Metrics are cheap to store and great for spotting trends. Prometheus is the industry standard here. It pulls data; it doesn't wait for pushes. This architecture is robust because your monitoring system failing doesn't bring down your app.

A typical prometheus.yml scrape config for a node exporter running on your VPS looks like this:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
    relabel_configs:
      - source_labels: [__address__]
        regex: '.*'
        target_label: instance
        replacement: 'norway-node-01'
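
With node_exporter being scraped, the "known unknowns" from earlier become a small rules file. Here is a sketch of a disk-space alert; the file name and thresholds are illustrative, and you load it by listing it under rule_files in prometheus.yml:

# alerts.yml (illustrative) -- reference it via rule_files in prometheus.yml
groups:
  - name: node-alerts
    rules:
      - alert: RootDiskAlmostFull
        expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Root filesystem below 10% free on {{ $labels.instance }}"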

3. Distributed Tracing (The Causality)

This is where the battle is won. Tracing follows a request as it hops from your load balancer, to your backend, to your database, and back. In 2020, Jaeger is the tool of choice for this. It visualizes the critical path.
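
Before instrumenting anything, you need somewhere to send the spans. On a single VPS, Jaeger's all-in-one image is the quickest way to get a collector and UI running; treat the exact tag below as an assumption and pin whatever 1.x release is current for you:

docker run -d --name jaeger \
  -p 6831:6831/udp \
  -p 16686:16686 \
  jaegertracing/all-in-one:1.21

Port 6831/udp receives spans from the client library; the UI lives on 16686.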

Implementing tracing requires code instrumentation. If you are using Python with Flask, you might use the jaeger_client library. It adds overhead, which is why running this on shared hosting is a disaster. You need the dedicated CPU cycles of a proper KVM VPS to process traces without slowing down the actual user request.

from jaeger_client import Config

def init_tracer(service):
    config = Config(
        config={
            'sampler': {
                'type': 'const',   # constant-decision sampler
                'param': 1,        # 1 = sample every request (dial this down in production)
            },
            'logging': True,       # report spans via the standard Python logger
        },
        service_name=service,
    )
    return config.initialize_tracer()
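
Wiring that tracer into a request handler is then a few lines. The sketch below assumes Flask plus the init_tracer() helper above; the route and tags are purely illustrative:

from flask import Flask

app = Flask(__name__)
tracer = init_tracer('checkout-service')  # service name is an example

@app.route('/checkout')
def checkout():
    # start_active_span() opens a span and finishes it when the block exits
    with tracer.start_active_span('checkout') as scope:
        scope.span.set_tag('payment.provider', 'vipps')  # illustrative tag
        # ... call the payment backend, record failures as tags/logs ...
        return 'OK'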

The Infrastructure Reality: NVMe or Die

Here is the part most tutorials skip. Observability generates a massive amount of data. If you turn on full tracing and detailed metrics, your write operations (IOPS) will skyrocket. A traditional SATA SSD, or worse, a spinning HDD, will choke. Your monitoring stack will end up causing the outage you are trying to prevent.

Pro Tip: Never run your observability stack on the same physical disk as your production database unless you have guaranteed IOPS. This is why at CoolVDS, we standardized on NVMe storage. When you are writing 5,000 log lines per second to Elasticsearch, you need the low latency and high throughput that only NVMe provides.
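
If you want numbers instead of promises, a quick random-write test with fio (assuming it is installed) shows what your disk actually sustains before you point Elasticsearch at it:

fio --name=randwrite --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=4k --size=1G --iodepth=32 \
    --runtime=60 --time_based

On NVMe you should see IOPS in the tens of thousands; on an oversubscribed SATA array you will see exactly why this section exists.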

The Schrems II Elephant in the Room

We cannot ignore the legal landscape. In July 2020, the CJEU invalidated the Privacy Shield (Schrems II). This is a nightmare for European DevOps teams relying on US-based SaaS observability platforms (like Datadog or New Relic) without strict Standard Contractual Clauses (SCCs) and supplementary measures.

If your logs contain PII (IP addresses, user IDs, emails) and you ship them to a US cloud, you are likely non-compliant with GDPR. The safest architectural decision right now for Norwegian companies is data sovereignty. Host your observability stack (Prometheus, Grafana, ELK) on servers physically located in Norway.

By running a self-hosted stack on a provider like CoolVDS, where the data center is in Oslo or nearby, you keep the data under Norwegian jurisdiction. You satisfy Datatilsynet, and you get lower latency to your local services. It’s a win-win.
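
Sovereignty solves the transfer problem; data minimisation still applies. One low-effort step is masking client IPs before they ever reach the log file. A sketch for Nginx (the map goes in the http block, then swap $remote_addr_anon into the json log_format shown earlier):

map $remote_addr $remote_addr_anon {
    ~(?P<ip>\d+\.\d+\.\d+)\.    $ip.0;      # IPv4: zero the last octet
    ~(?P<ip>[^:]+:[^:]+):       $ip::;      # IPv6: keep only the first two groups
    default                     0.0.0.0;
}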

Building the Stack: A Quick Docker Compose

Ready to deploy a local observability stack? Here is a tested docker-compose.yml to get Prometheus and Grafana talking on your CoolVDS instance within minutes.

version: '3'

services:
  prometheus:
    image: prom/prometheus:v2.22.0
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=15d'
    ports:
      - 9090:9090
    restart: always

  grafana:
    image: grafana/grafana:7.3.0
    depends_on:
      - prometheus
    ports:
      - 3000:3000
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=SecretPassword123!  # change this before exposing port 3000
    restart: always

volumes:
  prometheus_data:
  grafana_data:
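
Bring it up and check that Prometheus can see its targets before you start building dashboards:

docker-compose up -d
curl -s http://localhost:9090/api/v1/targets

Then log in to Grafana on port 3000 and add http://prometheus:9090 as the data source; the service name resolves inside the Compose network, so no hard-coded IPs are needed.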

Don't Fly Blind

Complexity is not going away. If anything, with the rise of Kubernetes and microservices, it is getting worse. You can either treat your infrastructure as a black box and pray, or you can instrument it, visualize it, and own it.

But remember, observability is resource-intensive. It demands fast storage, reliable compute, and—legally speaking—local residency. Don't compromise your insights with sluggish hardware or risky data transfers.

Is your stack ready for the load? Deploy a high-performance, NVMe-backed CoolVDS instance in Oslo today and build an observability platform that actually tells you why.