
Observability is Not Monitoring: Why Your Green Dashboards Are Lying to You

Stop Trusting Your CPU Graphs

It’s 3:00 AM in Oslo. Your PagerDuty didn't fire. Your Zabbix dashboard is a calming sea of green. Your CPU load is sitting comfortably at 20%. Yet your support queue is filling with tickets from angry Norwegians who can't complete a transaction.

This is the classic failure of monitoring. You are watching the health of the infrastructure, not the behavior of the application.

In 2024, deploying a VPS in Norway and installing `htop` isn't enough. If you are running complex microservices or even a monolithic Magento shop, you need to move from asking "Is the system healthy?" to "Why is the system behaving this way?" That is the shift to Observability.

The "Known Unknowns" vs. The "Unknown Unknowns"

I have had arguments with CTOs who claim Observability is just a buzzword used to sell expensive SaaS tools. They are wrong. The distinction is architectural.

  • Monitoring is for known unknowns. You know disk space can run out, so you set an alert for 90% usage. You know latency can spike, so you ping the endpoint.
  • Observability is for unknown unknowns. Why did latency triple only for iOS users on the Telenor network when accessing the `/cart/checkout` endpoint? You didn't write an alert for that specific permutation because you didn't know it could happen.

To solve the latter, you need high-cardinality data. You need traces. And you need infrastructure that doesn't choke when writing gigabytes of telemetry data per minute.
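
"High cardinality" just means attributes with many distinct values: user IDs, routes, carrier names. As a minimal sketch of what that looks like in code (assuming the opentelemetry-api package and an active trace; the attribute keys are illustrative, not an official schema):

from opentelemetry import trace

def tag_request(user_id: str, carrier: str, route: str) -> None:
    # Attach high-cardinality attributes to whatever span is currently active.
    # Your backend can then slice traces by any combination of these values.
    span = trace.get_current_span()
    span.set_attribute("app.user_id", user_id)
    span.set_attribute("net.carrier", carrier)   # e.g. "Telenor"
    span.set_attribute("http.route", route)      # e.g. "/cart/checkout"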

Implementing the Stack: OpenTelemetry (OTel) on Linux

Let's get technical. As of May 2024, the standard is OpenTelemetry. Vendor lock-in is dead. If you are still hardcoding Datadog agents into your application code, you are doing it wrong.

Here is a battle-tested configuration for an OpenTelemetry Collector running on a CoolVDS instance. We use this to aggregate traces before sending them to a backend like Jaeger or Tempo.

1. The OTel Collector Configuration

Create a file named `otel-collector-config.yaml`. Note the memory limiter; without it, the kernel will OOM-kill your collector under heavy load.

receivers:
  otlp:
    protocols:
      grpc:   # listens on 4317 by default
      http:   # listens on 4318 by default

processors:
  batch:      # groups spans/metrics before export to cut network overhead
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024        # hard memory ceiling for the collector
    spike_limit_mib: 256   # soft limit = limit_mib - spike_limit_mib; data is refused above it

exporters:
  otlp:
    endpoint: "tempo:4317"
    tls:
      insecure: true       # acceptable on a private network; use real TLS otherwise
  prometheus:
    endpoint: "0.0.0.0:8889"   # Prometheus scrapes metrics from this port

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]   # memory_limiter must run first
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
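
To run it, point a collector at that file. A sketch using the contrib Docker image (the image tag and mount path are assumptions; adapt to your deployment):

docker run -d --name otel-collector \
  -v "$(pwd)/otel-collector-config.yaml:/etc/otelcol/config.yaml" \
  -p 4317:4317 -p 4318:4318 -p 8889:8889 \
  otel/opentelemetry-collector-contrib:latest \
  --config=/etc/otelcol/config.yaml

Ports 4317 and 4318 receive OTLP; 8889 is where Prometheus scrapes the metrics exporter.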

2. Instrumenting a Python Application

Don't rewrite your code. Use auto-instrumentation. If you are running a Flask app on a CoolVDS Compute Instance, simply inject the OTel libraries at runtime.

# Install the distro, the OTLP exporter, and matching instrumentation libraries
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install

# Tell the SDK who we are and where the collector lives
export OTEL_SERVICE_NAME="checkout-service-oslo"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"

# Launch your app with instrumentation
opentelemetry-instrument python app.py

Now, every SQL query, every HTTP request, and every Redis call is traced. You can see exactly where the bottleneck is.
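
Auto-instrumentation covers the frameworks; for your own business logic, you can still open spans by hand and they will nest under the auto-generated request span. A sketch (the span names and helper functions are hypothetical):

from opentelemetry import trace

tracer = trace.get_tracer("checkout-service-oslo")

def process_checkout(cart):
    # Each block becomes its own bar in Jaeger/Tempo, nested under
    # the Flask request span created by auto-instrumentation.
    with tracer.start_as_current_span("checkout.validate_cart"):
        validate(cart)   # hypothetical helper
    with tracer.start_as_current_span("checkout.charge_card"):
        charge(cart)     # hypothetical helper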

The Hidden Cost: I/O and Storage

Here is the part most tutorials skip. Observability generates a massive volume of writes. If you turn on full tracing for a high-traffic site, you are writing thousands of spans per second to your storage backend (Elasticsearch, Loki, or ClickHouse).
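
If the write volume is unmanageable, sample before you store. One option is head sampling in the collector; a sketch using the probabilistic_sampler processor from the contrib build (10% is an arbitrary starting point, and the other pipeline entries come from the config above):

processors:
  probabilistic_sampler:
    sampling_percentage: 10   # keep roughly 1 in 10 traces

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [otlp]

Head sampling throws data away blindly; tail sampling keeps the interesting traces but buffers them in memory first, so it costs RAM.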

I recently audited a setup where the team blamed their logging stack for being slow. They were running an ELK stack on a budget VPS with standard SATA SSDs. The `iowait` was consistently over 40%.

This is where hardware matters.

Pro Tip: Check your disk latency. Run `ioping -c 10 .` on your current server. If your average latency is above 1ms, your database is waiting on the disk, and your traces will be delayed.
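
For a harder number, fio can measure sustained 4K random-write latency, which is close to what a TSDB write-ahead log does to a disk. A sketch (assuming fio is installed; it writes a temporary 256MB file in the current directory):

fio --name=writelat --filename=fio-test.tmp --size=256m \
    --rw=randwrite --bs=4k --iodepth=1 --direct=1 \
    --ioengine=libaio --runtime=30 --time_based

Delete fio-test.tmp afterwards, and read the "clat" percentiles in the output, not just the average.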

At CoolVDS, we use NVMe storage arrays by default. We don't throttle IOPS on our premium tiers because we know that when you are ingesting 50GB of logs a day, standard SSDs effectively become rotational disks in terms of performance. You need low latency at the block device level.

Data Sovereignty and The "Norsk" Reality

Technically, you could send all this telemetry to a US-based SaaS cloud. Legally, you probably shouldn't.

Under GDPR and the Schrems II ruling, IP addresses and User IDs found in logs can be considered PII (Personally Identifiable Information). If you are exporting trace data that contains a Norwegian user's IP address to a server in Virginia, you are inviting a conversation with Datatilsynet (The Norwegian Data Protection Authority).
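
You can also scrub PII before it leaves the host. The collector's attributes processor can hash or delete fields; a sketch (the attribute keys depend on your instrumentation, so treat them as placeholders):

processors:
  attributes/scrub_pii:
    actions:
      - key: client.address   # raw client IP
        action: hash
      - key: app.user_id
        action: delete

Remember to add attributes/scrub_pii to the traces pipeline, before the batch processor.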

Keeping your observability stack local is the pragmatic choice.

  • Compliance: Data stays in Norway/Europe.
  • Latency: Sending traces from a server in Oslo to a collector in Oslo takes <1ms. Sending them to US-East takes 90ms+. That latency adds up on the application side unless your exporter ships spans asynchronously in batches, as sketched below.
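
The Python SDK already does this: spans queue in a BatchSpanProcessor and export in the background. The standard OTEL_BSP_* environment variables control the queue; a sketch (the values shown are illustrative):

export OTEL_BSP_SCHEDULE_DELAY=5000          # ms between export batches
export OTEL_BSP_MAX_QUEUE_SIZE=2048          # spans buffered before the SDK drops new ones
export OTEL_BSP_MAX_EXPORT_BATCH_SIZE=512    # max spans per outgoing OTLP request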

Optimizing the Database for Metrics

If you are self-hosting Prometheus for metrics on CoolVDS, you need to tune the retention and block duration. The defaults are meant for a small test setup, not production.

Adjust your systemd service file or Docker command to include:

--storage.tsdb.retention.time=30d
--storage.tsdb.wal-compression
--storage.tsdb.min-block-duration=2h
--storage.tsdb.max-block-duration=2h
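
On a systemd host, the cleanest way to add these is a drop-in override rather than editing the packaged unit. A sketch (the binary and config paths are assumptions):

# /etc/systemd/system/prometheus.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.retention.time=30d \
    --storage.tsdb.wal-compression \
    --storage.tsdb.min-block-duration=2h \
    --storage.tsdb.max-block-duration=2h

Run systemctl daemon-reload and restart Prometheus to apply it.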

WAL compression reduces disk usage significantly, but it costs a tiny bit of CPU. On our high-frequency compute cores, this trade-off is negligible.

Conclusion: Build It Right, Build It Local

Green dashboards are comforting, but they are often a lie. To truly understand your infrastructure, you need to implement tracing and high-cardinality logging. This requires two things: a modern software stack (OpenTelemetry) and hardware that can handle the write punishment (NVMe).

Don't let slow I/O kill your insights. If you need a sandbox to test this OTel configuration, spin up a high-performance instance in Norway today.

Deploy your Observability Stack on CoolVDS in under 55 seconds.