Monitoring is Dead. Long Live Observability: A 2019 Survival Guide for Nordic DevOps

It is 3:00 AM. Your phone buzzes. It’s PagerDuty. The alert says CPU > 90% on your main application server. You log in via SSH. The CPU is currently at 40%. The site is up. You have absolutely no idea what happened five minutes ago. You restart the service, pray to the gods of uptime, and go back to sleep. You have fixed nothing.

This is the failure of traditional monitoring. In the complex distributed architectures we are building in 2019—microservices, Kubernetes clusters, and containerized workloads—knowing that a system is failing is trivial. Knowing why is the hard part.

Welcome to the era of Observability.

The Difference: Known Unknowns vs. Unknown Unknowns

Let's strip away the marketing buzzwords. Most VPS providers in Europe will sell you "monitoring" that is little more than a ping check. That is useless for a serious engineer.

  • Monitoring tells you the state of the system based on predefined thresholds. It answers questions you predicted you'd need to ask: "Is the disk full?" or "Is latency above 200ms?".
  • Observability is a property of the system that allows you to ask arbitrary new questions without shipping new code. It helps you debug the unknown unknowns. "Why is latency high only for iOS users in Bergen checking out with Vipps between 19:00 and 20:00?"

The CoolVDS Reality Check: You cannot achieve observability on shared hosting or container-based VPS platforms (like OpenVZ) where kernel access is restricted. True observability requires the ability to inspect kernel syscalls and unrestricted access to /proc and /sys. This is why CoolVDS strictly uses KVM virtualization. If you can't run eBPF or deep system tracing, you are flying blind.

The 2019 Observability Stack: Prometheus & Grafana

For most of our Norwegian clients running heavy workloads, the LAMP stack monitoring tools of 2015 are obsolete. In 2019, the gold standard for metric collection is Prometheus coupled with Grafana for visualization.

Why Prometheus? Because it uses a pull model. Your servers don't crash trying to push metrics to a central server that's down; Prometheus scrapes them when it can. Plus, the PromQL query language is powerful enough to do math on your infrastructure.
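
To make "math on your infrastructure" concrete, here is a minimal sketch of a Prometheus alerting rule. The file name, group name, and alert name are placeholders of my own, and it assumes node_exporter 0.16+ metric names; wire it in via rule_files in prometheus.yml and tune the mountpoint, window, and duration to your setup.

# alert_rules.yml -- illustrative sketch, not a drop-in rule
groups:
  - name: capacity
    rules:
      - alert: RootDiskFullIn4Hours
        # predict_linear() fits a trend to the last hour of free-space samples
        # and extrapolates four hours ahead: PromQL doing the math for you
        expr: predict_linear(node_filesystem_free_bytes{mountpoint="/"}[1h], 4 * 3600) < 0
        for: 15m
        labels:
          severity: warning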

Step 1: Exposing Metrics

To observe, you must first emit data. If you are running Nginx, you need more than just the access logs: enable the stub_status module to track active connections in real time.

Here is a production-ready snippet for your nginx.conf. Do not expose this to the public internet unless you want your competitors to know your load:

server {
    # Bind to loopback only; the allow/deny rules below are a second line of defense
    listen 127.0.0.1:80;
    server_name localhost;

    location /metrics {
        stub_status on;    # expose Nginx connection and request counters
        access_log off;    # keep scrapes out of the access log
        allow 127.0.0.1;
        deny all;
    }
}

Step 2: The Scrape Config

On your monitoring node (ideally a separate CoolVDS instance to ensure isolation), your prometheus.yml should look like this. Note how we set the scrape interval. If you scrape every minute, you miss micro-bursts. In 2019, 15 seconds is the standard acceptable granularity.

global:
  scrape_interval:     15s   # scrape every 15s to catch micro-bursts
  evaluation_interval: 15s   # evaluate rules at the same cadence

scrape_configs:
  - job_name: 'prometheus'        # Prometheus scraping itself
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'     # host metrics; replace with your own private IPs
    static_configs:
      - targets: ['10.0.0.5:9100', '10.0.0.6:9100']

  - job_name: 'nginx'             # the Nginx exporter described in the note below
    static_configs:
      - targets: ['10.0.0.5:9113']

Note: You'll need the Nginx Prometheus Exporter running on port 9113 to bridge the stub_status output into the Prometheus exposition format.
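
If the app node already runs Docker, one way (of several) to deploy that exporter is as a small sidecar container. Treat the snippet below as a sketch: the image tag is simply a 0.4.x release from mid-2019 (pin whatever is current for you), and -nginx.scrape-uri must point at the stub_status location defined earlier.

version: '3.7'
services:
  nginx-exporter:
    image: nginx/nginx-prometheus-exporter:0.4.2
    # host networking so 127.0.0.1 reaches the loopback-only stub_status vhost;
    # the exporter itself answers on port 9113 for Prometheus to scrape
    network_mode: host
    command:
      - "-nginx.scrape-uri=http://127.0.0.1/metrics"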

The Storage Bottleneck (Why Your HDD is Killing You)

Here is the brutal truth about observability: It destroys I/O.

If you implement the ELK Stack (Elasticsearch, Logstash, Kibana) alongside Prometheus, you are writing gigabytes of logs and metrics every hour. Elasticsearch is particularly hungry for IOPS (Input/Output Operations Per Second). When a Lucene segment merge kicks in, it will saturate a spinning SATA disk, and the application sharing that disk hangs with it.

I recently audited a setup for a client in Oslo. They were complaining that their Magento store froze every night at 02:00. They blamed the backup script. I looked at the metrics. It wasn't backups; it was their log rotation and indexing process choking the disk I/O.
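
The metric that gives this pattern away is device busy time from node_exporter. As a rough sketch (again assuming node_exporter 0.16+ metric names; the threshold and durations here are made up and should be tuned to your hardware), a rule like this pages on the real culprit instead of the symptom:

groups:
  - name: disk_saturation
    rules:
      - alert: DiskSaturated
        # rate() over io_time approximates the fraction of each second the device
        # spent busy; sustained values close to 1 mean the disk is the bottleneck
        expr: rate(node_disk_io_time_seconds_total[5m]) > 0.9
        for: 10m
        labels:
          severity: critical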

Benchmark Comparison: SATA vs NVMe

Storage Type          Random Read IOPS    Write Latency    ELK Stack Viability
Standard HDD (SATA)   ~100-150            10-20ms          Fails under load
Standard SSD (SATA)   ~5,000-10,000       1-3ms            Acceptable
CoolVDS NVMe          ~300,000+           <0.1ms           Optimal

This is why we deploy NVMe storage by default. When you are querying 50 million log lines in Kibana to find a security breach, you cannot wait 10 minutes for a spinning disk to wake up.

Tracing: The Final Frontier

Metrics show you trends. Logs show you events. But in a microservices architecture, you need Distributed Tracing. If Service A calls Service B, which calls Service C, and Service C times out, the metrics on Service A just say "500 Error".

In 2019, tools like Jaeger (compatible with the OpenTracing standard) are becoming essential. Implementing this requires code changes: instrumenting your services and propagating trace context headers through every HTTP request. It is heavy lifting, but necessary for complex setups.

Here is a conceptual example of how a Docker Compose setup for a local observability stack might look in 2019:

version: '3.7'
services:
  jaeger:
    image: jaegertracing/all-in-one:1.13
    environment:
      - COLLECTOR_ZIPKIN_HTTP_PORT=9411   # also accept Zipkin-format spans
    ports:
      - "5775:5775/udp"   # legacy zipkin-compat agent port
      - "6831:6831/udp"   # jaeger.thrift compact, used by most client libraries
      - "16686:16686"     # Jaeger query UI

  prometheus:
    image: prom/prometheus:v2.11.1
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml   # the scrape config from above
    ports:
      - "9090:9090"       # Prometheus UI and API

  grafana:
    image: grafana/grafana:6.3.2
    ports:
      - "3000:3000"       # Grafana dashboards

Data Sovereignty and the Norwegian Context

We cannot discuss logging without discussing legality. If you are logging IP addresses or user identifiers, you are processing PII (Personally Identifiable Information) under GDPR.

Using US-based cloud monitoring services can be risky regarding data transfer compliance (Schrems II is looming on the horizon, creating uncertainty). By hosting your observability stack on Norwegian VPS infrastructure like CoolVDS, your data stays within Norwegian borders, subject to Norwegian law and Datatilsynet oversight. You own the data. You own the pipe.

Conclusion

Stop guessing. Stop restarting servers and hoping the problem goes away. Building an observability stack takes time—configuring Prometheus exporters, tuning Elasticsearch heap sizes, and writing Grafana dashboards—but the ROI is immediate when the next outage hits.

You need the right tools, and you need the hardware to run them without degrading your production app.

Ready to see what your servers are actually doing? Deploy a high-performance NVMe KVM instance on CoolVDS today. With sub-millisecond latency to NIX and raw compute power, you will finally have the visibility you deserve.