Beyond `htop`: The 2018 Guide to Application Performance Monitoring in Norway

If I had a krone for every time a developer told me their code was optimized while the server was visibly choking on I/O wait, I could retire to a cabin in Geilo. It is September 2018. We are living in a world where users expect instant interactions. If your API takes 500ms to respond, you don’t just have a latency problem; you have a churn problem.

Since the GDPR enforcement deadline passed in May, the game has changed. You can no longer blindly throw data at third-party US-based monitoring SaaS platforms without checking the fine print on Privacy Shield. You need visibility, but you also need data sovereignty. This is specifically critical here in Norway, where the Datatilsynet is not known for its leniency regarding data leaks.

In this guide, we are going to build a monitoring stack that respects your users' privacy and exposes the truth about your infrastructure. No fluff. Just configurations that work.

The "Black Box" Syndrome

I recently audited a Magento shop hosting a flash sale. The site went down. The devs blamed the hosting provider; the provider blamed the PHP code. Who was right?

They were both flying blind. They were watching htop and seeing the CPU spike, but they didn't know why. Was it a locked database table? Was it a noisy neighbor on their shared VPS stealing cycles? Was it PHP worker exhaustion?

To fix this, we need metrics at three layers: The Edge (Nginx), The Logic (PHP/Python/Go), and The Metal (System/Disk).

Layer 1: The Edge – Nginx Timing

Most default Nginx configurations are useless for performance debugging. They tell you who visited, but not how long they waited. We need to define a custom log format that exposes $request_time (total time) and $upstream_response_time (time the backend took).

Edit your /etc/nginx/nginx.conf:

http {
    # ... existing config ...

    log_format perf_audit '$remote_addr - $remote_user [$time_local] "$request" '
                          '$status $body_bytes_sent "$http_referer" '
                          '"$http_user_agent" '
                          'rt=$request_time uct="$upstream_connect_time" uht="$upstream_header_time" urt="$upstream_response_time"';

    access_log /var/log/nginx/access_perf.log perf_audit;
}
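
After saving, check the syntax and reload Nginx so the new format takes effect (the standard routine on Ubuntu 18.04):

sudo nginx -t
sudo systemctl reload nginx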

Now, tail that log during a load test. If urt (upstream response time) is low but rt (request time) is high, the client has a slow connection. If both are high, your backend is the bottleneck.
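
You don't have to eyeball the log by hand, either. Here is a quick sketch for pulling out the slowest requests, assuming the perf_audit format above (the 0.5s threshold is arbitrary; tune it to your own pain point):

# Print request_time and URI for anything slower than 500ms, worst first
awk '{
  for (i = 1; i <= NF; i++)
    if ($i ~ /^rt=/) {
      t = substr($i, 4) + 0
      if (t > 0.5) print t, $7
    }
}' /var/log/nginx/access_perf.log | sort -rn | head -20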

Layer 2: The Database – MySQL/MariaDB Slow Logs

Database latency is the number one application killer. By default, MySQL only logs queries taking longer than 10 seconds. In 2018, 10 seconds is an eternity. We need to catch anything slower than 1 second, or even 0.5 seconds.

Add this to your my.cnf (usually under /etc/mysql/conf.d/):

[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 1
log_queries_not_using_indexes = 1
min_examined_row_limit = 100
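
Once entries start landing in the slow log, summarize them instead of reading raw SQL. mysqldumpslow ships with the MySQL/MariaDB server packages; this sketch lists the ten worst offenders by query time:

# Top 10 statements by query time; literals are abstracted so similar queries group together
sudo mysqldumpslow -s t -t 10 /var/log/mysql/mysql-slow.log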

Pro Tip: Be careful with log_queries_not_using_indexes on a production server with heavy write volume. It can generate massive log files rapidly. Rotate your logs using logrotate or you will fill your disk and crash the server, achieving the exact opposite of your goal.
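
A minimal logrotate sketch for that file, assuming the path from the config above; the postrotate step assumes mysqladmin can authenticate (on Debian/Ubuntu the credentials in /etc/mysql/debian.cnf are commonly used for this):

# /etc/logrotate.d/mysql-slow
/var/log/mysql/mysql-slow.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 640 mysql adm
    postrotate
        # Ask the server to reopen its log files after rotation
        /usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf flush-logs
    endscript
}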

Layer 3: The Metal – Prometheus & Grafana

For real-time visualization, forget legacy Nagios setups. The industry standard right now is Prometheus paired with Grafana 5. It is open-source, self-hostable (GDPR friendly!), and pulls metrics rather than waiting for pushes.

We use the node_exporter to expose system metrics. Here is a clean systemd unit file to keep it running on Ubuntu 18.04 LTS:

[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
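
Assuming you save that as /etc/systemd/system/node_exporter.service, the rest is the standard systemd routine (the prometheus user referenced in the unit has to exist first):

# Unprivileged user for the exporter; skip if it already exists
sudo useradd --no-create-home --shell /bin/false prometheus

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
systemctl status node_exporter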

Once running, this exposes metrics at port 9100. You can then visualize CPU steal time. This is critical. If you see high "Steal" time, your hosting provider is overselling their physical CPU cores. This is a common practice among budget providers.
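
You can sanity-check the exporter and eyeball the raw steal counters before Grafana is even wired up. Note that the metric name depends on the node_exporter version: 0.16+ exposes node_cpu_seconds_total, while older releases call it node_cpu.

# Confirm the exporter answers and show the per-core steal counters
curl -s http://localhost:9100/metrics | grep 'mode="steal"'

In Grafana, a query along the lines of rate(node_cpu_seconds_total{mode="steal"}[5m]) turns those counters into a steal-time graph.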

The Hardware Reality Check: NVMe vs. Spinning Rust

You can optimize Nginx and MySQL all day, but if your underlying storage I/O is slow, your application will crawl. This is simple physics.

In 2018, many VPS providers are still pushing "SSD" hosting that is often just SATA SSDs sharing a saturated bus. For database-heavy applications, you need NVMe. The latency difference is not just noticeable; it compounds under load, because every queued I/O request waits behind the ones ahead of it.

Here is how you test your current provider's disk latency. Install ioping:

sudo apt-get install ioping
ioping -c 10 .
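
ioping answers the latency question; if you also want to see how the disk holds up under sustained random reads (closer to what a busy InnoDB instance does), fio is the usual tool. This is a sketch only; the job parameters are a starting point, not a benchmark standard:

sudo apt-get install fio

# 30 seconds of 4K random reads, bypassing the page cache.
# Creates ./fio-testfile (256 MB); delete it afterwards.
fio --name=randread-test --filename=./fio-testfile --size=256M \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio \
    --runtime=30 --time_based --group_reporting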

The Benchmarks:

Storage Type        | Average Latency   | Verdict
--------------------|-------------------|-----------------------------
Standard HDD        | ~5ms - 10ms       | Unusable for databases
SATA SSD (shared)   | ~0.5ms - 1ms      | Acceptable for small sites
CoolVDS NVMe        | ~0.04ms           | Required for high-load apps

If you are seeing latency above 1ms on a "high performance" plan, you are being bottlenecked by your infrastructure. At CoolVDS, we standardized on NVMe storage and KVM virtualization precisely to eliminate this bottleneck. KVM ensures that your RAM and CPU are yours: no noisy neighbors crashing your party.

Keeping Data in Norway

Latency isn't just about disk speed; it's about network distance. If your customers are in Oslo, Bergen, or Trondheim, hosting your server in Frankfurt or Amsterdam adds 20-40ms of round-trip time (RTT) before the request even hits your web server.

By hosting locally in Norway, you reduce that network latency to single digits. Furthermore, you simplify your GDPR compliance posture by keeping data within the borders, satisfying the strictest interpretations of data residency requirements.
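
You can put a number on that RTT penalty yourself. From a machine on a Norwegian network, compare your current server against a host in Oslo (the IP below is a placeholder from the documentation range; substitute your own):

# Plain round-trip time
ping -c 20 203.0.113.10

# Hop-by-hop view of where the latency accumulates (install with: sudo apt-get install mtr-tiny)
mtr --report --report-cycles 20 203.0.113.10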

The Final Configuration

Putting it all together, here is a docker-compose.yml snippet (version 2, as it's stable) to get a local Grafana instance up quickly to view your metrics:

version: '2'
services:
  prometheus:
    image: prom/prometheus:v2.3.2
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:5.2.4
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=secret
    ports:
      - "3000:3000"
    depends_on:
      - prometheus

Performance is not an accident. It is a deliberate architecture. You need the right software configuration, the right monitoring tools, and crucially, the right hardware underneath.

Don't let I/O wait kill your SEO rankings. Run that ioping command on your current server today. If the results scare you, it is time to move.

Ready for true single-digit latency? Deploy a high-performance NVMe instance on CoolVDS in Oslo today.