Surviving the 100ms Cap: Building a GDPR-Ready APM Stack in Norway
If your application takes longer than 100 milliseconds to respond, you are already losing users. Amazon famously found that every 100ms of added latency cost it roughly 1% in sales, and in 2018 user patience is thinner still. But right now we have a bigger problem than raw speed: May 25th, 2018.
With the General Data Protection Regulation (GDPR) enforcement date approaching, the standard DevOps practice of piping all your logs and metrics to a US-based SaaS provider is becoming a legal minefield. If your application logs contain IP addresses or User IDs, sending that data across the Atlantic requires complex Data Processing Agreements that many legal teams are currently rejecting.
The solution isn't to stop monitoring. It's to bring the monitoring home. By self-hosting your Application Performance Monitoring (APM) stack in Norway, you satisfy the Datatilsynet (Norwegian Data Protection Authority) requirements and, ironically, you often get better performance due to reduced network latency.
The Architecture: Prometheus & Grafana
For years, Nagios was the standard. It worked, but it was ugly and focused on "is it up?" rather than "how does it feel?". Today, the industry standard for time-series metrics is Prometheus coupled with Grafana for visualization. Unlike the ELK stack (Elasticsearch, Logstash, Kibana), which is heavy on memory and designed for text logs, Prometheus is designed for pure numeric metrics.
This efficiency matters. When you are sampling thousands of requests per second, write throughput becomes your bottleneck.
Step 1: The Infrastructure Layer
You cannot run a high-resolution Time Series Database (TSDB) on standard SATA spinning rust. Prometheus relies heavily on disk I/O to persist chunks of data. If your disk wait times (iowait) spike, your monitoring lags, and you lose visibility exactly when you need it most—during a traffic surge.
Pro Tip: Always verify your disk technology. We recently migrated a client from a legacy host to CoolVDS specifically for the NVMe backing. The ingestion rate for Prometheus jumped from 45k samples/sec to over 120k samples/sec purely due to the storage upgrade.
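If you want to sanity-check the storage under your own node before and after a migration, iostat from the sysstat package shows both CPU iowait and per-device latency (the package name and installer below assume Debian/Ubuntu; use yum on CentOS):

apt-get install -y sysstat
iostat -x 1 5

Watch the %iowait column in the CPU summary and the await column per device. High await on the disk backing /prometheus is exactly the lag-during-a-traffic-surge scenario described above.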
Step 2: Deploying the Stack with Docker
We will use Docker Compose (v3 schema) to orchestrate this. It ensures environment reproducibility. Make sure you have Docker 17.09 or later installed.
First, verify your Docker installation:
docker --version && docker-compose --version
Create a file named docker-compose.yml. We are mapping the storage to the host to ensure data persistence if the container crashes.
version: '3'

services:
  prometheus:
    image: prom/prometheus:v2.1.0
    container_name: prometheus
    volumes:
      - ./prometheus/:/etc/prometheus/
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    ports:
      - 9090:9090
    restart: always

  node-exporter:
    image: prom/node-exporter:v0.15.2
    container_name: node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      # "$$" escapes "$" so docker-compose does not try to interpolate it
      - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)'
    ports:
      - 9100:9100
    restart: always

  grafana:
    image: grafana/grafana:5.0.0
    container_name: grafana
    depends_on:
      - prometheus
    ports:
      - 3000:3000
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=SecretPassword123  # change this before exposing Grafana
    restart: always

volumes:
  prometheus_data: {}
  grafana_data: {}
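Before starting anything, let Docker Compose parse and render the file; this catches YAML indentation mistakes and interpolation errors (such as an unescaped $) immediately, printing the fully resolved configuration if it is valid:

docker-compose config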
Step 3: Configuring Prometheus
Prometheus needs to know what to scrape. Unlike push-based systems (such as New Relic), Prometheus pulls data. This is better for security: your production servers never have to open outbound connections to a third-party SaaS; you simply expose the exporter port and whitelist the monitoring server's IP.
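As a sketch of that whitelisting, assuming your hosts run ufw and the monitoring node sits at 10.0.0.2 on the private network (both are assumptions; adjust to your own firewall and addressing), the exporter port can be locked down like this:

ufw allow from 10.0.0.2 to any port 9100 proto tcp
ufw deny 9100/tcp

Rules are evaluated in order, so the allow for the monitoring server matches before the blanket deny.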
Create prometheus/prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Example of monitoring an external web server
  - job_name: 'production_web_01'
    static_configs:
      - targets: ['10.0.0.5:9100']
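With both files in place you can bring the stack up and check your work. promtool ships inside the prom/prometheus image, so the config can be linted from the running container, and the targets API confirms that scrapes are healthy (localhost paths below match the compose file above):

docker-compose up -d
docker-compose exec prometheus promtool check config /etc/prometheus/prometheus.yml
curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[^"]*"'

Every target should report "health":"up" within one scrape interval.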
Exposing Application Metrics
System metrics (CPU, RAM) are useful, but they don't tell you whether your users are happy. For that, you need application metrics. If you are running Nginx, enable the stub_status module: it exposes counters for active connections, accepted and handled connections (from which you can derive drops), and total requests. Prometheus cannot scrape that plain-text page directly, so in practice you pair it with an Nginx exporter that translates the counters into Prometheus metrics.
Check if your Nginx has the module compiled:
nginx -V 2>&1 | grep -o with-http_stub_status_module
Now, add this location block to your Nginx configuration. Crucial: Only allow access from your local IP or the monitoring server IP (e.g., your CoolVDS private network IP).
server {
    listen 80;
    server_name localhost;

    location /stub_status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        allow 10.0.0.0/24;  # Allow private network
        deny all;
    }
}
Test the configuration, then reload Nginx to apply the changes:
nginx -t && service nginx reload
You can test the output immediately with curl:
curl http://127.0.0.1/stub_status
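The response is a small plain-text page along these lines (your counters will obviously differ):

Active connections: 2
server accepts handled requests
 16 16 31
Reading: 0 Writing: 1 Waiting: 1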
The Hidden Bottleneck: Kernel Tuning
When you start collecting metrics from hundreds of containers or services, you will hit Linux kernel limits on open file descriptors. Default installations often cap this at 1024, which is laughably low for a monitoring node.
On your monitoring VPS, check the current limit:
ulimit -n
To raise the system-wide ceiling permanently, add fs.file-max to /etc/sysctl.conf. Note that the 1024 figure reported by ulimit -n is a separate per-process limit controlled through /etc/security/limits.conf (a sample entry follows the sysctl snippet below), so you need to raise both. We also need to adjust swappiness. If your monitoring database touches swap, performance falls off a cliff. On a KVM-based system like CoolVDS, we want the kernel to avoid swap at all costs unless OOM (Out of Memory) is imminent.
# /etc/sysctl.conf additions
fs.file-max = 2097152
vm.swappiness = 1
vm.dirty_ratio = 80
vm.dirty_background_ratio = 5
Apply these settings without rebooting:
sysctl -p
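For the per-process descriptor limit mentioned above, assuming your distribution uses PAM and /etc/security/limits.conf (the default on Debian/Ubuntu and CentOS), an entry like this raises the cap for all users:

# /etc/security/limits.conf additions
* soft nofile 65536
* hard nofile 65536

Log out and back in (or restart the affected service) for it to take effect; ulimit -n should then report the higher value. Containers inherit their limits from the Docker daemon rather than PAM, so for the compose stack you can instead set a per-service ulimits key in docker-compose.yml.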
Why Local Hosting Matters for Latency
Beyond GDPR, there is the physics of networking. If your customers are in Oslo, Bergen, or Trondheim, routing your traffic through a data center in Frankfurt or Amsterdam adds 20-40ms of round-trip time (RTT). If you route to US East, add 90ms+.
When you monitor locally using a provider connected to NIX (Norwegian Internet Exchange), you are often seeing the "real" latency your users experience, not an artificial number inflated by trans-Atlantic hops.
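You can put numbers on this yourself with plain ICMP from a machine inside Norway; the hostnames below are placeholders for your own endpoints in each region:

ping -c 20 your-host-in-oslo.example.no
ping -c 20 your-host-in-frankfurt.example.de

Compare the average RTT of the two runs; the difference is pure routing overhead your users pay on every request.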
CoolVDS offers distinct advantages here. Because we utilize KVM (Kernel-based Virtual Machine) virtualization, your resources are hard-allocated. In shared hosting or on container-based VPS platforms (like OpenVZ), a "neighbor" abusing their CPU can cause micro-stutters in your database. Hard-allocated KVM resources keep that steal time negligible. When you are debugging a 500ms latency spike, you need to know it's your code, not your neighbor's bitcoin miner.
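If you suspect a noisy neighbour, steal time is easy to check: the st column in vmstat (or %st in top) shows the percentage of time the hypervisor withheld CPU from your VM.

vmstat 1 5

Anything consistently above a few percent is a sign that the CPU you are paying for is not really yours.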
Next Steps
GDPR is coming in less than three months. The time to audit your data flows is now. Moving your monitoring stack to a Norwegian jurisdiction solves the compliance headache and improves your visibility into local network performance.
Don't let I/O wait times blind you. Deploy a high-performance NVMe instance on CoolVDS today and start graphing your metrics in real-time.