Stop Using MySQL for Metrics: High-Velocity Monitoring with InfluxDB 0.13

I still see it in 2016. Senior engineers, who should know better, dumping server load metrics, IoT sensor readings, and application logs into a massive MySQL InnoDB table. It works for the first week. Then, as your rows hit the millions, your index size explodes, RAM usage spikes, and your dashboard queries start taking 30 seconds to render. The "solution" is usually a cron job to `DELETE` old rows, which locks the table and kills your application performance.

It is time to stop. We are in the era of high-cardinality metrics. We need tools built for write-heavy workloads where updates are rare, deletes are bulk, and time is the primary axis. Enter InfluxDB.

The Time-Series Problem

Time-series data is unique. It comes in like a firehose—thousands of points per second—and you almost never update existing records. You just write, write, write. Traditional B-Tree indexes in Postgres or MySQL are not optimized for this pattern: as the tree grows, page splits and rebalancing get more expensive and write throughput degrades.

InfluxDB (currently at version 0.13 and rapidly approaching 1.0) uses the Time Structured Merge Tree (TSM) storage engine. TSM is built to absorb high ingest rates and aggressively compress data on disk, cutting the storage footprint dramatically compared to Whisper (Graphite) or generic JSON stores.

Pro Tip: Never host your monitoring stack on the same physical hardware as your production app without strict resource isolation. If your app spikes, it kills the monitoring just when you need it most. This is why we recommend dedicated KVM instances on CoolVDS for your TIG (Telegraf, InfluxDB, Grafana) stack.

Installing InfluxDB 0.13 on Ubuntu 16.04 LTS

Let's get our hands dirty. We assume you are running a fresh CoolVDS instance with Ubuntu 16.04 (Xenial). Do not use the default repos; they are often outdated. We will use the official InfluxData repositories.

# Add the InfluxData GPG key
curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -

# Add the repository source
source /etc/lsb-release
echo "deb https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | sudo tee /etc/apt/sources.list.d/influxdb.list

# Update and Install
sudo apt-get update
sudo apt-get install influxdb

# Start the service (Systemd is standard in 16.04)
sudo systemctl start influxdb
sudo systemctl status influxdb
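
If you want InfluxDB to survive a reboot, also enable the unit (standard systemd housekeeping, nothing InfluxDB-specific):

# Enable the service at boot
sudo systemctl enable influxdb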

Once it is running, verify that it is listening on ports 8086 (the HTTP API) and 8088 (the backup/restore RPC service):

netstat -plnt | grep influxd
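
You can also poke the HTTP API directly: the /ping endpoint answers with 204 No Content when the daemon is healthy, which makes a handy smoke test.

curl -i http://localhost:8086/ping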

Configuring for NVMe Throughput

Here is the reality check: Disk I/O is the bottleneck. InfluxDB writes heavily to the Write Ahead Log (WAL) before flushing to the TSM files. If you are running this on standard magnetic storage or shared SATA SSDs with noisy neighbors, your write latency will skyrocket, and points will be dropped.

At CoolVDS, we backed our Oslo datacenter with pure NVMe storage specifically for this reason. When you are pushing 50,000 writes per second, standard SSDs cannot keep up with the fsync requirements.

Optimize your /etc/influxdb/influxdb.conf for a high-performance environment:

[data]
  # The directory where the TSM engine stores TSM files.
  dir = "/var/lib/influxdb/data"

  # The directory where the TSM engine stores WAL files.
  wal-dir = "/var/lib/influxdb/wal"

  # Trace logging provides more details on errors. 
  # Keep false for production to save I/O.
  trace-logging-enabled = false

  # Cache snapshot memory size. 
  # Increase this if you have plenty of RAM (e.g., 16GB+ on CoolVDS Pro instances)
  cache-snapshot-memory-size = "25m"

  # How long a shard must go without writes before it gets a full compaction.
  # Full compactions improve compression but cost CPU.
  compact-full-write-cold-duration = "4h"
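
The config above keeps data and WAL on the same volume. If your instance has a second NVMe-backed volume, giving the WAL its own device keeps the fsync-heavy WAL writes from competing with TSM compactions. A rough sketch, assuming a pre-formatted volume visible as /dev/vdb1 (the device name is an example; adjust to your instance):

# Mount a dedicated volume for the WAL and hand it to the influxdb user
sudo systemctl stop influxdb
sudo mkdir -p /var/lib/influxdb/wal
sudo mount /dev/vdb1 /var/lib/influxdb/wal
sudo chown -R influxdb:influxdb /var/lib/influxdb/wal
# Add the mount to /etc/fstab so it persists across reboots, then restart
sudo systemctl start influxdb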

Writing Data: The Line Protocol

InfluxDB uses a text-based format called Line Protocol. The shape is simple: measurement,tag_set field_set timestamp. Tags are indexed; field values are not. This distinction is critical for query speed.

Here is how you manually push a data point representing server load in Oslo, using curl:

curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'cpu_load_short,host=server01,region=oslo_nix value=0.64 1466064000000000000'
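
One gotcha: the target database must exist before you write to it, otherwise the API answers with a 404 and a "database not found" error. Create it once through the /query endpoint (using the mydb name from the example above):

curl -XPOST 'http://localhost:8086/query' --data-urlencode "q=CREATE DATABASE mydb"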

In a real scenario, you wouldn't use curl. You would use Telegraf, the collector agent that plugs directly into InfluxDB. However, understanding the raw protocol helps when debugging weird data issues.
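
If you do go the Telegraf route, the relevant piece of its configuration is the InfluxDB output section. A minimal sketch of /etc/telegraf/telegraf.conf (the database name is illustrative; Telegraf normally attempts to create it on startup):

[[outputs.influxdb]]
  # Where Telegraf ships its metrics
  urls = ["http://localhost:8086"]
  # Target database for collected metrics
  database = "telegraf"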

Querying: SQL-like, but different

The query language (InfluxQL) looks like SQL, which makes adoption easy for devs migrating from MySQL. But beware: it does not support joins. Instead, you aggregate over time windows.

-- Select the mean cpu usage grouped by 10-minute intervals
SELECT mean("value") 
FROM "cpu_load_short" 
WHERE "region" = 'oslo_nix' 
AND time > now() - 6h 
GROUP BY time(10m)
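
To try this interactively, use the influx CLI that ships with the package; the lines prefixed with > are typed inside the shell (mydb being the example database created earlier):

influx
> USE mydb
> SELECT mean("value") FROM "cpu_load_short" WHERE "region" = 'oslo_nix' AND time > now() - 6h GROUP BY time(10m)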

The Norwegian Context: Latency and Sovereignty

Why host this in Norway? Two reasons: Latency and Legislation.

If your infrastructure is connected to NIX (Norwegian Internet Exchange), sending metrics to a cloud provider in Frankfurt or Virginia introduces unnecessary latency. While metrics are asynchronous, monitoring dashboards (Grafana) need to be snappy. A 40ms round-trip to Oslo is superior to 150ms to the US.

Furthermore, with the recent uncertainty around Safe Harbor and the looming EU data protection discussions (GDPR is on the horizon), keeping your system logs—which often contain IP addresses and user metadata—within Norwegian jurisdiction is a prudent move for the pragmatic CTO. CoolVDS ensures your data stays on local hardware, compliant with Datatilsynet guidelines.

Troubleshooting Common Issues

Symptom                      | Likely Cause                                 | Solution
"Too many open files" error  | OS open-file limit reached by the TSM engine | Raise the nofile limit for the influxd service to 65536 (see the sketch below).
High CPU steal               | Noisy neighbors on shared hosting            | Migrate to a KVM-based VPS (like CoolVDS) where CPU is guaranteed.
Slow queries on large ranges | High series cardinality                      | Avoid unique IDs (like request_id) as tags; store them as fields instead.
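
For the open-file limit specifically: Ubuntu 16.04 runs InfluxDB under systemd, so a unit override is the reliable place to raise it (entries in /etc/security/limits.conf only apply to PAM login sessions). A quick sketch:

# Open a drop-in override for the influxdb unit...
sudo systemctl edit influxdb

# ...add these two lines in the editor that appears, then save:
[Service]
LimitNOFILE=65536

# Restart so the new limit takes effect
sudo systemctl restart influxdb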

Conclusion

InfluxDB 0.13 is a powerful beast, but it demands respect regarding hardware resources. It solves the scalability issues of storing time-series data in MySQL, but it introduces a need for high-speed storage subsystems.

Don't let storage I/O become the graveyard of your monitoring strategy. Deploy your time-series stack on infrastructure that can handle the write pressure.

Ready to benchmark InfluxDB performance? Spin up a CoolVDS NVMe instance in Oslo today and see the difference raw I/O makes.