Stop Pager Fatigue: The Definitive Nagios & Munin Survival Guide for Norwegian Sysadmins

The 3:00 AM Wake-Up Call

It’s 3:14 AM. Your Nokia buzzes on the nightstand. It's not a text from a friend; it's an SMS gateway telling you the database server is down. You scramble to your laptop, SSH in, and... nothing. The server is fine. The load is 0.2. The site is loading instantly. You just lost two hours of sleep to a false positive caused by a momentary packet drop between your budget hosting provider in Texas and your users in Trondheim.

If you run infrastructure, silence is golden. Noise is the enemy. In the Nordic hosting market, where we value stability and precision, relying on scripts and hope isn't enough. You need the "Hammer and the Scalpel": Nagios for the alerts, and Munin for the diagnosis.

Here is how we set this up at an enterprise level, and why running this on a stable platform like CoolVDS makes the difference between sleeping soundly and nursing a coffee addiction.

The Hammer: Nagios 3

Nagios is the industry standard for a reason. It is ugly, the configuration files are verbose, and it is absolutely bulletproof. It answers one binary question: Is it broken?

On a standard Ubuntu 10.04 LTS (Lucid Lynx) setup, do not compile from source unless you enjoy dependency hell. Use the repositories.

    sudo apt-get update
    sudo apt-get install nagios3 nagios-plugins nagios-nrpe-plugin

The mistake most junior admins make is checking only PING. A server responding to ICMP can still be serving 500 errors to your customers. You need to check the services. Here is a snippet for /etc/nagios3/conf.d/myserver.cfg that checks if MySQL is actually accepting connections, not just if the port is open:

    define service {
        host_name              db-01.coolvds.net
        service_description    MySQL
        check_command          check_mysql_cmdlinecred!root!mysecurepassword
        use                    generic-service
        notification_interval  0    ; Send one alert, then shut up
    }

Pro Tip: Never leave your Nagios web interface open to the world. Use htpasswd. If you are hosting with us, restrict access to your management IP using iptables or CoolVDS hardware firewall rules.
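
The earlier point about a pingable server still serving 500 errors can be sketched as a tiny custom check. This is a minimal illustration, not part of the stock plugin set: the function name `http_status_to_nagios` and the mapping of 4xx to WARNING are assumptions made for the example. What is standard is the plugin exit-code convention Nagios relies on: 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN.

```shell
#!/bin/sh
# Map an HTTP status code to a Nagios plugin result.
# Nagios plugin convention: exit 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
# http_status_to_nagios is a hypothetical helper name used for illustration.
http_status_to_nagios() {
    code="$1"
    case "$code" in
        2[0-9][0-9]|3[0-9][0-9])
            echo "HTTP OK - status $code"
            return 0 ;;
        4[0-9][0-9])
            echo "HTTP WARNING - status $code"
            return 1 ;;
        5[0-9][0-9])
            echo "HTTP CRITICAL - status $code"
            return 2 ;;
        *)
            echo "HTTP UNKNOWN - no usable status (got '$code')"
            return 3 ;;
    esac
}

# Possible usage (the URL is a placeholder): fetch the status with curl, then classify it.
# code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "http://www.example.com/")
# http_status_to_nagios "$code"; exit $?
```

The first line of output is what Nagios shows in the web interface and in the notification, so keep it short and specific.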

The Scalpel: Munin

Nagios tells you the server crashed. Munin tells you why it crashed. Munin graphs system resources over time using RRDTool. It allows you to see the trend before the cliff.

For example, if Nagios alerts that disk space is critical, that's a panic. If you check Munin and see the disk usage slope has been rising steadily for three weeks at a 45-degree angle, that's poor planning.
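
To put a number on that slope, a rough linear extrapolation is enough. The sketch below assumes you have two disk-usage readings in whole percent (for example, read off the Munin graph) and the number of days between them; `days_until_full` is a hypothetical helper, and real growth is rarely perfectly linear.

```shell
#!/bin/sh
# days_until_full USED_THEN USED_NOW DAYS_BETWEEN
# Linear extrapolation from two disk-usage readings (whole percentages).
# Hypothetical helper for illustration; assumes steady growth.
days_until_full() {
    then_pct="$1"; now_pct="$2"; days="$3"
    growth=$(( now_pct - then_pct ))
    if [ "$growth" -le 0 ]; then
        echo "not growing"
        return 0
    fi
    # Remaining headroom divided by growth per day, in integer arithmetic.
    echo $(( (100 - now_pct) * days / growth ))
}

# Possible usage: current usage of / as a bare number.
# now=$(df -P / | awk 'NR==2 { sub("%", "", $5); print $5 }')
# days_until_full 40 "$now" 21
```

With 40% three weeks ago and 82% today, `days_until_full 40 82 21` prints 9: you have about nine days before the panic.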

To install the node on your client VPS:

    sudo apt-get install munin-node
    sudo vi /etc/munin/munin-node.conf

You must allow your master server to connect. By default, it only allows localhost. Add your monitoring server IP:

    allow ^127\.0\.0\.1$
    # Your monitoring server IP
    allow ^192\.168\.1\.50$
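
The allow value is a regular expression, not a plain address, so the dots must be escaped and the pattern anchored. A small sketch for generating the line from an IP, assuming a hypothetical helper named `munin_allow_line`:

```shell
#!/bin/sh
# munin_allow_line IP
# Print a munin-node.conf "allow" line for one IPv4 address,
# escaping the dots and anchoring the regex at both ends.
# Hypothetical helper for illustration.
munin_allow_line() {
    escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
    printf 'allow ^%s$\n' "$escaped"
}

# Possible usage: append the line, then restart the node so it takes effect.
# munin_allow_line 192.168.1.50 | sudo tee -a /etc/munin/munin-node.conf
# sudo /etc/init.d/munin-node restart
```

An unescaped dot matches any character, so a sloppy pattern can end up allowing more addresses than you intended.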

The "Wait IO" Trap

Here is where the underlying infrastructure matters. You can have the best Nagios config in the world, but if your VPS is sitting on a crowded node with 50 other tenants fighting for disk I/O, your monitoring will light up red constantly.

In Munin, look at the CPU usage graph. Specifically, look for the red area labeled iowait. If this is consistently above 20%, your drive heads are thrashing.
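
If you want to confirm the graph from the command line, the same figure can be derived from two samples of the aggregate cpu line in /proc/stat. This is a sketch under assumptions: `iowait_pct` is a hypothetical helper, and it counts only the user, nice, system, idle, iowait, irq, softirq and steal counters toward the total.

```shell
#!/bin/sh
# iowait_pct SAMPLE1 SAMPLE2
# Each argument is a "cpu ..." line from /proc/stat, taken some seconds apart.
# Prints the share of CPU time spent in iowait between the samples (whole percent).
# Hypothetical helper for illustration.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2-9: user nice system idle iowait irq softirq steal.
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            t[NR] = total
            w[NR] = $6   # iowait is the fifth counter, i.e. field 6
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print 0; exit }
            printf "%d\n", (w[2] - w[1]) * 100 / dt
        }'
}

# Possible usage:
# a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
# iowait_pct "$a" "$b"
```

A single reading means little; it is the sustained level over hours that matters, which is exactly what the Munin graph gives you.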

This is where CoolVDS differs from the budget oversellers. For our premium lines we use Xen virtualization, not OpenVZ containerization. Xen provides better isolation: when your neighbor compiles a kernel, your graph shouldn't spike. Furthermore, we use enterprise-grade RAID-10 SAS arrays. They are expensive compared to standard SATA, but the random I/O performance keeps iowait low, which prevents false alerts in Nagios.

Latency: The Norway Factor

If your target audience is in Oslo or Bergen, why are you monitoring them from a server in Frankfurt or Amsterdam? Light travels fast, but network hops add jitter.

Norwegian data laws, specifically the Personopplysningsloven (Personal Data Act), suggest keeping sensitive logs within jurisdiction where possible to satisfy Datatilsynet, the Norwegian Data Protection Authority. But purely from a technical standpoint, monitoring your Norwegian infrastructure from a CoolVDS instance in our Oslo datacenter ensures that an alert is a real server issue, not a fiber cut in the North Sea.

  • Ping to NIX (Norwegian Internet Exchange): ~1-2ms from our datacenter.
  • Ping from London: ~15-20ms.

That 15ms difference doesn't sound like much until you are tuning timeout thresholds for high-frequency trading or VoIP applications.
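
Before tuning any timeout, measure the round trip from where your monitoring host actually sits. The sketch below pulls the average out of a ping summary line; it assumes the Linux iputils format ("rtt min/avg/max/mdev = ..."), and `ping_avg_ms` is a hypothetical helper name.

```shell
#!/bin/sh
# ping_avg_ms
# Read ping output on stdin and print the average round-trip time in ms.
# Assumes the Linux iputils summary line, for example:
#   rtt min/avg/max/mdev = 1.102/1.514/2.031/0.310 ms
# Hypothetical helper for illustration.
ping_avg_ms() {
    # Splitting on "/" puts the average in the fifth field.
    awk -F'/' '/^rtt / { print $5 }'
}

# Possible usage (the hostname is a placeholder):
# ping -c 10 -q www.example.com | ping_avg_ms
```

Run it from both the candidate monitoring location and a remote one, and set your check timeouts against the higher of the two figures plus some headroom.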

Conclusion

Monitoring is not about collecting data; it's about filtering noise. Nagios wakes you up; Munin lets you fix it quickly so you can go back to sleep.

Don't put your monitoring system on the same physical hardware as your production server. That is the definition of a single point of failure. Spin up a small, dedicated instance. With CoolVDS, you can deploy a rock-solid Xen VPS in minutes.

Ready to secure your uptime? Deploy a CoolVDS instance today and set your thresholds tight.
