Stop Grepping Blindly: Master Server Log Analysis with AWStats on CentOS

There is a certain romanticism to watching tail -f /var/log/httpd/access_log fly by on a terminal screen. It looks busy. It looks like work. But let's be honest: scrolling text doesn't tell you why your bandwidth usage spiked at 3:00 AM, nor does it identify which botnet is currently hammering your login page.

In the Norwegian hosting market, where bandwidth costs can be significant and latency to the end-user is everything, ignorance is expensive. You need structured data. You need AWStats.

I’ve seen too many systems administrators rely on the basic Webalizer installs that ship by default with cPanel, missing the granular data required to optimize a high-traffic site. Here is how to set up AWStats correctly on a CentOS 5 environment, and why the underlying hardware of your Virtual Private Server (VPS) matters more than you think.

The Prerequisites

Before we touch the config, ensure you are running a clean AMP stack. For this guide, we assume:

  • CentOS 5.4 (Final)
  • Apache 2.2.x
  • Perl 5.8.8+
  • Root access (suPHP is fine, but FastCGI or mod_perl is needed for speed)
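
If you want to verify the Perl requirement from the list above before installing anything, a minimal sketch follows. The helper name version_ge is my own invention, not something shipped with AWStats or CentOS, and it only compares purely numeric dotted versions.

```shell
# Hypothetical helper (not part of AWStats): succeeds when dotted
# version $1 is greater than or equal to dotted version $2.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            xi = (i <= n) ? x[i] + 0 : 0   # missing components count as 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0                             # all components equal
    }'
}

# Only attempt the check when perl is actually on the PATH.
if command -v perl >/dev/null 2>&1; then
    perl_version=$(perl -e 'printf "%vd", $^V')
    if version_ge "$perl_version" 5.8.8; then
        echo "perl $perl_version is new enough"
    else
        echo "perl $perl_version is older than 5.8.8" >&2
    fi
fi
```

The comparison is done in awk rather than with sort -V because the coreutils on an older CentOS release may not support version sorting.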

Step 1: Installation and The "Combined" Log Format

First, grab the RPM or source. Using yum with the EPEL repository is usually the fastest path to stability.

yum install awstats

The most common mistake I see when auditing servers is Apache logging in 'Common' format. AWStats needs 'Combined' format to track user agents and referrers. Check your httpd.conf:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog logs/access_log combined

If you change this, reload Apache immediately. Without the referrer data, you can't track which search engine sent you that traffic spike.
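
To confirm what Apache is actually writing, you can inspect a line from the live log. The sketch below is a heuristic, not an AWStats feature: the function name log_format_of is hypothetical, and it simply counts double-quote-delimited fields (a Combined line carries three quoted strings, a Common line only one).

```shell
# Hypothetical helper: reads log lines on stdin and classifies the first
# one by counting the fields obtained when splitting on double quotes.
#   common   -> prefix, "request", suffix                        (3 fields)
#   combined -> prefix, "request", status/bytes, "referer",
#               separator, "user-agent", trailing empty field    (7 fields)
# Escaped quotes inside a user agent add fields, hence ">= 7".
log_format_of() {
    awk -F'"' 'NR == 1 {
        if (NF >= 7)      print "combined"
        else if (NF == 3) print "common"
        else              print "unknown"
    }'
}

# Example (path taken from this article):
#   tail -n 1 /var/log/httpd/access_log | log_format_of
```

If this prints "common" after you have edited httpd.conf, the reload probably has not happened yet or a VirtualHost defines its own CustomLog.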

Step 2: Configuring the Analyzer

Copy the model config file to match your domain:

cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.no.conf

Open it with vi and edit these critical lines:

LogFile="/var/log/httpd/access_log"
SiteDomain="yourdomain.no"
DNSLookup=1

Pro Tip: DNSLookup=1 resolves IP addresses to hostnames (e.g., mapping an IP to a Telenor DSL line). However, this is extremely slow and can hang the analysis process. On standard shared hosting, this times out. On CoolVDS, our local DNS resolvers in Oslo are optimized for this, but generally, I recommend setting this to 0 and using a static GeoIP plugin for country detection instead.
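
If you manage more than a couple of domains, editing each file in vi gets old quickly. Here is a rough sketch of scripting the change instead. The function set_directive is a hypothetical helper of my own, not an AWStats tool, and it assumes each directive sits on its own uncommented KEY=VALUE line as in the model config.

```shell
# Hypothetical helper: set_directive FILE KEY VALUE
# Rewrites every uncommented "KEY=..." line in FILE to "KEY=VALUE".
# Commented-out examples (lines starting with #) are left untouched.
set_directive() {
    file=$1; key=$2; value=$3
    tmp=$(mktemp) || return 1
    if awk -v k="$key" -v v="$value" '
            $0 ~ ("^[ \t]*" k "[ \t]*=") { print k "=" v; next }
            { print }
        ' "$file" > "$tmp"
    then
        # Write back through cat so the original file keeps its
        # ownership and permissions.
        cat "$tmp" > "$file"
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return $rc
}

# Example, using the config path from this article:
#   conf=/etc/awstats/awstats.yourdomain.no.conf
#   set_directive "$conf" DNSLookup 0
#   set_directive "$conf" SiteDomain '"yourdomain.no"'
```

The GeoIP plugin is enabled in the same file through a LoadPlugin line; its exact arguments depend on which plugin and database you have installed, so check the comments in the model config rather than copying a line blindly.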

The Hardware Bottleneck: I/O Wait

Here is the reality no one talks about. Parsing a 4GB log file is disk-intensive. It requires reading millions of lines and writing thousands of small statistical files.

If you are on a budget VPS with oversold hard drives, running awstats_updateall.pl will spike your I/O wait (the wa figure in top). When I/O wait climbs, Apache requests queue up behind the log reads because the disk heads are busy elsewhere. Your site slows down just because you wanted to see who visited it.
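
You can put a number on this yourself. The sketch below computes the I/O wait share between two samples of the aggregate "cpu" line in /proc/stat. The function name iowait_percent is hypothetical, and the field layout (user, nice, system, idle, iowait, irq, softirq, steal) is my assumption based on 2.6-series Linux kernels.

```shell
# Hypothetical helper: iowait_percent BEFORE AFTER
# BEFORE and AFTER are aggregate "cpu ..." lines sampled from /proc/stat.
# Prints the percentage of elapsed CPU time spent waiting on I/O.
iowait_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user..steal (fields 2-9); later "guest" fields are
            # already counted inside user time, so they are skipped.
            n = (NF < 9) ? NF : 9
            total[NR] = 0
            for (i = 2; i <= n; i++) total[NR] += $i
            iow[NR] = $6          # fifth counter after the label is iowait
        }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (iow[2] - iow[1]) / dt
        }'
}

# Example: sample five seconds apart while an AWStats update is running.
#   before=$(grep '^cpu ' /proc/stat); sleep 5
#   after=$(grep '^cpu ' /proc/stat)
#   iowait_percent "$before" "$after"
```

Run it once while the server is idle and once during an update; the difference between the two figures is roughly what the analysis costs your disks.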

This is why we built CoolVDS with enterprise-grade SAS 15k RPM drives in RAID-10 arrays. We stripe the data across multiple spindles to ensure that disk reads for log analysis never interrupt the disk reads for your database or static content.

Compliance: The Norwegian Context

Operating in Norway means adhering to the Personal Data Act (Personopplysningsloven). IP addresses can be considered personal data. If you are hosting on servers located in the US (like EC2 or Rackspace), you are transferring this data out of the EEA, which complicates compliance.

By keeping your hosting on CoolVDS infrastructure in Oslo, your logs remain within Norwegian legal jurisdiction, simplifying your standing with Datatilsynet. We don't just provide raw compute; we provide sovereignty.

Automation

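Finally, don't run updates manually. Add a cron entry to a file under `/etc/cron.d/` (for example `/etc/cron.d/awstats`) to keep your stats fresh every hour without hammering the server at midnight. The entry below uses the system crontab format with a user field, so it belongs there or in `/etc/crontab`, not in `/etc/cron.hourly/`, which holds executable scripts rather than schedule lines: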

0 * * * * root /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.no -update > /dev/null
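
On a busy site an hourly update can occasionally take longer than an hour, and two overlapping runs will fight over the same disk. One way to guard against that is a small wrapper that takes a lock before running the update. This is a sketch under my own assumptions: run_locked and the lock path are hypothetical, and a run killed mid-way leaves a stale lock directory that you have to remove by hand.

```shell
# Hypothetical helper: run_locked LOCKDIR COMMAND [ARGS...]
# Uses mkdir as an atomic lock: only one caller can create the directory.
run_locked() {
    lockdir=$1; shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75                 # arbitrary "try again later" status
    fi
    "$@"                          # run the wrapped command
    rc=$?
    rmdir "$lockdir"              # release the lock
    return $rc
}

# Example: lowest CPU priority, one run at a time. The lock path is an
# assumption; the awstats.pl path is the one used in this article.
#   run_locked /var/run/awstats-update.lock nice -n 19 \
#       /usr/share/awstats/wwwroot/cgi-bin/awstats.pl \
#       -config=yourdomain.no -update > /dev/null
```

Save the function and the call in a script and point the cron entry at that script instead of calling awstats.pl directly.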

Conclusion

Log analysis is the pulse of your infrastructure. It tells you about broken links (404s), server errors (500s), and bandwidth theft. But it requires a platform that can handle the I/O load.

Don't let your monitoring tools kill your uptime. Deploy your next project on a CoolVDS platform where dedicated RAM and RAID-10 storage come standard.