Stop Grepping Your Logs: Implementing AWStats for High-Traffic Norwegian Servers

Stop Grepping Your Eyes Out: A Sysadmin's Guide to AWStats

There is a certain romance to tail -f /var/log/httpd/access_log. Watching the requests scroll by in real-time gives you a pulse on the server. But when the Marketing Director storms into your office demanding to know why traffic from Bergen dropped 15% last Tuesday, a scrolling terminal window won't cut it. You need visualization, and you need it yesterday.

In the world of 2011, Google Analytics is fine for the front-end guys, but as system administrators, we know it misses things. It misses the bots, the hotlinkers, and the 404 errors that are silently killing your I/O. We need server-side analysis. We need AWStats.

In this guide, we are going to set up AWStats on a standard RHEL/CentOS environment, configure it for accurate Apache parsing, and, crucially for those of us hosting in Norway, tweak it to keep Datatilsynet (The Data Protection Authority) off our backs.

The I/O Bottleneck: Why Your Host Matters

Before we touch the config files, a warning. AWStats is a Perl script that parses massive text files. If your high-traffic site generates gigabytes of logs daily, every update run is an I/O punisher. On cheap, oversold shared hosting or lower-tier OpenVZ containers, the wa (I/O wait) metric will spike, and your MySQL database can stall while the logs are being parsed.
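To put a number on that wa figure, here is a minimal sketch that computes the I/O wait share from a line in /proc/stat format. The helper name iowait_pct is made up for this example, and the result is cumulative since boot, so treat it as a rough baseline rather than a live reading.

```shell
# Hypothetical helper: print the share of CPU time spent in iowait,
# as an integer percentage. Reads an aggregate "cpu ..." line in
# /proc/stat format from stdin.
# Fields after the label: user nice system idle iowait irq softirq ...
iowait_pct() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= NF; i++) total += $i
        if (total > 0) printf "%d\n", ($6 * 100) / total
    }'
}

# Typical use on a Linux host:
#   iowait_pct < /proc/stat
```

For a live view while an update is running, the wa column of vmstat 5 is the quicker check. If it climbs, running the update under nice and ionice (for example ionice -c3 nice -n 19) is one way to soften the impact, assuming your I/O scheduler honours the idle class.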

This is where architecture matters. At CoolVDS, we utilize Xen virtualization. This ensures that your disk I/O is isolated. When you crunch 10GB of logs, you use your allocated resources, not your neighbor's. If you are serious about data analysis, stop fighting for scraps on a crowded node and get a VPS with dedicated throughput.

Step 1: Installation and Prerequisites

Assuming you are running CentOS 5.6 or the newly released CentOS 6, you'll need the EPEL repository enabled. Once that is done, installation is straightforward:

yum install awstats

This will drop the configuration files into /etc/awstats/. Copy the model file to a new file named after your domain:

cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.no.conf
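If you manage more than a handful of sites, the copy step can be scripted. The following is only a sketch: make_awstats_conf is a hypothetical helper, and it assumes the model file contains LogFile= and SiteDomain= lines at the start of a line, as the stock model does.

```shell
# Hypothetical helper: write a per-site config from the model file,
# filling in the two parameters that differ per site.
# Usage: make_awstats_conf MODEL DOMAIN LOGFILE OUTPUT
# Assumption: DOMAIN and LOGFILE contain no "|" or "&" characters,
# since they are pasted straight into the sed replacement.
make_awstats_conf() {
    sed -e "s|^LogFile=.*|LogFile=\"$3\"|" \
        -e "s|^SiteDomain=.*|SiteDomain=\"$2\"|" \
        "$1" > "$4"
}

# Example call (paths from this guide):
#   make_awstats_conf /etc/awstats/awstats.model.conf yourdomain.no \
#       /var/log/httpd/access_log /etc/awstats/awstats.yourdomain.no.conf
```

Every other setting is carried over unchanged from the model, so you still need to review the result by hand.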

Step 2: Configuration for Accuracy

Open your new config file in vi. We need to adjust a few critical parameters to match your Apache httpd.conf setup.

Log Format

Many Apache setups still write the 'Common' log format. You want 'Combined' to capture User Agents and Referrers. Ensure your Apache config defines the format below and that your CustomLog directive actually references it:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

Then, in your awstats.yourdomain.no.conf, set:

LogFormat=1
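Before trusting LogFormat=1, it is worth checking that the lines in your log really are in combined format. Here is a rough sketch: is_combined is a hypothetical helper, and the pattern is a simplification that does not handle escaped quotes inside the request or user agent.

```shell
# Hypothetical helper: succeed if the given log line looks like
# Apache "combined" format, i.e. a "common" line followed by two
# extra quoted fields (referer and user agent).
is_combined() {
    printf '%s\n' "$1" | grep -Eq \
        '^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} ([0-9]+|-) "[^"]*" "[^"]*"$'
}

# Example: test the most recent line of the live log
#   is_combined "$(tail -n 1 /var/log/httpd/access_log)" && echo combined
```

If the check fails on a log you expected to be combined, look at which format name the CustomLog directive for that virtual host uses.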

The Norwegian Privacy Context (Datatilsynet)

Here is where the Nordic nuance comes in. Under the Norwegian Personal Data Act (Personopplysningsloven), IP addresses can be considered personally identifiable information (PII). If you are storing logs indefinitely and generating reports, you are processing PII.

To stay compliant and sleep better at night, use the SkipHosts parameter to exclude your own office IP from the statistics, and leave AllowToUpdateStatsFromBrowser set to 0 so that nobody viewing a report can trigger an update run from the browser. More importantly, limit who can see these reports. Secure the AWStats directory in Apache using .htaccess:

AuthType Basic
AuthName "Restricted Access"
AuthUserFile /etc/awstats/.htpasswd
Require valid-user
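Since the IP address is the sensitive field here, one further option is to truncate addresses before logs are archived for the long term. This is not something AWStats requires, just a sketch under that assumption: mask_ipv4 is a hypothetical filter that zeroes the last octet of the client address at the start of each line. Masking makes unique-visitor counts coarser, so decide whether to apply it before or after AWStats has parsed the log.

```shell
# Hypothetical filter: zero the last octet of the IPv4 address that
# starts each log line (the %h field). Other fields are untouched,
# and lines that do not start with an IPv4 address pass through as-is.
mask_ipv4() {
    sed 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\} /\1.0 /'
}

# Example: write a masked copy of a rotated log
#   mask_ipv4 < /var/log/httpd/access_log.1 > /var/log/httpd/access_log.1.masked
```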

Step 3: Automation via Cron

Don't run updates manually. Add a cron job to update the statistics every hour. Open your crontab with crontab -e:

0 * * * * /usr/share/awstats/tools/awstats_updateall.pl now > /dev/null 2>&1

Pro Tip: If logrotate rotates your log files nightly, make sure an AWStats update runs just before the rotation happens, or point LogFile at the archived copy, LogFile="/var/log/httpd/access_log.1", to parse the rotated log. Missing a day of data due to rotation timing is a rookie mistake.
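One way to cover the rotation gap is to parse the rotated file first and the live file second on every run; AWStats skips records older than the last one it has already processed, so the overlap is harmless. A minimal sketch, assuming the rotated copy is named with a .1 suffix and is not compressed (delaycompress in logrotate); logs_in_order is a hypothetical helper.

```shell
# Hypothetical helper: print the log files to parse, oldest first.
# Prints the rotated copy (LIVE.1) if it exists, then the live log.
logs_in_order() {
    [ -f "$1.1" ] && printf '%s\n' "$1.1"
    [ -f "$1" ] && printf '%s\n' "$1"
    return 0
}

# Example loop (the awstats.pl path is assumed from the EPEL package layout):
#   logs_in_order /var/log/httpd/access_log | while read -r f; do
#       /usr/share/awstats/wwwroot/cgi-bin/awstats.pl \
#           -config=yourdomain.no -update -LogFile="$f"
#   done
```

The -LogFile option overrides the LogFile value from the config file for that single run, so the config itself can keep pointing at the live log.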

Performance: The "Need for Speed"

Parse time grows roughly linearly with log size: AWStats reads each file from start to finish, so the faster your storage, the faster the parse. While spinning rust (HDDs) is standard, the industry is moving toward Solid State Drives (SSD) for caching layers. At CoolVDS, we are aggressively adopting high-performance SAS and SSD configurations in our RAID arrays to lower latency. When your log analyzer can read at 300MB/s instead of 80MB/s, your server spends less time crunching numbers and more time serving customers.
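To see what those throughput figures mean for the 10GB example from earlier, here is a back-of-the-envelope sketch. It assumes the run is purely limited by sequential read speed, which is a lower bound: in practice the Perl parsing itself also costs CPU time. read_seconds is a hypothetical helper.

```shell
# Hypothetical helper: seconds needed to read a log sequentially,
# rounded up. $1 = log size in MB, $2 = sustained read rate in MB/s.
read_seconds() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# 10GB taken as 10240 MB:
#   read_seconds 10240 80    # 128 seconds at 80MB/s
#   read_seconds 10240 300   # 35 seconds at 300MB/s
```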

Conclusion

AWStats remains the gold standard in 2011 for server-side analytics. It doesn't rely on JavaScript, it tracks bandwidth (which helps you verify your hosting bills), and it works even when users have blocked cookies. Just remember: log analysis is resource-intensive.

If you are tired of your server choking every time you try to run a report, it might be time to upgrade your infrastructure. Deploy a CoolVDS Xen VPS today and experience the stability of dedicated resources and low-latency storage. Your logs are talking; make sure you have the power to listen.
