Stop Grepping Your Eyes Out: A Sysadmin's Guide to AWStats
There is a certain romance to tail -f /var/log/httpd/access_log. Watching requests scroll by in real time gives you a feel for the server's pulse. But when the Marketing Director storms into your office demanding to know why traffic from Bergen dropped 15% last Tuesday, a scrolling terminal window won't cut it. You need visualization, and you need it yesterday.
In 2011, Google Analytics is fine for the front-end guys, but as system administrators we know it misses things. Because it only counts page views where its JavaScript actually runs in the visitor's browser, it misses the bots, the hotlinkers, and the 404 errors that are silently wasting your server's resources. We need server-side analysis. We need AWStats.
In this guide, we are going to set up AWStats on a standard RHEL/CentOS environment, configure it for accurate Apache parsing and, crucially for those of us hosting in Norway, tweak it to keep Datatilsynet (the Norwegian Data Protection Authority) off our backs.
The I/O Bottleneck: Why Your Host Matters
Before we touch the config files, a warning. AWStats is a Perl script that parses massive text files. If your high-traffic site generates gigabytes of logs daily, every update run is an I/O punisher. On cheap, oversold shared hosting or lower-tier OpenVZ containers, the wa (I/O wait) metric in top will spike, and your MySQL queries will stall while the logs are being parsed.
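The wa metric is the share of CPU time the kernel spent idle while waiting for disk I/O to complete. If you want to watch it from a script while an AWStats update runs, here is a minimal sketch, assuming a Linux /proc/stat whose aggregate cpu line lists user, nice, system, idle, iowait, irq, softirq and steal ticks in that order. iowait_pct is a hypothetical helper, not a standard tool.

```shell
# iowait_pct SAMPLE1 SAMPLE2
# Each argument is the aggregate "cpu ..." line from /proc/stat.
# Prints the percentage of CPU ticks spent in iowait between the two samples.
# Only fields 2-9 (user..steal) are summed: later "guest" fields are already
# counted inside user/nice and would otherwise be double-counted.
iowait_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x)              # x[1] = "cpu", x[2] = user, ..., x[6] = iowait
    split(b, y)
    total = 0
    for (i = 2; i <= n && i <= 9; i++)
      total += y[i] - x[i]
    if (total <= 0) { print "0.0"; exit }
    printf "%.1f\n", 100 * (y[6] - x[6]) / total
  }'
}

# Example: sample five seconds apart while an update is running.
# s1=$(grep '^cpu ' /proc/stat); sleep 5; s2=$(grep '^cpu ' /proc/stat)
# iowait_pct "$s1" "$s2"
```

Sampling once when the box is quiet and again during a parse shows how much of the slowdown is disk contention rather than CPU.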
This is where architecture matters. At CoolVDS, we utilize Xen virtualization. This ensures that your disk I/O is isolated. When you crunch 10GB of logs, you use your allocated resources, not your neighbor's. If you are serious about data analysis, stop fighting for scraps on a crowded node and get a VPS with dedicated throughput.
Step 1: Installation and Prerequisites
Assuming you are running CentOS 5.6 or the newly released CentOS 6, you'll need the EPEL repository enabled. Once that is done, installation is straightforward:
yum install awstats
This will drop the configuration files into /etc/awstats/. Copy the model file to a new file named after your domain:
cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.no.conf
Step 2: Configuration for Accuracy
Open your new config file in vi. We need to adjust a few critical parameters to match your Apache httpd.conf setup.
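Besides the log format covered next, the new file needs at least LogFile (which log to parse), SiteDomain (your main hostname) and HostAliases (other names that count as your site) set for your domain. If you build configs for many virtual hosts, scripting the edit beats hand-editing in vi. A minimal sketch, assuming GNU sed and that each parameter appears once as an uncommented KEY=... line; set_param is a hypothetical helper, not part of AWStats.

```shell
# set_param FILE KEY VALUE
# Rewrites the uncommented KEY=... line in an AWStats config file as KEY="VALUE".
# Commented documentation lines (starting with '#') are left untouched.
# VALUE must not contain '|', '&' or '\', which are special in this sed expression.
set_param() {
  local file=$1 key=$2 value=$3
  sed -i "s|^${key}=.*|${key}=\"${value}\"|" "$file"
}

# Example for the config created in Step 1 (values are placeholders):
# CONF=/etc/awstats/awstats.yourdomain.no.conf
# set_param "$CONF" LogFile     /var/log/httpd/access_log
# set_param "$CONF" SiteDomain  yourdomain.no
# set_param "$CONF" HostAliases "www.yourdomain.no localhost 127.0.0.1"
```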
Log Format
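Depending on how your httpd.conf is set up, Apache may be writing the 'Common' log format, which omits the Referer and User-Agent headers. You want 'Combined' to capture them. Ensure your Apache config defines the combined format and that your CustomLog directive actually uses it (reload httpd after changing either):
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog logs/access_log combined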
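Once Apache is writing the combined format, it is worth confirming that new lines really carry the two extra quoted fields before pointing AWStats at the file. Lines that don't match the configured format typically end up counted as corrupted records rather than parsed. A quick sketch, assuming the stock combined layout: splitting a line on double quotes yields seven fields for combined and three for common, with the Referer in field 4 and the User-Agent in field 6. Requests containing escaped quotes (\") will throw the count off, so treat it as a sanity check rather than a validator. check_format is a hypothetical helper.

```shell
# check_format: read access_log lines on stdin and label each one.
# Splitting on '"' gives 7 fields for the combined format and 3 for common:
#   $2 = request line, $4 = Referer, $6 = User-Agent.
check_format() {
  awk -F'"' '{
    if (NF >= 7)
      print "combined referer=" $4 " agent=" $6
    else
      print "common (no referer/agent)"
  }'
}

# Example: inspect the ten most recent entries.
# tail -n 10 /var/log/httpd/access_log | check_format
```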
Then, in your awstats.yourdomain.no.conf, set:
LogFormat=1
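LogFormat=1 is AWStats' shorthand for the Apache combined layout. If you ever extend the Apache LogFormat line, AWStats also accepts a custom format string in place of the number. To our understanding the string below is the documented equivalent of LogFormat=1, but check the comments in your own awstats.model.conf before relying on it. Each tag lines up with one Apache directive: %host with %h, %other with %l, %logname with %u, %time1 with %t, %methodurl with "%r", %code with %>s, %bytesd with %b, and %refererquot and %uaquot with the quoted Referer and User-Agent headers.

```
# Equivalent to LogFormat=1 (Apache combined); use one form or the other, not both.
LogFormat="%host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot"
```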
The Norwegian Privacy Context (Datatilsynet)
Here is where the Nordic nuance comes in. Under the Norwegian Personal Data Act (Personopplysningsloven), IP addresses can be considered personally identifiable information (PII). If you keep access logs and generate reports from them, you are processing PII, and storing those logs indefinitely only makes that processing harder to justify.
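If you don't need full addresses in the reports, one option is to truncate them before AWStats ever sees them. This is a general data-minimisation technique, not an AWStats feature, and it reduces rather than removes what the reports reveal. A minimal sketch, assuming IPv4 client addresses in the first field (HostnameLookups Off) and a POSIX sed; anonymize_ipv4 is a hypothetical helper that zeroes the last octet.

```shell
# anonymize_ipv4: read access_log lines on stdin and replace the last octet
# of a leading IPv4 client address with 0 (203.0.113.7 -> 203.0.113.0).
# Lines that do not start with an IPv4 address (e.g. IPv6 clients) pass
# through unchanged and are therefore NOT anonymized.
anonymize_ipv4() {
  sed 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\} /\1.0 /'
}
```

AWStats can read its input from a pipe when LogFile ends with "|", so wrapping the function in a script (say, a hypothetical /usr/local/bin/anonymize-log) lets you write LogFile="/usr/local/bin/anonymize-log < /var/log/httpd/access_log |". This only changes what AWStats stores; the raw access_log still holds full addresses, so its retention is a separate decision.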
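Some housekeeping first: use the SkipHosts parameter to keep your own office IP out of the numbers, and set AllowToUpdateStatsFromBrowser=0 so that nobody browsing a report can trigger a log parse from the web. For compliance, and to sleep better at night, the important part is limiting who can see these reports. Secure the AWStats directory in Apache with an .htaccess file (this needs AllowOverride AuthConfig on that directory; the same directives also work inside the matching <Directory> block of your Apache config):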
AuthType Basic
AuthName "Restricted Access"
AuthUserFile /etc/awstats/.htpasswd
Require valid-user
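The AuthUserFile referenced above has to exist. Create it with Apache's htpasswd utility, for example htpasswd -c /etc/awstats/.htpasswd marketing, where "marketing" is a placeholder user name (the -c flag creates the file and overwrites any existing one, so drop it when adding further users). The two housekeeping parameters mentioned earlier go in awstats.yourdomain.no.conf. A sketch, assuming 192.0.2.10 stands in for your office address; SkipHosts takes a space-separated list and also accepts REGEX[...] patterns:

```
# Don't count requests from the office (192.0.2.10 is a placeholder) or the local 192.168.x.x network.
SkipHosts="192.0.2.10 REGEX[^192\.168\.]"
# Never let a visitor to the report page trigger a log update.
AllowToUpdateStatsFromBrowser=0
```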
Step 3: Automation via Cron
Don't run updates manually. Add a cron job to update the statistics every hour. On a stock CentOS install /var/log/httpd is usually readable only by root, so open root's crontab with crontab -e:
0 * * * * /usr/share/awstats/tools/awstats_updateall.pl now > /dev/null 2>&1
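Given the I/O warning at the top of this guide, you may also want the hourly job to yield to your web and database processes. A variant of the cron entry, assuming the nice and ionice utilities from a stock CentOS install; ionice priorities only take effect under the CFQ I/O scheduler, which is the CentOS 5 and 6 default. If awstats_updateall.pl cannot find awstats.pl on your system, point it there with its -awstatsprog= option.

```
# Hourly update at lowest CPU priority (nice 19) and lowest best-effort I/O priority (ionice class 2, level 7).
0 * * * * nice -n 19 ionice -c2 -n7 /usr/share/awstats/tools/awstats_updateall.pl now > /dev/null 2>&1
```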
Pro Tip: If logrotate rotates your log files nightly, make sure your AWStats update runs before the rotation happens, or point AWStats at the archived log with LogFile="/var/log/httpd/access_log.1". Check what your rotated files are actually called first: with logrotate's dateext option they carry a date suffix instead of .1. Missing a day of data due to rotation timing is a rookie mistake.
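The cleanest way around the rotation race is to let logrotate run the update itself, immediately before it rotates the logs. A sketch of /etc/logrotate.d/httpd with a prerotate hook added; the other directives approximate the stock CentOS file, so merge the prerotate block into your existing file rather than copying this verbatim. sharedscripts makes the hook run once for the whole *log glob instead of once per matching file.

```
# Parse whatever is left in the current logs before they are rotated away.
/var/log/httpd/*log {
    missingok
    notifempty
    sharedscripts
    prerotate
        /usr/share/awstats/tools/awstats_updateall.pl now > /dev/null 2>&1
    endscript
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
    endscript
}
```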
Performance: The "Need for Speed"
Parsing logs is linear: the work grows with the size of the log, and the file is read sequentially, so the faster your storage, the faster the parse. While spinning rust (HDDs) remains the standard, the industry is moving toward Solid State Drives (SSD) for caching layers. At CoolVDS, we are aggressively adopting high-performance SAS and SSD configurations in our RAID arrays to lower latency. When your log analyzer can read at 300MB/s instead of 80MB/s, your server spends less time crunching numbers and more time serving customers.
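To put those throughput figures in perspective, take the 10GB of logs from earlier and count pure sequential read time only. This is a lower bound, since AWStats also spends CPU time parsing every line (assuming 1 GB = 1024 MB):

$$
t_{80} = \frac{10 \times 1024\ \text{MB}}{80\ \text{MB/s}} = 128\ \text{s},
\qquad
t_{300} = \frac{10 \times 1024\ \text{MB}}{300\ \text{MB/s}} \approx 34\ \text{s}
$$

That is a 300/80 = 3.75x reduction in the time the disk is tied up by the read.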
Conclusion
AWStats remains the gold standard in 2011 for server-side analytics. It doesn't rely on JavaScript, it tracks bandwidth (which helps you verify your hosting bills), and it works even when users have blocked cookies. Just remember: log analysis is resource-intensive.
If you are tired of your server choking every time you try to run a report, it might be time to upgrade your infrastructure. Deploy a CoolVDS Xen VPS today and experience the stability of dedicated resources and low-latency storage. Your logs are talking; make sure you have the power to listen.