Stop Grepping Blindly: The Sysadmin’s Guide to AWStats & Log Intelligence
If I see one more junior admin trying to debug a traffic spike by staring at a scrolling tail -f /var/log/httpd/access_log, I might just confiscate their root password. Don't get me wrong—grep and awk are foundational tools. But when you are managing a high-traffic e-commerce site targeting the Norwegian market, raw text won't tell you if your conversion drop is due to a 404 error on the checkout page or a latency issue routing through NIX (Norwegian Internet Exchange).
You need historical trends. You need user agent breakdowns. You need AWStats.
In this guide, we are going to set up AWStats on a standard LAMP stack, tune it so the Perl scripts don't hammer your CPU, and discuss the specific privacy implications of logging user IP addresses here in Norway.
The "I/O Wait" Nightmare
I recently audited a server for a media client in Oslo. They were complaining that their site crawled every night at 04:00. They blamed the backup script. I looked at `top`. The box wasn't being held up by backups; it was being held up by awstats.pl trying to parse 4 gigabytes of log files on a single SATA drive. The system was stuck in iowait hell.
Log analysis is heavy on disk reads. If your Virtual Private Server (VPS) is sitting on oversubscribed storage, parsing logs will kill your web server's performance. This is why architecture matters. At CoolVDS, we utilize high-performance RAID-10 SAS and emerging Enterprise SSD tiers, which means high IOPS (Input/Output Operations Per Second). You can crunch a 2GB log file in seconds, not minutes.
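Before you blame the backup script on your own box, measure. Here is a minimal sketch, assuming a Linux kernel whose /proc/stat exposes the aggregate cpu line with iowait as the fifth counter. The function name iowait_pct is mine, not part of any standard tool.

```shell
#!/bin/sh
# iowait_pct BEFORE AFTER
# Takes two samples of the aggregate "cpu" line from /proc/stat and prints
# the percentage of CPU time spent waiting on I/O between them.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Sum user..steal (fields 2-9); later guest fields are already
            # counted inside user time, so they are left out.
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            t[NR] = total
            w[NR] = $6          # field 6 is the iowait counter
        }
        END {
            d = t[2] - t[1]
            if (d > 0) printf "%.1f\n", 100 * (w[2] - w[1]) / d
            else print "0.0"
        }'
}
```

To use it, take one sample while awstats.pl is running, wait a few seconds, and take another: a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat); iowait_pct "$a" "$b". If the number stays high for the whole stats run, the disk is the likely bottleneck, not the CPU.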
Installation: CentOS 5 & Debian Lenny
Let's get this running. I'm assuming you have Apache 2.2 installed.
For CentOS 5 (via EPEL)
rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm
yum install awstats
For Debian 5 (Lenny)
apt-get update
apt-get install awstats
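The two packages do not put awstats.pl in the same place, which matters later when you write the cron job. The helper below is a small sketch that reports the first location that exists. The two default paths are my assumption about where the EPEL and Debian packages install the script, and find_awstats is a hypothetical name. Check your own system with rpm -ql awstats or dpkg -L awstats.

```shell
#!/bin/sh
# find_awstats [PATH...]
# Prints the first candidate path that exists as a regular file.
# With no arguments it tries the assumed EPEL and Debian package locations.
find_awstats() {
    [ $# -gt 0 ] || set -- \
        /usr/share/awstats/wwwroot/cgi-bin/awstats.pl \
        /usr/lib/cgi-bin/awstats.pl
    for p in "$@"; do
        if [ -f "$p" ]; then
            echo "$p"
            return 0
        fi
    done
    echo "awstats.pl not found in any candidate location" >&2
    return 1
}
```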
Configuration: The Critical Flags
The default config is garbage for high-performance environments. Open your config file, usually located at /etc/awstats/awstats.yourdomain.no.conf. The part between "awstats." and ".conf" is the name you pass to -config later.
1. DNS Lookup (The Performance Killer)
By default, AWStats might try to reverse-resolve every IP address to a hostname. Turn this OFF immediately. Every unresolved visitor address becomes a DNS query, which adds massive latency to the update run and floods your resolver with traffic.
DNSLookup=0
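Turning lookups off does not mean you never get hostnames. One option is to resolve only the handful of addresses that actually matter, after the fact. This is a rough sketch, assuming the client address is the first field of each log line, as it is in the combined format. top_ips is a hypothetical helper name.

```shell
#!/bin/sh
# top_ips LOGFILE [COUNT]
# Prints "hits address" for the COUNT busiest client addresses (default 10),
# busiest first; ties are ordered by address.
top_ips() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$1" |
        sort -k1,1nr -k2,2 |
        head -n "${2:-10}"
}
```

Run top_ips /var/log/httpd/access_log 5 and feed the five addresses to host or dig -x by hand. That is five DNS queries instead of one for every visitor.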
2. Incremental Updates
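Do not parse the whole log file every time. When run with -update, AWStats records how far it got in the history files under DirData and, on the next run, skips everything it has already processed. Make sure DirData points somewhere persistent and writable.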
LogFile="/var/log/httpd/access_log"
LogType=W
LogFormat=1
SiteDomain="www.yourdomain.no"
HostAliases="yourdomain.no www.yourdomain.no 127.0.0.1"
DirData="/var/lib/awstats"
DirCgi="/awstats"
DirIcons="/awstatsicons"
AllowToUpdateStatsFromBrowser=0
WarningMessages=1
SaveDatabase=1
Pro Tip: Never set AllowToUpdateStatsFromBrowser=1 on a public-facing server unless you want bots triggering the update script and DoS-ing your site. Run the update via a cron job instead.
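Config files drift, especially when someone copies the distribution default over yours. A quick sanity check for the two settings above can be sketched as follows. It assumes plain Directive=value lines with no "=" inside the value, and both conf_value and check_awstats_conf are hypothetical helper names.

```shell
#!/bin/sh
# conf_value FILE DIRECTIVE
# Prints the last uncommented value of DIRECTIVE, with quotes and
# surrounding whitespace stripped.
conf_value() {
    awk -F= -v key="$2" '
        /^[ \t]*#/ { next }                          # skip comment lines
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key { v = $2; gsub(/^[ \t"]+|[ \t"]+$/, "", v); val = v }
        END { print val }' "$1"
}

# check_awstats_conf FILE
# Warns and returns non-zero unless both performance-critical flags are 0.
check_awstats_conf() {
    rc=0
    for d in DNSLookup AllowToUpdateStatsFromBrowser; do
        if [ "$(conf_value "$1" "$d")" != "0" ]; then
            echo "WARN: $d is not set to 0 in $1"
            rc=1
        fi
    done
    return $rc
}
```

Call it as check_awstats_conf /etc/awstats/awstats.yourdomain.no.conf, substituting your own config path.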
Automating with Cron
Set up a cron job to update stats every hour. Edit your crontab with crontab -e:
0 * * * * /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.no -update > /dev/null
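This keeps your reports fresh without manual intervention. Two details: the path shown is where the EPEL package puts the script (on Debian, awstats.pl typically lives under /usr/lib/cgi-bin/ instead), and redirecting only standard output to /dev/null means cron will still mail you any errors.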
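One thing an hourly schedule does not protect you from is a run that takes longer than an hour. On a slow disk with a big log, the next update can start while the previous one is still grinding. A minimal guard, assuming a directory you can write to for the lock, looks like this. run_locked and the lock path are hypothetical names, not anything AWStats ships.

```shell
#!/bin/sh
# run_locked LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR can be created; mkdir is atomic, so two
# overlapping cron runs cannot both get the lock.
run_locked() {
    lock=$1
    shift
    if ! mkdir "$lock" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75
    fi
    rc=0
    "$@" || rc=$?
    rmdir "$lock"
    return $rc
}
```

In a wrapper script called from cron, the update becomes run_locked /var/run/awstats-update.lock nice -n 19 /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.no -update. The nice keeps the Perl process at the lowest CPU priority while it runs. If the machine is killed mid-run, the stale lock directory has to be removed by hand.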
Norwegian Privacy Context: Datatilsynet & IP Addresses
Here in Norway, the Data Inspectorate (Datatilsynet) takes the Personopplysningsloven (Personal Data Act) very seriously. An IP address can be considered Personally Identifiable Information (PII). Storing full IP addresses of users indefinitely without consent is a legal grey area that can get you in trouble.
If you don't strictly need the full IP for security auditing, you should mask it. AWStats has a plugin for this, but a better approach is often to handle it at the Apache level or via a post-processing script if you want to be 100% safe.
However, if you are hosting with CoolVDS, your data resides physically in Oslo. This aids significantly with compliance compared to hosting in the US under Safe Harbor, where data sovereignty is becoming a major headache for European CTOs.
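The post-processing route mentioned above can be as small as one sed expression. This is a minimal sketch, assuming IPv4 addresses in the first field of each line and that zeroing the last octet is enough masking for your situation, which is a legal judgment rather than a technical one. mask_ips is a hypothetical name.

```shell
#!/bin/sh
# mask_ips
# Filter: reads access log lines on stdin and replaces the last octet of a
# leading IPv4 address with 0. Lines that do not start with an IPv4
# address pass through unchanged.
mask_ips() {
    sed 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\} /\1.0 /'
}
```

Run it against rotated logs rather than the live file, for example mask_ips < access_log.1 > access_log.1.masked, and only after AWStats has processed them if you still want per-host statistics from the full addresses.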
Securing the Interface
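AWStats reports expose your internal directory structure and traffic patterns. Do not leave this open to the web. Use Apache's .htaccess to restrict access. The enclosing directory must have AllowOverride AuthConfig set in the main Apache config for these directives to take effect.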
# /var/www/awstats/.htaccess
AuthName "Admin Access Only"
AuthType Basic
AuthUserFile /etc/awstats/.htpasswd
require valid-user
Create the password file:
htpasswd -c /etc/awstats/.htpasswd adminuser
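Do not assume the protection works just because the file exists. Test it from outside. Below is a small sketch using curl, which asks for the page without credentials and expects a 401. The helper names http_status and check_protected are hypothetical, and the URL you pass depends on how you mapped the AWStats directory.

```shell
#!/bin/sh
# http_status URL
# Prints only the HTTP status code of an unauthenticated GET.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# check_protected URL
# Succeeds only when the server demands credentials (401) for URL.
check_protected() {
    code=$(http_status "$1") || code=000
    if [ "$code" = "401" ]; then
        echo "OK: $1 asks for credentials"
    else
        echo "FAIL: $1 answered $code without credentials" >&2
        return 1
    fi
}
```

For example: check_protected http://www.yourdomain.no/awstats/awstats.pl. A 200 here means the whole world can read your stats.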
Why Infrastructure Matters for Analysis
Log analysis is a resource-intensive task. It is "bursty" by nature. On cheap, oversold hosting (OpenVZ containers crammed onto a single drive), your `awstats.pl` process will fight for disk time with your neighbor's PHP scripts. That contention shows up as high iowait (and as "CPU Steal" when the host's cores are oversold too), and as high latency for your actual website visitors.
We built CoolVDS using Xen virtualization to ensure hard resource isolation. When you run a heavy Perl script to crunch your monthly logs, you get the dedicated RAM and disk throughput you paid for. No noisy neighbors. No excuses.
If you are tired of watching your load average spike every time you try to analyze your traffic, it is time for a serious upgrade.
Need a server that can handle heavy I/O without breaking a sweat? Deploy a CoolVDS Xen instance in Oslo today and stop guessing what your traffic looks like.