Stop Guessing: Accurate Server Log Analysis with AWStats
If you rely solely on JavaScript-based trackers like Google Analytics, you are flying blind. Between the rise of NoScript, the privacy-conscious implementation of Do Not Track in Firefox, and the increasing number of corporate firewalls blocking tracking beacons, you are likely missing 10% to 20% of your actual traffic. As a systems administrator, the truth isn't in a third-party dashboard; it is in your /var/log directory.
But raw logs are ugly. Scanning a 4GB text file with grep or awk is fine for troubleshooting a specific 500 error, but it offers zero insight into trends. This is where AWStats (Advanced Web Statistics) comes in. It is an old, reliable tool: a Perl-based log analyzer that parses your Apache or Nginx logs and generates comprehensive graphical reports.
In this guide, we will set up AWStats on a standard LAMP stack (CentOS 6 or Debian 6), configure it to cope with log rotation on high-traffic sites, and address the privacy requirements that Datatilsynet enforces here in Norway.
The I/O Bottleneck: Why Hardware Matters
Before we touch the config files, let’s talk about hardware. Log analysis is brutally I/O intensive. When AWStats parses a month's worth of logs, it is performing millions of read operations. On traditional 7.2k RPM SATA drives, this process can spike your I/O wait (iowait) percentage, causing your web server to stutter or lock up entirely during the update process.
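When the logs do have to share a disk with the web server, one common mitigation is to run the update at idle I/O priority and low CPU priority. The wrapper below is a minimal sketch, not part of AWStats: run_low_priority is a hypothetical helper name, and it assumes ionice (from util-linux) may be missing or unusable, in which case it falls back to nice alone.

```shell
#!/bin/sh
# run_low_priority CMD [ARGS...]
# Hypothetical helper: run a command at the lowest CPU priority and, where
# ionice is installed and usable, in the idle I/O scheduling class.
run_low_priority() {
    if command -v ionice >/dev/null 2>&1 && ionice -c3 true 2>/dev/null; then
        # -c3 = idle class: the process only gets disk time nobody else wants
        ionice -c3 nice -n 19 "$@"
    else
        # Fallback: CPU niceness only
        nice -n 19 "$@"
    fi
}

# Example call (paths are assumptions; adjust them to your install):
# run_low_priority /usr/bin/perl /usr/lib/cgi-bin/awstats.pl -config=www.example.no -update
```

How much the idle class helps depends on the kernel's I/O scheduler, so treat this as damage limitation rather than a cure.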
Pro Tip: Never run heavy log analysis on the same physical disk spindle as your database. If you are hosting on a budget VPS provider that oversells resources, running a cron job for AWStats at 04:00 AM might actually take your MySQL server offline due to I/O starvation.
This is why we architect CoolVDS differently. By utilizing pure SSD storage and KVM virtualization, we eliminate the "noisy neighbor" effect. The random read speeds on our storage arrays mean you can crunch through 2GB of access logs in seconds, not minutes, without degrading the performance of your Apache httpd processes. Speed isn't just a luxury; it's a stability requirement.
Step 1: Installation
For CentOS 6 / RHEL 6, AWStats is not in the base repository. You need the EPEL (Extra Packages for Enterprise Linux) repo.
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-5.noarch.rpm
yum install awstats
For Debian 6 (Squeeze) or Ubuntu 10.04 LTS, it is straightforward:
apt-get update
apt-get install awstats
Step 2: Configuring Apache for Accuracy
Apache's common log format does not record enough data. We need the combined format, which adds the Referer and User-Agent request headers. Check your httpd.conf or apache2.conf:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/httpd/access_log combined
If you are running Nginx (which is rapidly becoming the standard for high-performance static serving), your log format in nginx.conf should look like this:
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
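Before pointing AWStats at a log, it can be worth checking that the lines really have the combined shape. The following is a rough sketch: COMBINED_RE and count_non_combined are hypothetical names, and the regex is a simplification that rejects lines with escaped quotes inside the request or User-Agent. It is not anchored at the end of the line, so the extra trailing field of the Nginx format above still passes.

```shell
#!/bin/sh
# One combined-format line:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE='^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} ([0-9]+|-) "[^"]*" "[^"]*"'

# count_non_combined: print how many lines on stdin do NOT match the combined shape.
count_non_combined() {
    # grep -c exits 1 when the count is 0, so mask the status to keep
    # scripts running under "set -e" alive; the count is still printed.
    grep -Evc "$COMBINED_RE" || true
}

# Example (log path is an assumption):
# tail -n 1000 /var/log/httpd/access_log | count_non_combined
```

A result of 0 means every sampled line matched. Anything else suggests the vhost is still writing the common format or a custom one.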
Step 3: The AWStats Configuration
Copy the model configuration file to a new file specifically for your domain. We will assume your site is www.example.no.
cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.www.example.no.conf
vi /etc/awstats/awstats.www.example.no.conf
You need to change these critical parameters:
- LogFile: Point this to your active server log. E.g., /var/log/httpd/access_log.
- LogType: Set to W for web log.
- LogFormat: Set to 1 for Apache combined / Nginx main.
- SiteDomain: www.example.no
- HostAliases: example.no www.example.no localhost 127.0.0.1
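If you manage several sites, the string-valued parameters above can be set from a script instead of by hand in vi. This is a minimal sketch assuming GNU sed (for -i) and a directive that already exists in the file at the start of a line, as it does in the model file. set_directive is a hypothetical helper, not an AWStats tool.

```shell
#!/bin/sh
# set_directive FILE KEY VALUE
# Hypothetical helper: rewrite an existing KEY=... line in FILE as KEY="VALUE".
# Does nothing if the key is absent or commented out.
set_directive() {
    file=$1
    key=$2
    value=$3
    # Escape the characters that are special in a sed replacement: \ & and the | delimiter
    escaped=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
    sed -i "s|^${key}=.*|${key}=\"${escaped}\"|" "$file"
}

# Example (run as root; values taken from the list above):
# conf=/etc/awstats/awstats.www.example.no.conf
# set_directive "$conf" LogFile /var/log/httpd/access_log
# set_directive "$conf" SiteDomain www.example.no
# set_directive "$conf" HostAliases "example.no www.example.no localhost 127.0.0.1"
```

LogType and LogFormat are left alone here because they are unquoted single-character values; check them by eye.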
Handling Rotated Logs
Here is where most tutorials fail. Linux servers rotate logs via logrotate (usually daily or weekly), creating files like access_log.1 or access_log.2.gz. If AWStats only runs once a day, and the log rotates before AWStats runs, you lose data.
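To fix this, point LogFile at a command instead of a file. AWStats treats a LogFile value that ends in a pipe character as a command to read from, so you can feed it the most recently rotated log followed by the live one (the bundled logresolvemerge.pl tool exists for this purpose). Alternatively, run the update script from a logrotate prerotate hook, right before rotation. A better approach for high-traffic sites hosted on CoolVDS is to update frequently, so that each run only processes the lines added since the previous one. Whichever route you take, set AllowToUpdateStatsFromBrowser=0 to prevent external users from triggering a resource-heavy Perl process via HTTP.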
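AWStats expects log records in chronological order, so rotated files have to be fed oldest first. The function below is a sketch of that ordering only. It assumes logrotate's numeric naming (access_log.1, access_log.2.gz and so on, without dateext), and cat_logs_oldest_first is a hypothetical name. The logresolvemerge.pl script shipped with AWStats covers the same ground and more.

```shell
#!/bin/sh
# cat_logs_oldest_first BASE [MAX]
# Hypothetical helper: print BASE.MAX ... BASE.1 (plain or .gz), then BASE,
# so the combined stream is in chronological order.
cat_logs_oldest_first() {
    base=$1
    n=${2:-9}    # highest rotation number to look for (assumption: at most 9)
    while [ "$n" -ge 1 ]; do
        for f in "$base.$n.gz" "$base.$n"; do
            # gzip -cdf decompresses .gz files and passes plain files through unchanged
            [ -f "$f" ] && gzip -cdf "$f"
        done
        n=$((n - 1))
    done
    [ -f "$base" ] && cat "$base"
    return 0
}

# Example (path is an assumption):
# cat_logs_oldest_first /var/log/httpd/access_log 2
```

Saved as a standalone script, a command like this can be used as a LogFile value ending in a pipe character. Re-reading every archive on each run costs I/O, so a small MAX is the safer choice.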
Step 4: Privacy and Norwegian Law (Personopplysningsloven)
Hosting in Norway means adhering to strict privacy standards. Datatilsynet (The Norwegian Data Protection Authority) views IP addresses as personal data. If you do not have a specific need to log full IPs for security auditing, you should anonymize them in your analytics reports.
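The stock AWStats plugin set has no dedicated anonymizer (the hashfiles plugin only speeds up loading of the history files and does not touch IP addresses). Mask the last octet of each IP address before AWStats reads the log instead, either with a small filter in front of LogFile or with a custom Perl hook. It takes a little extra setup, but protecting user privacy builds trust.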
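Masking the last octet can be done with a one-line filter. The sketch below assumes the client address is the first field on each line, as in the log formats from Step 2, and handles IPv4 only. mask_last_octet is a hypothetical name.

```shell
#!/bin/sh
# mask_last_octet: read log lines on stdin and replace the last octet of a
# leading IPv4 address with 0. Lines that do not start with an IPv4 address
# (IPv6 clients, for example) pass through unchanged.
mask_last_octet() {
    sed 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\} /\1.0 /'
}

# Example (path is an assumption):
# mask_last_octet < /var/log/httpd/access_log > /var/log/httpd/access_log.anon
```

Because the filter only rewrites the start of the line, an address logged elsewhere (such as the X-Forwarded-For field in the Nginx format) is left intact and would need its own rule.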
Furthermore, to get accurate country data without violating privacy, use the GeoIP plugin (requires the Geo::IP Perl module and the MaxMind GeoIP.dat database).
# /etc/awstats/awstats.www.example.no.conf
LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"
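The GeoIP plugin depends on both prerequisites being in place, so a quick check before the first update can save some head-scratching. This is a sketch: check_geoip_prereqs is a hypothetical helper, and the default database path is simply the one used in the LoadPlugin line above.

```shell
#!/bin/sh
# check_geoip_prereqs [DB_PATH]
# Hypothetical helper: return 0 only if the Geo::IP Perl module loads and the
# database file is readable; report each missing piece on stderr.
check_geoip_prereqs() {
    db=${1:-/usr/share/GeoIP/GeoIP.dat}
    status=0
    if ! perl -MGeo::IP -e 1 >/dev/null 2>&1; then
        echo "Geo::IP Perl module not loadable" >&2
        status=1
    fi
    if [ ! -r "$db" ]; then
        echo "GeoIP database not readable: $db" >&2
        status=1
    fi
    return "$status"
}

# Example:
# check_geoip_prereqs && echo "GeoIP prerequisites look fine"
```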
Step 5: Automation via Cron
Do not run updates manually. Add a cron job to update the statistics every hour. Open your crontab with crontab -e:
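# Update AWStats every hour at minute 0
# (Debian path shown; the EPEL package on CentOS installs awstats.pl under /usr/share/awstats/wwwroot/cgi-bin/)
0 * * * * /usr/bin/perl /usr/lib/cgi-bin/awstats.pl -config=www.example.no -update > /dev/null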
This script parses the new lines in the log file since the last run. Because CoolVDS instances provide dedicated RAM and fast CPU cycles, this Perl script execution usually completes in milliseconds, causing zero latency for your site visitors.
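On a busy site an hourly update can occasionally outlast its slot, for instance on the first run over a large backlog. One way to keep runs from piling up is to wrap the command in a lock. The sketch below assumes flock from util-linux is installed. run_locked and the lock file path in the example are hypothetical.

```shell
#!/bin/sh
# run_locked LOCKFILE CMD [ARGS...]
# Hypothetical helper: run CMD only if no other process holds LOCKFILE.
run_locked() {
    lock=$1
    shift
    # -n: give up immediately (non-zero exit) instead of queueing behind a running update
    flock -n "$lock" "$@"
}

# Example (paths are assumptions; adjust them to your install):
# run_locked /var/lock/awstats-www.example.no.lock \
#     /usr/bin/perl /usr/lib/cgi-bin/awstats.pl -config=www.example.no -update
```

Skipping a run is harmless here, since the next update picks up everything written since the last successful one.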
Conclusion
Server-side logging is the only way to get a 100% accurate picture of your infrastructure's health and traffic. By using AWStats, you retain ownership of your data—crucial for compliance in the European Economic Area—and you gain visibility into bots, hotlinking, and bandwidth theft that JavaScript trackers simply cannot see.
However, log analysis requires resources. If your current host throttles your CPU or caps your disk I/O, your analytics will suffer. You need a platform built for heavy lifting.
Ready for infrastructure that keeps up with your traffic? Deploy a CoolVDS instance with SSD storage today and experience the difference low latency makes. Configure your server now.