Stop Flying Blind: Deep Server Log Analysis with AWStats on Linux
I recently audited a client's server after a catastrophic outage during a modest marketing campaign. The culprit wasn't the traffic itself—it was the monitoring. They were running a heavy log analysis script on an oversold VPS with shared I/O. When the logs hit 2GB, the disk wait time spiked, the web server choked, and the site went dark. It was a classic case of "observer effect": the act of measuring the system killed the system.
If you are running a serious deployment, you cannot rely solely on client-side JavaScript trackers like Google Analytics. They miss 404 errors, hotlinking abuse, and the bots probing your URLs for vulnerable scripts. You need server-side analysis. In 2011, the industry standard for this is AWStats. It is powerful, but if configured poorly on weak hardware, it is a resource hog.
The Truth is in /var/log
Client-side tracking lies. It gets blocked by NoScript, misses mobile browsers with poor JS support, and ignores bandwidth theft. Server logs tell the truth. However, raw Apache logs are unreadable streams of text. AWStats parses these into readable reports on visitors, visit duration, and, most importantly, HTTP error codes.
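To get a feel for what is hiding in a raw log before AWStats is even installed, a few lines of shell go a long way. The sketch below tallies responses per HTTP status code. It assumes the combined log format and a normal three-token request line, which puts the status code in field 9. The function name and the log path in the comment are illustrative, not part of AWStats.

```shell
# Count responses per HTTP status code in a combined-format access log.
# Assumes the request line has the usual three tokens ("GET /path HTTP/1.1"),
# so the status code is whitespace-separated field 9.
status_counts() {
    awk '{ count[$9]++ } END { for (code in count) print count[code], code }' "$1" |
        sort -rn
}

# Example (CentOS default log path, adjust for your server):
#   status_counts /var/log/httpd/access_log
```

A sudden pile of 404 or 500 lines at the top of that list is exactly the kind of thing a JavaScript tracker never reports.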
Installation on RHEL/CentOS
Assuming you are running CentOS 5.5 or RHEL 6, we will use the RPMforge repository (it must already be set up on the box for the command below to work). Do not compile from source unless you enjoy dependency hell.
# yum --enablerepo=rpmforge install awstats
Once installed, the configuration file is your next stop. Copy the model file to a specific config for your domain:
# cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.com.conf
# vi /etc/awstats/awstats.yourdomain.com.conf
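If you would rather script the first edits than make them by hand in vi, something like the following sketch can do it. LogFile, SiteDomain and HostAliases are standard AWStats directives; the helper name, the log path and the alias list are assumptions for illustration only.

```shell
# Set KEY=VALUE on an existing directive line in an AWStats config file.
# Limitation: VALUE must not contain "|", "&" or "\" (special to sed here).
set_directive() {
    conf="$1"; key="$2"; value="$3"
    # Write to a temp file, then copy back so the original file keeps its
    # ownership and permissions.
    sed "s|^${key}=.*|${key}=${value}|" "$conf" > "$conf.tmp" &&
        cat "$conf.tmp" > "$conf" &&
        rm -f "$conf.tmp"
}

# Hypothetical values; adjust the domain and log path for your server.
#   cfg=/etc/awstats/awstats.yourdomain.com.conf
#   set_directive "$cfg" LogFile     '"/var/log/httpd/access_log"'
#   set_directive "$cfg" SiteDomain  '"yourdomain.com"'
#   set_directive "$cfg" HostAliases '"www.yourdomain.com localhost 127.0.0.1"'
```

The helper only rewrites lines that already exist and are not commented out, which is the case for these three directives in the model file as far as I have seen.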
Critical Configuration: Performance & Privacy
Here is where the generic tutorials fail you. They tell you to just turn it on. In a production environment, specifically here in Norway where Datatilsynet (The Data Inspectorate) is vigilant about the Personal Data Act, you need to be careful with IP addresses.
1. The Privacy Flag
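You should obfuscate IP addresses if you don't strictly need them for security auditing. This helps keep you compliant with European privacy directives. At a minimum, turn off reverse DNS lookups: they are slow, and resolved hostnames only add more identifying detail to your reports. Masking the addresses in the logs themselves is a separate step.
# In awstats.yourdomain.com.conf
WarningMessages=1
# 0 disables reverse DNS lookups (faster, and no hostnames in the reports)
DNSLookup=0
# Plugin for GeoIP is recommended for accuracy over reverse DNS
# (example path; point it at wherever your GeoIP.dat lives)
LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"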
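AWStats reports whatever addresses it finds in the log, so one way to actually obfuscate them is to mask the log before it is parsed. The sketch below zeroes the last octet of each IPv4 client address. This is a preprocessing step of my own, not an AWStats feature, and the function name and file names are hypothetical.

```shell
# Zero the last octet of the client IPv4 address at the start of each line.
# Reads a combined-format log on stdin, writes the masked log to stdout.
# IPv6 addresses and hostnames pass through unchanged.
mask_ips() {
    sed 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\} /\1.0 /'
}

# Example (hypothetical file names):
#   mask_ips < /var/log/httpd/access_log > /var/log/httpd/access_masked.log
```

You can then point LogFile at the masked copy. The trade-off is coarser unique-visitor counts, since every client in the same /24 collapses into one address.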
2. Log Format and I/O
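Ensure your Apache httpd.conf defines the 'Combined' log format and that your CustomLog directive actually uses it. AWStats struggles with custom formats unless you spend hours mapping the fields. On the AWStats side, LogFormat=1 corresponds to the combined format.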
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
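Before pointing AWStats at an existing log, it is worth checking that the file really is in combined format, since old vhosts often still log in the shorter common format. The check below is a rough sketch with a deliberately loose pattern. The function name is made up for this example.

```shell
# Print how many lines of a log file do NOT look like the combined format.
# The pattern is loose on purpose and will also flag lines that contain
# escaped quotes (\") inside the request, referer or user-agent fields.
count_noncombined() {
    grep -Evc '^[^ ]+ [^ ]+ [^ ]+ \[[^]]+] "[^"]*" [0-9]{3} ([0-9]+|-) "[^"]*" "[^"]*"$' "$1" || true
}

# Example (CentOS default log path, adjust for your server):
#   count_noncombined /var/log/httpd/access_log
```

A result of 0 means every line matched. A large number usually means the vhost is logging with a different format nickname.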
The Hidden Cost: Disk I/O
This is the part that kills performance. When awstats.pl runs, it reads massive text files. On a standard shared hosting plan or a budget VPS, your disk I/O is shared with hundreds of other neighbors. If a neighbor decides to compile a kernel while you are parsing logs, your site lags.
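Whatever hardware you are on, you can at least ask the kernel to put the parser at the back of the queue. The wrapper below is a minimal sketch: it runs a command at the lowest CPU priority and, where ionice is available and permitted, in the idle I/O class. The function name is my own. The idle class only has an effect with an I/O scheduler that honors it, such as CFQ.

```shell
# Run a command with the lowest CPU priority and, if possible, idle I/O class.
low_prio() {
    # Probe ionice first: it may be missing or refused in some environments.
    if command -v ionice >/dev/null 2>&1 && ionice -c3 true 2>/dev/null; then
        nice -n 19 ionice -c3 "$@"
    else
        nice -n 19 "$@"
    fi
}

# Example: wrap the AWStats update when it is launched from a scheduled job.
#   low_prio /usr/bin/perl /var/www/awstats/awstats.pl -config=yourdomain.com -update
```

This does not create I/O capacity that is not there, but it stops the log parser from competing head-on with Apache and MySQL for the disk.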
This is why we architect CoolVDS differently. We don't oversell our storage subsystems. When you run a disk-intensive operation like parsing a 5GB log file, you need sustained throughput. We use high-performance RAID-10 SAS arrays and Enterprise SSD caching (on select nodes) to ensure your I/O wait remains negligible.
Pro Tip: Never run AWStats updates in real-time (as a CGI script). It opens a vector for DoS attacks. Always run it via cron and serve static HTML reports.
Setting the Cron Job
Update your stats hourly, not on every page load.
# crontab -e
0 * * * * /usr/bin/perl /var/www/awstats/awstats.pl -config=yourdomain.com -update > /dev/null
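The cron entry above only refreshes the AWStats database; it does not produce the static HTML mentioned in the Pro Tip. A sketch of that second step follows, using the -output and -staticlinks options of awstats.pl. The function name and the output directory are assumptions, and the script is assumed to be executable (otherwise prefix the call with perl, as the cron line does).

```shell
# Path used by the cron entry above; override via the environment if needed.
AWSTATS_PL="${AWSTATS_PL:-/var/www/awstats/awstats.pl}"

# Render the main report for SITE into OUT_DIR/index.html.
build_static_report() {
    site="$1"; out_dir="$2"
    tmp="$out_dir/index.html.tmp"
    # Render to a temp file first so nobody is served a half-written page.
    if "$AWSTATS_PL" -config="$site" -output -staticlinks > "$tmp"; then
        mv "$tmp" "$out_dir/index.html"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example (hypothetical output directory):
#   build_static_report yourdomain.com /var/www/html/stats
```

This renders the main summary page only. For the full set of sub-reports, the awstats_buildstaticpages.pl script that ships with AWStats is the tool to look at.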
Securing the Output
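The AWStats interface reveals sensitive data about your directory structure and backend. Do not leave this open to the public web. Use an .htaccess file to require a login and restrict access to your administration IP range; with Apache's default Satisfy All behavior, a visitor must pass both checks. This only takes effect if AllowOverride permits AuthConfig and Limit for that directory.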
# /var/www/awstats/.htaccess
AuthType Basic
AuthName "Server Statistics"
AuthUserFile /etc/awstats/.htpasswd
Require valid-user
Order deny,allow
Deny from all
Allow from 123.123.123.123
Why Infrastructure Matters
Analyzing logs effectively requires a balance of CPU power and disk speed. If you are hosting high-traffic sites targeting Norway, latency to the NIX (Norwegian Internet Exchange) is vital, but so is the internal latency of your server's bus. A VPS in Norway needs to offer more than just an IP address in Oslo; it needs the hardware muscle to crunch data without stalling your MySQL queries.
If you are tired of your server crawling every time you try to figure out who is visiting your site, it might be time to move away from legacy shared platforms. CoolVDS provides the dedicated resources and DDoS protection necessary to run comprehensive analytics tools without fear of self-inflicted downtime.
Don't guess where your traffic is coming from. Measure it. Just make sure your server can handle the truth.