The Map is Not the Territory
If you are relying solely on JavaScript-based trackers like Google Analytics to understand your infrastructure, you are looking at a mirage. They tell you who visited, but not who tried to break in, which image hotlinks are draining your bandwidth, or why your httpd process keeps spawning child workers until the server chokes. I learned this the hard way last month while debugging a client's e-commerce portal targeting the Oslo market. Their frontend metrics looked fine, but latency was spiking to 4000 ms. The culprit? A massive scraper botnet hammering a non-existent directory. The bots never executed JavaScript, so the tracking tags never fired and the marketing team saw nothing. But /var/log/httpd/access_log told the full, ugly story.
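Spotting that kind of attack in the raw log comes down to grouping the 404 responses. The sketch below is a hypothetical helper, not part of AWStats: it assumes Apache's combined format without escaped quotes inside fields, and it counts 404s per top-level directory together with the number of distinct client addresses behind them.

```python
import re
from collections import Counter, defaultdict

# Simplified pattern for Apache's "combined" format (assumes no escaped quotes in fields).
LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
    r'(?:\d+|-) "[^"]*" "[^"]*"'
)

def not_found_hotspots(lines):
    """Return (directory, 404 count, distinct clients), busiest directory first."""
    hits = Counter()
    clients = defaultdict(set)
    for line in lines:
        m = LINE.match(line)
        if not m or m.group("status") != "404":
            continue
        parts = m.group("request").split()
        if len(parts) < 2:
            continue  # malformed request line such as "-"
        path = parts[1].split("?", 1)[0]
        # "/shop/old/x.php" -> "/shop/"; files in the document root -> "/"
        segments = path.split("/")
        top = "/" + segments[1] + "/" if len(segments) > 2 else "/"
        hits[top] += 1
        clients[top].add(m.group("host"))
    return [(d, n, len(clients[d])) for d, n in hits.most_common()]
```

Passing an open file, e.g. not_found_hotspots(open("/var/log/httpd/access_log")), works because the function only iterates over lines. A directory with thousands of 404s from hundreds of addresses is the signature of a distributed scraper rather than a single broken link.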
Why AWStats Still Rules in 2010
Webalizer is faster, but it lacks detail. AWStats (Advanced Web Statistics) parses your server logs directly and generates graphical reports from them. It sees everything: status codes, bandwidth usage by file type and, crucially, the user agents hitting your machine. However, parsing a 5 GB text file is a heavy operation that demands high disk I/O throughput. This is where most cheap VPS providers fail: they oversell their SATA drives, packing so many tenants onto each spindle that none of them gets the throughput it needs.
On a CoolVDS instance, where we prioritize dedicated I/O resources and utilize enterprise-grade RAID arrays, log parsing doesn't send your load average through the roof. You don't want your monitoring tool to be the reason your site goes down.
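To make "it sees everything" concrete, here is a rough sketch of the three tallies mentioned above, computed straight from combined-format lines. The function name summarize is made up for illustration and the regex is a simplification; AWStats itself does far more (sessions, robots, referrer classification).

```python
import os
import re
from collections import Counter

# Simplified Apache "combined" pattern; fields with escaped quotes are not handled.
COMBINED = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
    r'(?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Return (status-code counts, bytes per file extension, user-agent counts)."""
    statuses, bandwidth, agents = Counter(), Counter(), Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines in another format or truncated lines
        statuses[m.group("status")] += 1
        agents[m.group("agent")] += 1
        parts = m.group("request").split()
        path = parts[1].split("?", 1)[0] if len(parts) > 1 else ""
        ext = os.path.splitext(path)[1].lower() or "(none)"
        # Apache logs "-" instead of 0 when no response body was sent
        size = 0 if m.group("bytes") == "-" else int(m.group("bytes"))
        bandwidth[ext] += size
    return statuses, bandwidth, agents
```

Reading a multi-gigabyte file this way is a single sequential pass, which is exactly why the speed of the underlying disk dominates the run time.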
Step 1: Configuration Hygiene
Before installing AWStats (yum install awstats on CentOS 5 with the EPEL repository enabled, or apt-get install awstats on Debian Lenny), check your Apache configuration. You need the 'combined' log format to get referrer and user-agent data. The standard 'common' format omits both fields, which makes it nearly useless for security analysis.
# Inside /etc/httpd/conf/httpd.conf (CentOS; on Debian Lenny see /etc/apache2/apache2.conf)
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog logs/access_log combined
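Before pointing AWStats at a log, it is worth confirming which format the file actually contains, since older lines written before a config change may still be in 'common' format. The sketch below is a hypothetical checker: it classifies a line by whether it ends with the two quoted Referer and User-Agent fields that the combined LogFormat above appends.

```python
import re

# Apache "common" format: %h %l %u %t "%r" %>s %b
COMMON = r'\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-)'
# "combined" appends quoted Referer and User-Agent fields
COMBINED_RE = re.compile(COMMON + r' "[^"]*" "[^"]*"$')
COMMON_RE = re.compile(COMMON + r'$')

def log_format(line):
    """Classify one access_log line as 'combined', 'common', or None if unrecognised."""
    line = line.rstrip("\n")
    if COMBINED_RE.match(line):
        return "combined"
    if COMMON_RE.match(line):
        return "common"
    return None
```

Running it over the first few lines of the file is enough; if any come back as 'common', the referrer and user-agent reports for that period will be empty.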
Once you have verified that the logs rotate correctly (check /etc/logrotate.d/httpd), configure the per-domain AWStats file in /etc/awstats/ (for example awstats.mydomain.no.conf) and point its LogFile setting at that access log.
Key Settings for Performance and Privacy
Pro Tip: To comply with the strict Norwegian Personal Data Act (Personopplysningsloven) and the requirements of Datatilsynet, consider anonymizing IP addresses if you store logs for extended periods. If your legal team requires strict compliance, load the geoipfree plugin (LoadPlugin="geoipfree" in the domain config, which needs the Geo::IPfree Perl module) so that country statistics come from a local GeoIP database rather than from reverse DNS of raw addresses.
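One common way to anonymize is to truncate each address before the log is archived. The sketch below zeroes the last octet of IPv4 addresses (/24) and keeps only a /48 prefix for IPv6; those prefix lengths are an assumption borrowed from common practice, not a rule stated by Datatilsynet, so confirm the required granularity with your legal team. The function names are hypothetical.

```python
import ipaddress

# Assumed truncation lengths; adjust to whatever your compliance policy requires.
IPV4_PREFIX = 24
IPV6_PREFIX = 48

def anonymize_ip(addr):
    """Zero the host bits of an address: 192.0.2.123 -> 192.0.2.0."""
    ip = ipaddress.ip_address(addr)
    prefix = IPV4_PREFIX if ip.version == 4 else IPV6_PREFIX
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)

def anonymize_line(line):
    """Rewrite the leading client-address field of an access_log line."""
    host, sep, rest = line.partition(" ")
    try:
        return anonymize_ip(host) + sep + rest
    except ValueError:
        # Hostnames (HostnameLookups On) or malformed lines are left untouched
        return line
```

Because only the host bits are dropped, the truncated address still falls in the same network block, so coarse reports such as countries keep working.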
Handle DNSLookup=1 with care. Reverse DNS lookups make the reports richer (you can tell visitors on .no domains from those on .com), but they can drastically slow down the update process if your server's DNS resolver is sluggish. On CoolVDS, our local resolvers in the datacenter are optimized for this, but if you are hosting elsewhere, consider setting DNSLookup=0, or DNSLookup=2 to use only a pre-built static DNS cache file, and doing the resolution offline.
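The main trick of offline resolution is to look up each distinct address once instead of once per hit. The sketch below does that with the standard socket.gethostbyaddr call; the resolver parameter exists only so the logic can be tested without network access. The "IP hostname" line layout written by write_cache is an assumption, so compare it with the sample dnscache.txt shipped with AWStats before pointing DNSStaticCacheFile at the result.

```python
import socket

def _system_resolver(ip):
    """Reverse lookup through the system resolver (blocking, no timeout control)."""
    return socket.gethostbyaddr(ip)[0]

def build_dns_cache(ips, resolver=_system_resolver):
    """Resolve each distinct IP once; failed lookups map to the IP itself."""
    cache = {}
    for ip in ips:
        if ip in cache:
            continue  # one lookup per address, however many hits it has
        try:
            cache[ip] = resolver(ip)
        except OSError:  # socket.herror and socket.gaierror are OSError subclasses
            cache[ip] = ip
    return cache

def write_cache(cache, path):
    """Write one 'IP hostname' pair per line (assumed layout, see lead-in)."""
    with open(path, "w") as fh:
        for ip, name in sorted(cache.items()):
            fh.write(f"{ip} {name}\n")
```

On a log dominated by a botnet, the number of distinct addresses is usually far smaller than the number of lines, which is where the saving comes from.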
Step 2: Securing the Intelligence
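The AWStats interface is a goldmine for competitors: it shows exactly which keywords drive your traffic and which files are most popular. Do not leave it open to the public web. I've seen too many admins leave /awstats/ accessible to the world. Lock it down with HTTP Basic authentication, creating the password file with htpasswd -c /etc/awstats/htpasswd.users yourname. The block below goes in the main Apache config; the Auth* and Require lines work equally well in an .htaccess file if AllowOverride AuthConfig is set for that directory.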
# In your Apache config; adjust the path to wherever your package installed AWStats
<Directory /usr/share/awstats/wwwroot>
AuthName "Server Stats - Authorized Personnel Only"
AuthType Basic
AuthUserFile /etc/awstats/htpasswd.users
Require valid-user
</Directory>
The Hardware Bottleneck: I/O Wait
Here is the reality of parsing logs: it is a read-intensive, sequential operation. If you are on a shared host with 500 other users fighting for the same disk heads, your log update job (usually a cron job running awstats.pl -update) will stall, and you will see the wa (I/O wait) percentage in top spike.
When we built the architecture for CoolVDS, we moved away from the "stuff as many VMs as possible on a drive" model. We use high-performance RAID-10 SAS arrays and low-latency storage backends. This means when your cron job fires at 04:00 AM to parse yesterday's traffic, it finishes in seconds, not hours. For data-heavy applications, raw disk speed is the single biggest factor in system responsiveness.
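If you want to log I/O wait around the update job rather than watch top, the wa figure can be computed from the aggregate cpu line of /proc/stat, which top also reads. This is a Linux-only sketch with hypothetical function names; it assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal) and sums only those eight fields, because newer kernels append guest counters that are already included in user and nice.

```python
import time

def cpu_times(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of jiffy counters."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate cpu line found")

def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two /proc/stat snapshots."""
    t0, t1 = cpu_times(before)[:8], cpu_times(after)[:8]
    deltas = [b - a for a, b in zip(t0, t1)]
    total = sum(deltas)
    # Index 4 is iowait in the user/nice/system/idle/iowait/... ordering
    return 100.0 * deltas[4] / total if total else 0.0

def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return the iowait share."""
    with open(path) as fh:
        before = fh.read()
    time.sleep(interval)
    with open(path) as fh:
        after = fh.read()
    return iowait_percent(before, after)
```

Calling sample_iowait() from a wrapper just before and during awstats.pl -update gives a number you can compare across hosts instead of an impression from watching top.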
Automate and Forget
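Finally, automate the update. Add an hourly cron job so you have near real-time data. The line below uses the /etc/crontab (or /etc/cron.d/) format, which includes a user field; drop root if you add it with crontab -e instead. The script path shown is Debian's, so adjust it to wherever your package installed awstats.pl.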
0 * * * * root /usr/lib/cgi-bin/awstats.pl -config=mydomain.no -update > /dev/null
Don't let your server be a black box. Install AWStats, secure it, and host it on infrastructure that can handle the I/O load without flinching.