Stop Flying Blind: Advanced Log Analysis with AWStats on Linux VDS
Most system administrators I talk to are flying blind. They have httpd writing gigabytes of data to /var/log/, yet they have no idea who is actually hitting their server until the load average spikes to 20 and the pager starts buzzing. If you are still grepping raw text files to debug traffic spikes, you are wasting time.
We need structure. We need visualization. In 2009, AWStats remains the gold standard for server-side log analysis. Unlike client-side JavaScript trackers (like Google Analytics), AWStats parses the actual server logs. It sees the bots, the hotlinkers, and the errors that JavaScript tags miss.
But parsing 5GB of logs nightly requires serious I/O performance. Here is how to set it up correctly without starving everything else on the server of disk time.
The Architecture of Analysis
I recently audited a media client in Oslo struggling with "unexplained" slowdowns every morning at 04:00. The culprit? A poorly configured log rotation script triggering a massive AWStats update process on a cheap, oversold VPS. The CPU steal time was through the roof because the host node couldn't handle the disk reads.
To avoid this, we need efficient configuration and hardware that doesn't lie about dedicated resources.
1. Installation on CentOS 5 / RHEL
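Don't compile from source unless you have very specific patch requirements. Use the RPMforge repository to keep it manageable. The release RPM below is the i386 build; on a 64-bit install, use the matching x86_64 package instead.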
rpm -Uhv http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
yum install awstats
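Each site then needs its own config file. The sketch below derives one from a template with sed. It assumes the package ships a template at /etc/awstats/awstats.model.conf and that Apache logs to /var/log/httpd/access_log; both paths, and the function name make_site_conf, are illustrative assumptions, so check them with `rpm -ql awstats` on your own box.

```shell
# make_site_conf MODEL DOMAIN LOGFILE
# Print a per-site config derived from the template MODEL, with the LogFile
# and SiteDomain directives replaced. The function name and all example
# paths are assumptions for illustration, not part of AWStats.
make_site_conf() {
    model=$1
    domain=$2
    logfile=$3
    sed -e "s|^LogFile=.*|LogFile=\"$logfile\"|" \
        -e "s|^SiteDomain=.*|SiteDomain=\"$domain\"|" \
        "$model"
}

# Usage (as root):
#   make_site_conf /etc/awstats/awstats.model.conf yourdomain \
#       /var/log/httpd/access_log > /etc/awstats/awstats.yourdomain.conf
```

The function writes to stdout rather than editing in place, so you can diff the result against the template before installing it.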
2. Critical Configuration: The I/O Bottleneck
The default configuration is safe, not fast. Open /etc/awstats/awstats.yourdomain.conf. The most expensive operation in log analysis is reverse DNS lookups (resolving IPs to hostnames). If your site gets heavy traffic, this will choke your network stack and delay report generation.
The Fix: Disable DNS lookups for the parsing phase. You can rely on the GeoIPfree plugin, which maps IPs to countries from a local database, for country data instead.
DNSLookup=0
LoadPlugin="geoipfree"
Ensure you have the Perl Geo::IPfree module installed. This keeps the processing local and CPU-bound rather than network-bound.
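If you manage more than a handful of sites, editing each config by hand gets old. Here is a rough sketch of applying both settings from a script and checking for the Perl module first. The function names are hypothetical helpers, and sed -i assumes GNU sed, which is what CentOS ships.

```shell
# have_perl_module NAME: succeed if Perl can load the named module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# disable_dns_lookup CONF: force DNSLookup=0 in CONF and make sure the
# geoipfree plugin line is present (appended once if missing).
disable_dns_lookup() {
    conf=$1
    if grep -q '^DNSLookup=' "$conf"; then
        sed -i 's/^DNSLookup=.*/DNSLookup=0/' "$conf"
    else
        echo 'DNSLookup=0' >> "$conf"
    fi
    grep -q '^LoadPlugin="geoipfree"' "$conf" ||
        echo 'LoadPlugin="geoipfree"' >> "$conf"
}

# Usage:
#   have_perl_module Geo::IPfree || echo "install Geo::IPfree first" >&2
#   disable_dns_lookup /etc/awstats/awstats.yourdomain.conf
```

Running it twice is harmless: the second pass finds both lines already in place and changes nothing.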
Privacy and The "Datatilsynet" Factor
Hosting in Norway means adhering to strict privacy standards. Under the Personopplysningsloven, IP addresses can be considered personal data. The Norwegian Data Protection Authority (Datatilsynet) takes a dim view of hoarding user data indefinitely without purpose.
When configuring log rotation for Apache or Nginx, ensure you aren't keeping raw logs longer than necessary. In your AWStats config, consider who has access to the output. Do not leave the /awstats/ directory open to the public web. Secure it with an .htaccess file immediately:
AuthType Basic
AuthName "Internal Access Only"
AuthUserFile /usr/local/apache/passwd/passwords
Require user admin
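The rules above only work once the password file exists and Apache honours them, which means AllowOverride AuthConfig must be enabled for that directory. A minimal sketch of putting both pieces in place follows; the function name protect_dir and the /var/www/awstats path are assumptions for illustration.

```shell
# protect_dir DIR PASSFILE USER
# Write a Basic-auth .htaccess into DIR that admits only USER.
# The helper name and the example paths below are hypothetical.
protect_dir() {
    dir=$1
    passfile=$2
    user=$3
    cat > "$dir/.htaccess" <<EOF
AuthType Basic
AuthName "Internal Access Only"
AuthUserFile $passfile
Require user $user
EOF
    chmod 644 "$dir/.htaccess"
}

# Usage (as root; htpasswd prompts for the password, -c creates the file):
#   htpasswd -c /usr/local/apache/passwd/passwords admin
#   protect_dir /var/www/awstats /usr/local/apache/passwd/passwords admin
```

Keep the password file outside the document root, as the path above does, so it can never be fetched over HTTP.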
The Hardware Reality: OpenVZ vs. Xen
This is where your choice of hosting provider impacts your sleep schedule. Log parsing is I/O intensive. It reads massive files and writes thousands of small stats files.
On budget OpenVZ containers, you share the disk I/O queue with hundreds of other "noisy neighbors." If another user decides to compile a kernel or run a backup while your AWStats is running, your process hangs. This is why "guaranteed RAM" isn't enough; you need guaranteed disk throughput.
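A quick way to see whether a guest is being starved is to look at the iowait and steal counters in /proc/stat. The sketch below is a rough indicator only: the function name is made up, the figures are cumulative since boot (sample twice and compare for a live reading), and what a container actually exposes in /proc/stat depends on the virtualization layer.

```shell
# cpu_wait_shares: read /proc/stat-style input on stdin and print the share
# of CPU time spent in iowait and steal, taken from the aggregate "cpu" line.
# Field layout: cpu user nice system idle iowait irq softirq steal
cpu_wait_shares() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9; i++) total += $i   # user through steal
        if (total > 0)
            printf "iowait=%.1f%% steal=%.1f%%\n",
                   100 * $6 / total, 100 * $9 / total
    }'
}

# Usage:
#   cpu_wait_shares < /proc/stat
```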
Pro Tip: Check your disk latency during a log parse. Run `iostat -x 1`. If `%util` hits 100% and `await` exceeds 20ms, your storage subsystem is the bottleneck.
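Watching iostat scroll by at 03:00 is nobody's idea of fun, so here is a small filter that flags only the devices crossing those thresholds. It is a sketch that assumes a sysstat version printing a single await column, as CentOS 5 does; newer releases split it into r_await and w_await, in which case this prints nothing. The function name and the default util threshold of 99 are my own choices.

```shell
# flag_slow_disks [MAX_AWAIT_MS] [MIN_UTIL_PCT]
# Read `iostat -x` output on stdin and print devices whose await exceeds
# MAX_AWAIT_MS (default 20) while %util is at least MIN_UTIL_PCT (default 99).
# Column positions are taken from the "Device" header line, not hard-coded.
flag_slow_disks() {
    awk -v max_await="${1:-20}" -v min_util="${2:-99}" '
        /^Device/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "await") a = i
                if ($i == "%util") u = i
            }
            next
        }
        a && u && NF >= u && $a + 0 > max_await && $u + 0 >= min_util {
            printf "%s await=%sms util=%s%%\n", $1, $a, $u
        }
    '
}

# Usage: sample once per second for 30 seconds during the nightly parse
#   iostat -x 1 30 | flag_slow_disks
```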
This is why we built CoolVDS on Xen virtualization with hardware RAID-10 arrays. Xen ensures tight isolation. When you need to parse a 2GB log file, you get the dedicated spindle speed you paid for, not the leftovers from the teenager hosting a game server next door.
Automating the Update
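Don't rely on the CGI script to update stats on demand; it is slow and times out on large logs. Run the update via cron, before business hours and coordinated with log rotation: either shortly before the rotation, or after it with LogFile pointing at the rotated file, so no entries are skipped.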
# /etc/cron.d/awstats
10 03 * * * root /usr/bin/awstats_updateall.pl now -awstatsprog=/usr/bin/awstats -q
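This command runs an update for every config file found in the configuration directory (/etc/awstats by default), one after another. By running it at 03:10, you avoid the midnight cron rush that cripples most shared hosting environments. Paths differ between packages, so confirm them with `rpm -ql awstats` before installing the cron entry.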
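Even at 03:10 the update competes with whatever else the box is doing. One option is to run it at the lowest CPU and I/O priority. The wrapper below is a sketch: run_gently is a hypothetical name, it falls back to plain nice when ionice is missing or not permitted, and the idle I/O class only has an effect under the CFQ scheduler.

```shell
# run_gently COMMAND [ARGS...]
# Run COMMAND at the lowest CPU priority, and in the idle I/O class when
# ionice is available and allowed; otherwise fall back to nice alone.
run_gently() {
    if ionice -c3 true >/dev/null 2>&1; then
        ionice -c3 nice -n 19 "$@"
    else
        nice -n 19 "$@"
    fi
}

# Usage, from a wrapper script that cron calls instead of the raw command:
#   run_gently /usr/bin/awstats_updateall.pl now -awstatsprog=/usr/bin/awstats -q
```

The trade-off is a longer run time on a busy disk, which is usually fine for a job nobody is waiting on.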
Final Thoughts
Data is useless if you can't process it. AWStats gives you the visibility you need to optimize your LAMP stack, but it requires a foundation that can handle the heavy lifting. Don't let disk I/O wait times kill your productivity.
If you need a server that crunches logs as fast as it serves pages, check out our CoolVDS Business Plans. We utilize 15k RPM SAS drives in RAID-10 to ensure your I/O remains consistent, even during peak load.