Stop Flying Blind: Mastering Server Log Analysis with AWStats
There is a specific kind of arrogance in the systems administration world that suggests if you aren't reading raw logs via tail -f in a terminal, you aren't doing your job. I've seen it in data centers from Oslo to Frankfurt. It's nonsense. Watching lines scroll past at 50 per second is not "monitoring"; it's The Matrix, and unless you are Neo, you are missing the patterns.
Google Analytics is fine for marketing teams, but it lies to you. It misses the bots, the scrapers, the hotlinkers, and the 404 errors that are silently killing your SEO. To see what is actually hitting your metal, you need server-side log analysis. In 2012, the battle-tested, undisputed king of this domain is still AWStats.
But here is the catch: log parsing is an I/O killer. If you run this on a cheap, oversold OpenVZ container, your disk wait times will spike, your database will lock up, and your site will crawl. This guide assumes you are running on decent hardware, like the Xen/KVM virtualization stacks we use at CoolVDS, where dedicated resources prevent the "noisy neighbor" effect during heavy parse jobs.
Step 1: The Pre-Flight Check (CentOS 6 / RHEL 6)
We are focusing on CentOS 6.2 (released December 2011) as it is the current standard for enterprise stability. First, ensure your repository lists are updated. AWStats is available in the EPEL (Extra Packages for Enterprise Linux) repository. If you don't have EPEL, fetch it.
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-5.noarch.rpm
yum update
yum install awstats
This installs AWStats 7.0 (or the latest 7.x stable build). It's Perl-based, so it's going to chew through some CPU cycles during updates. This is normal.
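Before touching any configs, a ten-second sanity check that the package actually landed and where its main script lives (the paths below are what the EPEL package ships; verify on your own box):
rpm -q awstats
rpm -ql awstats | grep cgi-bin/awstats.pl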
Step 2: Configuration Architecture
AWStats doesn't magically know where your logs are. You need to create a configuration file for your specific domain. The default configs live in /etc/awstats/.
cd /etc/awstats
cp awstats.model.conf awstats.www.yourdomain.com.conf
vi awstats.www.yourdomain.com.conf
Now, we need to edit the "Meat" of the config. Pay close attention to the LogFile path. If you are running a standard Apache setup, it usually sits in /var/log/httpd/. If you are one of the early adopters moving to Nginx 1.0.x, your paths will differ.
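Not sure where your logs actually land? Ask the server config directly. A quick check, assuming the stock CentOS paths:
# Apache: find the CustomLog directive for your vhosts
grep -ri customlog /etc/httpd/conf /etc/httpd/conf.d
# Nginx: the access_log directive plays the same role
grep -ri access_log /etc/nginx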
Crucial Configuration Flags
Find and modify these lines to match your environment:
# Point this to your actual access log
LogFile="/var/log/httpd/access_log"
# Ensure this matches the type of server (W for Web)
LogType=W
# The log format. 1 is "Combined" (Apache/Nginx standard).
# If you use a custom Nginx log_format, you must map it here manually
# (see the sketch after the DNS tip below).
LogFormat=1
# Your domain
SiteDomain="www.yourdomain.com"
HostAliases="yourdomain.com localhost 127.0.0.1"
# DNS Lookup. WARNING: Setting this to 1 kills performance.
# It forces a reverse DNS lookup for every IP.
# Keep it at 0 unless you have a CoolVDS dedicated instance with low latency to DNS resolvers.
DNSLookup=0
Pro Tip: Never run DNSLookup=1 on a production web server during peak hours. The latency of waiting for DNS timeouts will cause your analysis process to hang for hours. If you need hostname resolution, run the update job offline or on a secondary log server.
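One more note on LogFormat before moving on. If your Nginx uses a custom log_format, AWStats needs a personalized format string instead of LogFormat=1. A minimal sketch, assuming a hypothetical format named "timed" that appends $request_time to the standard combined fields:
# /etc/nginx/nginx.conf -- combined format plus request time ("timed" is a made-up name)
log_format timed '$remote_addr - $remote_user [$time_local] "$request" '
                 '$status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time';
# awstats.www.yourdomain.com.conf -- the same fields in AWStats tags;
# the trailing %other tells the parser to skip the extra column
LogFormat="%host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot %other"
The first nine tags are the documented expansion of LogFormat=1, so in practice you only ever append to them.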
Step 3: Handling Datatilsynet & Privacy (Norwegian Context)
In Norway, the Personopplysningsloven (Personal Data Act) is strict. While IP addresses are technically necessary for server operation, storing them indefinitely in statistical reports can be a gray area depending on how you interpret the Data Inspectorate's (Datatilsynet) guidelines on personally identifiable information.
If your legal department is nervous, use the AWStats plugin system to hash or mask IPs. While total anonymization makes tracking unique visitors harder, compliance is cheaper than a lawsuit.
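AWStats has no single "anonymize" switch, but it happily reads its log from a pipe (the trailing pipe character in LogFile is documented behavior), which means you can scrub IPs before AWStats ever sees them. A minimal sketch, assuming IPv4 and a combined-format log where the IP is the first field; it zeroes the last octet on the fly:
LogFile="perl -pe 's/^(\d+\.\d+\.\d+)\.\d+/$1.0/' /var/log/httpd/access_log |"
Unique-visitor counts will blur, since every /24 collapses to one "host", which is exactly the trade-off the lawyers are asking for.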
Uncomment this line in your config to enable the geoipfree plugin (requires the Geo::IPfree Perl module), which gives you country-level location data without the raw IP being visible in the final HTML report:
LoadPlugin="geoipfree"
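The Geo::IPfree module does not ship with base Perl. One of these should get it on the box; the EPEL package name is my assumption, CPAN is the sure path:
yum install perl-Geo-IPfree
# or, if your repos don't carry it:
cpan -i Geo::IPfree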
Step 4: Automation and The I/O Trap
You cannot run the update manually every day. You need a cron job. However, this is where most sysadmins fail. They schedule the log parsing at midnight, exactly when their backups start running.
Do not do this.
Parsing a 2GB log file is read-intensive. Backups are read-intensive. Doing both simultaneously will thrash your disk I/O, causing high iowait and sluggish website performance.
The Intelligent Cron Schedule
Schedule the update for an off-peak hour, separated from your backup window. Here is a robust crontab entry:
# Update AWStats every 6 hours, offset from the top of the hour to avoid colliding with other scheduled tasks
30 0,6,12,18 * * * /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=www.yourdomain.com -update > /dev/null
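If the box also serves live traffic at those hours, wrap the job in nice and ionice so the parser yields to Apache. Same schedule, lower priority; ionice -c3 (idle class) relies on the CFQ I/O scheduler, which is the CentOS 6 default:
30 0,6,12,18 * * * ionice -c3 nice -n 19 /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=www.yourdomain.com -update > /dev/null 2>&1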
On a CoolVDS instance, we utilize RAID-10 SAS or SSD storage arrays (depending on your plan). This high IOPS throughput means you can crunch a 500MB log file in seconds. On a budget host using a single SATA drive shared among 50 users? That same process takes 20 minutes of server chugging.
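Don't take my word for it; benchmark your own disks before you trust any schedule. Run one update by hand and time it:
time /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=www.yourdomain.com -update
If that wall-clock time creeps toward your cron interval, parse more often (smaller deltas per run) or move the job to a secondary log server.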
Step 5: Securing the Interface
By default, AWStats puts a CGI script in your cgi-bin. If you don't protect it, your competitors can see your traffic data. That is unacceptable intelligence leakage.
Configure Apache to require a password for the statistics directory:
<Directory "/usr/share/awstats/wwwroot">
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthType Basic
AuthName "Server Statistics - Authorized Personnel Only"
AuthUserFile /etc/awstats/htpasswd.users
Require valid-user
</Directory>
Then generate the password file:
htpasswd -c /etc/awstats/htpasswd.users admin_user
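Reload Apache and prove the lock actually works: expect a 401 without credentials and a 200 with them. The /awstats/ URL is my assumption, based on the ScriptAlias the EPEL package drops into /etc/httpd/conf.d/:
service httpd reload
curl -s -o /dev/null -w "%{http_code}\n" "http://localhost/awstats/awstats.pl?config=www.yourdomain.com"
curl -s -o /dev/null -w "%{http_code}\n" -u admin_user:yourpassword "http://localhost/awstats/awstats.pl?config=www.yourdomain.com"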
Why This Matters for Your Infrastructure
The difference between a junior admin and a senior architect is visibility. The junior admin wonders why the server is slow. The senior architect looks at AWStats, sees a spike in traffic from a specific subnet in China hitting xmlrpc.php, and blocks it at the firewall level.
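In iptables terms, that block is one line (203.0.113.0/24 is a documentation subnet standing in for whatever AWStats flagged):
iptables -I INPUT -s 203.0.113.0/24 -j DROP
service iptables save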
| Feature | Google Analytics (JS) | AWStats (Server Logs) |
|---|---|---|
| Data Source | Client-side JavaScript | Server-side Raw Logs |
| Tracks Bots? | No | Yes (Crucial for load analysis) |
| Tracks 404/500 Errors? | No | Yes (Crucial for debugging) |
| Tracks Bandwidth? | No | Yes (Essential for capacity planning) |
Running log analysis requires a hosting environment that respects resource boundaries. At CoolVDS, we don't oversell our CPU cores. When you run a parse job, you get the cycles you paid for, ensuring your Norway-based users experience zero latency spikes.
Don't let your logs sit in the dark. Illuminate them.
Ready to upgrade from shared hosting? Deploy a high-performance KVM VPS with CoolVDS today and get full root access to analyze every byte.