The Truth About Your Traffic: Implementing AWStats on CentOS 6
Stop trusting JavaScript. That might sound paranoid, but if you are solely relying on client-side trackers like Google Analytics, you are effectively flying blind. You see the marketing data, sure. But you don't see the hotlinkers draining your bandwidth, the aggressive crawlers from China slamming your backend, or the 404 errors bleeding your SEO value.
I've seen production servers in Oslo grind to a halt not because of legitimate user load, but because a misconfigured bot decided to index a calendar script with infinite parameters. JavaScript tags didn't catch that. The raw access logs did.
To survive as a sysadmin, you need server-side analytics. AWStats remains the undisputed king of log parsing in 2012. It’s ugly, it’s written in Perl, and it works.
Why AWStats Still Matters in 2012
Client-side tracking is easily blocked by privacy plugins (NoScript is gaining traction) and corporate firewalls. Server logs record every request that reaches the server, whether or not the client runs JavaScript. However, raw logs are unreadable streams of text. AWStats parses these into visual reports covering:
- Bandwidth Usage: Critical if you are on a metered connection.
- HTTP Status Codes: Spotting 500 errors before your boss does.
- Robots/Spiders: Telling Googlebot apart from a DDoS script.
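To get a feel for what AWStats will summarise, you can pull a crude status-code breakdown straight from the raw log. This is a minimal sketch, not part of AWStats: it assumes the Apache common or combined log format, where the status code is the 9th whitespace-separated field, and `count_status_codes` is a helper name made up for this example.

```shell
#!/bin/bash
# Tally HTTP status codes from an access log, most frequent first.
# Assumption: Apache common/combined format, status code in field 9.
count_status_codes() {
    awk '{ counts[$9]++ } END { for (code in counts) print counts[code], code }' "$1" | sort -rn
}

# Example: count_status_codes /var/log/httpd/access_log
```

If the top of that list is full of 404s or 500s, you already know where to look once the full reports are available.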
Pro Tip: In Norway, the Datatilsynet (Data Protection Authority) has strict views on IP storage. When configuring AWStats, consider whether you need full IP logging or whether masking the last octet is sufficient for your internal compliance policies. IP addresses in server logs can count as personal data.
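One way to mask the last octet is to filter the log before AWStats ever reads it. The sketch below is a pre-processing step outside AWStats, not an AWStats option. It assumes the client address is the first field of each line (as in the Apache common/combined format) and only handles IPv4; `mask_last_octet` is a hypothetical helper name. Whether this is enough for your compliance policy is your call, not the script's.

```shell
#!/bin/bash
# Replace the last octet of the leading client IPv4 address with 0.
# Reads log lines on stdin, writes masked lines to stdout.
# Assumption: the address is the first field; IPv6 lines pass through unchanged.
mask_last_octet() {
    sed 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\} /\1.0 /'
}

# Example: mask_last_octet < /var/log/httpd/access_log > /tmp/access_log.masked
```

Keep in mind that masked addresses make the unique-visitor counts coarser, since every host in the same /24 collapses into one.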
Step 1: Installation on CentOS 6
First, ensure you have the EPEL repository enabled. AWStats isn't in the base CentOS repo.
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-5.noarch.rpm
yum install awstats
If you are on Debian Squeeze, it's a simple apt-get install awstats.
Step 2: Configuration Surgery
The default configuration is a template. We need to create a specific config file for your domain. Copy the model file:
cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.com.conf
vi /etc/awstats/awstats.yourdomain.com.conf
You need to change these key directives. Don't gloss over the LogFile path; this is where most implementations fail.
# Path to your web server log file
LogFile="/var/log/httpd/access_log"
# The domain name of the site
SiteDomain="yourdomain.com"
# DNS Lookup is slow. Turn it off to save CPU cycles during updates.
DNSLookup=0
# Directory where to store the database files
DirData="/var/lib/awstats"
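Since a wrong LogFile path is the classic failure, it is worth checking it before the first update. A minimal sketch, assuming LogFile is a plain quoted path as above (AWStats also accepts piped commands there, which this check does not handle); `check_logfile` is a name invented for this example.

```shell
#!/bin/bash
# Print the LogFile value from an AWStats config file.
# Returns 2 if no LogFile directive is found, 1 if the file is not readable.
# Assumption: LogFile is a plain quoted path, not a piped command.
check_logfile() {
    local conf="$1"
    local logfile
    logfile=$(sed -n 's/^LogFile="\(.*\)"[[:space:]]*$/\1/p' "$conf" | tail -n 1)
    if [ -z "$logfile" ]; then
        echo "no LogFile directive found in $conf" >&2
        return 2
    fi
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi
    echo "$logfile"
}

# Example: check_logfile /etc/awstats/awstats.yourdomain.com.conf
```

Run it as the same user that will run the updates; a path that root can read is not necessarily readable by anyone else.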
Step 3: The Apache Integration
AWStats needs to be accessible via the web. We need to configure Apache to serve the Perl scripts. Open your httpd.conf or create a new awstats.conf in /etc/httpd/conf.d/.
Alias /awstatsclasses "/usr/share/awstats/wwwroot/classes/"
Alias /awstatscss "/usr/share/awstats/wwwroot/css/"
Alias /awstatsicons "/usr/share/awstats/wwwroot/icon/"
ScriptAlias /awstats/ "/usr/share/awstats/wwwroot/cgi-bin/"
<Directory "/usr/share/awstats/wwwroot">
Options None
AllowOverride None
Order allow,deny
Allow from all
# Secure this directory!
AuthType Basic
AuthName "AWStats Access"
AuthUserFile /etc/awstats/htpasswd.users
Require valid-user
</Directory>
Never leave your statistics public. Competitors can use them to estimate your revenue. Generate the password file:
htpasswd -c /etc/awstats/htpasswd.users admin
Step 4: Nginx Specifics (The Rising Star)
If you have migrated to Nginx (which you should for static content), AWStats cannot run directly as a CGI script because Nginx has no built-in CGI support; it does not fork a process per request the way Apache does. You have two options: run the script through fcgiwrap, or generate static reports.
I prefer the static generation method for Nginx. It's lighter on resources. You run the update script via CLI, build a static HTML page, and let Nginx serve that HTML. No CGI overhead.
/usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -output -staticlinks > /var/www/html/awstats.html
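Redirecting straight into the live file means visitors can hit a half-written page while the report is being built. A minimal sketch of the same two steps (update, then static output) that writes to a temporary file and renames it into place; `build_static_report` and the overridable `AWSTATS` variable are conventions of this example, not of AWStats.

```shell
#!/bin/bash
# Update the AWStats database, then publish a static report atomically:
# write to a temp file next to the target, then rename it into place.
# AWSTATS can be overridden; the default is the path used in this article.
AWSTATS="${AWSTATS:-/usr/share/awstats/wwwroot/cgi-bin/awstats.pl}"

build_static_report() {
    local config="$1"
    local target="$2"
    local tmp
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if "$AWSTATS" -config="$config" -update > /dev/null &&
       "$AWSTATS" -config="$config" -output -staticlinks > "$tmp"; then
        chmod 644 "$tmp"    # mktemp creates the file as 0600; the web server must read it
        mv "$tmp" "$target"
    else
        rm -f "$tmp"        # leave the previous report untouched on failure
        return 1
    fi
}

# Example: build_static_report yourdomain.com /var/www/html/awstats.html
```

Because the rename happens within one directory, Nginx serves either the old report or the new one, never a partial file.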
The Hidden Cost: Disk I/O
Here is the part nobody tells you. Parsing a 5GB log file requires massive read operations. If you are running this on a cheap VPS with shared mechanical hard drives, your iowait will skyrocket during the update process. This starves your database of disk time and makes your site sluggish for users.
I recently audited a Magento setup where the nightly AWStats cron job was causing a 15-minute downtime every night at 04:00. The disk simply couldn't read the logs and serve MySQL queries simultaneously.
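If you are stuck on slow disks, one common mitigation is to run the update at the lowest CPU and I/O priority so MySQL gets served first. A sketch under assumptions: it relies on `ionice` from util-linux being installed, and the idle I/O class only has an effect when the I/O scheduler honours it (CFQ does; inside a VPS the host's scheduler may not care). `run_low_priority` is a hypothetical wrapper name.

```shell
#!/bin/bash
# Run a command at the lowest CPU priority and, where possible, in the
# idle I/O class. Falls back to plain nice if ionice is missing or refused.
run_low_priority() {
    if command -v ionice > /dev/null 2>&1 && ionice -c3 true 2> /dev/null; then
        ionice -c3 nice -n 19 "$@"
    else
        nice -n 19 "$@"
    fi
}

# Example:
# run_low_priority /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update
```

This does not make the parse any cheaper, it only makes it yield to other work, so the update itself will take longer on a busy box.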
The CoolVDS Advantage
This is where infrastructure choice dictates performance. At CoolVDS, we don't oversell our storage subsystems. We utilize RAID-10 SAS 15k RPM arrays and emerging Enterprise SSD caching layers. This storage architecture provides the low latency and high throughput required to crunch gigabytes of log data in seconds, not minutes.
If you are serious about data, you need a disk subsystem that doesn't choke on a simple grep.
Automation
Finally, set up a cron job to update your stats every hour. Don't do it more often; it's a waste of resources.
#!/bin/bash
# Save this file as /etc/cron.hourly/awstats
/usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update > /dev/null
Make sure the script is executable: chmod +x /etc/cron.hourly/awstats.
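On a large log and a slow disk, one hourly run can still be going when the next one starts. A minimal sketch of guarding the job with a lock, assuming `flock` from util-linux is installed; the `run_exclusive` name and the lock file path in the example are made up for illustration.

```shell
#!/bin/bash
# Run a command only if nobody else holds the lock.
# If the lock is busy, flock -n returns non-zero and the command is not run.
# Assumption: flock(1) from util-linux is available.
run_exclusive() {
    local lockfile="$1"
    shift
    flock -n "$lockfile" "$@"
}

# Example for the hourly job (lock path is hypothetical):
# run_exclusive /var/lock/awstats.lock \
#     /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update > /dev/null
```

A skipped run is harmless here: the next update simply picks up the log lines the previous one had not reached yet.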
Don't let your server remain a black box. Install AWStats today. If you find your current host struggling to parse the logs, it might be time to test the raw I/O power of a CoolVDS instance. We have racks in Oslo ready to deploy.