Stop Guessing: Advanced Log Analysis with AWStats on CentOS 6
If you are relying solely on JavaScript-based analytics to monitor your infrastructure, you are flying blind. There, I said it. JavaScript tags fail when users disable scripts, when mobile browsers time out, and when the request comes from a bot that never executes client-side code.
I learned this the hard way last month. A high-traffic e-commerce client in Oslo was experiencing random load spikes every day at 03:00 CET. Their analytics dashboard showed zero activity. The server load, however, was pegging the CPU at 100%. We were looking at a ghost.
The truth was buried in /var/log/httpd/access_log. A scraper was hitting their search controller 50 times per second. We only found it by parsing the raw server logs. This is why you need AWStats. It parses the actual server logs, telling you exactly what hit your network interface, not just what a browser decided to report.
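The hunt came down to one pipeline: count requests per client IP and see who dominates. A minimal sketch, assuming the default CentOS log path (field 1 of the Combined Log Format is the remote host):

```shell
# Count hits per client IP and print the top five offenders.
# Adjust the path if your access_log lives elsewhere.
awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' \
  /var/log/httpd/access_log | sort -rn | head -5
```

One glance at the output and the scraper's IP was sitting at the top with a request count an order of magnitude above everyone else.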
Why AWStats Still Rules in 2012
There are newer tools popping up, and yes, some people are trying to pipe everything into complex Java stacks, but for a lean Linux administrator, AWStats remains the gold standard for detailed server-side metrics. It is written in Perl, it is stable, and it understands the Combined Log Format natively.
However, log analysis is I/O heavy. Parsing a 4GB log file requires massive read throughput. This is where your infrastructure choice matters. Most budget VPS providers in Europe are still spinning 7.2k RPM SATA drives. If you try to parse a week's worth of traffic on those, your iowait will skyrocket, and your web server will choke.
Pro Tip: Always run log analysis on a separate partition or a dedicated IO thread if possible. If you are hosted on CoolVDS, our SSD-backed storage arrays (rare in this market) handle high IOPS significantly better than standard SATA RAID configurations, meaning your nightly stats generation won't kill your site's performance.
Step 1: Installation on CentOS 6
We assume you are running a standard LAMP stack (Linux, Apache, MySQL, PHP) on CentOS 6.3 or similar. First, we need to enable the EPEL repository, as AWStats isn't in the core repos.
rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
yum install awstats
This installs the Perl scripts and the necessary libraries. If you are on Debian 6 (Squeeze), a simple apt-get install awstats will suffice.
Step 2: Configuring Apache
Before AWStats can read anything, we must ensure Apache is logging the right data. Open your httpd.conf (usually in /etc/httpd/conf/) and verify you are using the 'combined' format.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog logs/access_log combined
If you don't log the User-Agent, you won't be able to distinguish between a legitimate visitor from Drammen and a spambot from a hijacked server network.
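A quick sanity check that the format is actually active, assuming the default log path: in combined format the User-Agent is the third quoted field, so splitting on the double quote puts it in awk's field 6.

```shell
# Print the User-Agent of the last five requests. If this prints only
# "-" or empty lines, the 'combined' format is not in effect and the
# AWStats browser/OS reports will be empty.
tail -5 /var/log/httpd/access_log | awk -F'"' '{ print $6 }'
```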
Step 3: Configuring AWStats
Copy the model config file to a new file named after your domain.
cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.com.conf
vi /etc/awstats/awstats.yourdomain.com.conf
You need to change these key parameters:
LogFile="/var/log/httpd/access_log"   (point this to your actual log location)
SiteDomain="yourdomain.com"
HostAliases="www.yourdomain.com localhost 127.0.0.1"
AllowToUpdateStatsFromBrowser=0   (security first: only update via CLI/cron)
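Before scheduling anything, run one manual update to confirm the config parses and the log is readable. The path below matches the EPEL package layout; the stats database typically lands under /var/lib/awstats on that package.

```shell
# First manual run: parses the log and builds the stats database.
# Adjust the awstats.pl path if your distro installs it elsewhere.
/usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update
```

If it reports dropped or corrupted records, fix the LogFormat mismatch now rather than discovering it after a month of cron runs.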
Step 4: The Privacy Issue (Datatilsynet)
Operating in Norway means respecting privacy. The Personal Data Act is strict. Storing full IP addresses of users for analytics can be a grey area. It is good practice to mask the IP addresses in your stats if you don't strictly need them for forensics.
In your config file, the SkipFiles directive lets you exclude sensitive URLs from the report entirely, and a plugin like geoipfree aggregates visits by country without storing user-specific patterns unnecessarily. Being compliant now saves headaches later.
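To my knowledge this AWStats version has no built-in IP-masking switch, but it does accept a command pipe as its LogFile, so you can feed it pre-anonymized lines. A sketch, assuming a hypothetical wrapper script /usr/local/bin/anonlog.sh referenced in the config as LogFile="/usr/local/bin/anonlog.sh |":

```shell
#!/bin/sh
# anonlog.sh (hypothetical name): zero the last octet of the client IP
# before AWStats ever sees the line, so no full address is stored.
awk '{ sub(/\.[0-9]+$/, ".0", $1); print }' /var/log/httpd/access_log
```

You keep per-network and per-country trends for capacity planning while dropping the part of the address that identifies an individual subscriber.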
Step 5: Automation and Security
Do not let the whole world see your traffic data. Competitors would love to know your conversion rates. Protect the AWStats directory using .htaccess.
# /var/www/awstats/.htaccess
AuthName "Server Stats"
AuthType Basic
AuthUserFile /etc/awstats/.htpasswd
Require valid-user
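The .htaccess above references a password file that does not exist yet; create it with Apache's htpasswd tool ("statsadmin" is just an example username):

```shell
# -c creates the file; use it on the first run only, or you will
# overwrite any existing accounts in /etc/awstats/.htpasswd.
htpasswd -c /etc/awstats/.htpasswd statsadmin
```

Note that the .htaccess only takes effect if the enclosing Directory block in httpd.conf allows it (AllowOverride AuthConfig or higher).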
Finally, set up the cron job to update stats every hour. Edit /etc/cron.d/awstats:
0 * * * * root /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update > /dev/null
Performance: The Hidden Cost of Logs
Here is the reality check. When AWStats parses a log file, it reads line-by-line using Perl in a single pass. On a high-traffic site generating 2GB of logs daily, that pass is both CPU- and I/O-intensive.
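One mitigation that helps on any hardware: run the parse at idle priority so Apache and MySQL win every scheduling fight. A sketch of the same update command wrapped accordingly:

```shell
# nice -n 19: lowest CPU priority for the Perl parser.
# ionice -c3: idle I/O class (honored by the CFQ scheduler, the
# CentOS 6 default), so the parse only gets disk time when idle.
nice -n 19 ionice -c3 \
  /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update
```

The run takes longer, but your visitors never notice it happened.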
| Environment | Log Size | Processing Time | Impact |
|---|---|---|---|
| Budget Shared Hosting | 2 GB | 14 mins | Site slows down due to I/O wait |
| Standard HDD VPS | 2 GB | 5 mins | Noticeable lag |
| CoolVDS SSD KVM | 2 GB | 45 seconds | Zero visible impact |
If you are serious about data, you cannot run this on legacy hardware. Latency matters. Not just network latency to the NIX (Norwegian Internet Exchange), but disk latency. CoolVDS utilizes pure SSD storage and KVM virtualization, ensuring that when your cron job kicks in, your database doesn't starve for IOPS.
Final Thoughts
Logs are your source of truth. They tell you when you are being attacked, when your 404 errors are spiking, and where your bandwidth is actually going. Don't leave them rotting in /var/log. Install AWStats, secure it, and host it on a platform that can handle the read-load without sweating.
Need a sandbox to test your configuration? Deploy a CoolVDS SSD instance in Oslo today and stop waiting for your hard drive to catch up.