Stop Guessing, Start Parsing: High-Performance Log Analysis with AWStats in 2012

If you are relying solely on JavaScript-based analytics to make infrastructure decisions, you are flying blind. Between NoScript extensions, corporate proxies that block third-party tracking scripts, and the "Do Not Track" setting shipping in modern browsers like Firefox 14, you are likely missing 10% to 15% of your actual traffic.

The only source of truth is the server log. It captures every request, every 404, every bot, and every brute-force attempt. But raw logs are noise. To turn that noise into intelligence without bringing your server to its knees, you need AWStats—configured correctly.

In this guide, we are going to bypass the basic "yum install" tutorials. I will show you how to set up AWStats for high-traffic environments, how to handle log rotation without losing data, and why disk I/O is the bottleneck you aren't watching closely enough.

The Architecture of Truth: Why Server-Side?

Unlike Google Analytics, AWStats processes the raw access.log files generated by your web server (Apache or Nginx). This distinction is critical for Norwegian system administrators dealing with the Datatilsynet (Data Inspectorate). When you hand data off to third-party US servers, you enter the murky waters of the US-EU Safe Harbor framework.

By keeping analytics local on your own VPS, you adhere strictly to the Personal Data Act (Personopplysningsloven). You own the data. It stays in Oslo (or wherever your datacenter is), and you control exactly how long IP addresses are retained.

Prerequisites

  • OS: CentOS 6.3 or Debian 6 (Squeeze)
  • Web Server: Apache 2.2+ or Nginx 1.2.x
  • Perl: 5.10+
  • Resources: Minimum 512MB RAM (Perl is memory hungry during parsing)

Step 1: The Installation (The Right Way)

On a standard CentOS 6 box, avoid compiling from source unless you need specific patches. The EPEL repository is stable enough.

# Enable EPEL if you haven't already
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-7.noarch.rpm

# Install AWStats
yum install awstats

For Debian Squeeze users:

apt-get update
apt-get install awstats
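
Before moving on, it helps to confirm where the package dropped the main script, because the paths differ: EPEL typically installs it under /usr/share/awstats/wwwroot/cgi-bin/, Debian under /usr/lib/cgi-bin/. A quick sanity check (adjust the paths if your package layout differs):

# Check the Perl version
perl -v | head -n 2

# Typical script locations -- adjust for your distro
ls -l /usr/share/awstats/wwwroot/cgi-bin/awstats.pl   # CentOS 6 / EPEL
ls -l /usr/lib/cgi-bin/awstats.pl                      # Debian Squeeze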

Step 2: Web Server Configuration

AWStats craves the "combined" log format. If you are running a high-performance Nginx reverse proxy with a stripped-down custom log format, it may be missing the Referer or User-Agent fields, which renders the referrer and browser reports useless.

Nginx Configuration (nginx.conf)

Open /etc/nginx/nginx.conf and ensure your `log_format` looks like this:

http {
    log_format combined_custom '$remote_addr - $remote_user [$time_local] '
                               '"$request" $status $body_bytes_sent '
                               '"$http_referer" "$http_user_agent"';

    access_log /var/log/nginx/access.log combined_custom buffer=32k;
}

Pro Tip: Note the buffer=32k flag. On high-traffic sites, writing to disk for every single request causes massive I/O wait. Buffering log writes reduces disk thrashing, which is essential if you aren't yet on solid-state storage.
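
Whenever you touch the log format, validate the config and reload before trusting the output. On a stock CentOS 6 or Debian 6 install the usual commands are:

# Syntax check, then reload the running workers
nginx -t
service nginx reload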

Apache Configuration (httpd.conf)

For Apache 2.2, ensure the CustomLog directive is active:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/httpd/access_log combined
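
The same rule applies here: test the syntax, reload gracefully, then eyeball the log to confirm the two quoted fields at the end of each line (Referer and User-Agent) are actually populated:

apachectl configtest
service httpd graceful

# The last two quoted fields should be the referrer and the user agent
tail -n 5 /var/log/httpd/access_log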

Step 3: Configuring AWStats for Performance

The default config file is a template. Copy it to a production name:

cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yoursite.com.conf
vi /etc/awstats/awstats.yoursite.com.conf

Change these critical parameters:

# Path to your log file
LogFile="/var/log/httpd/access_log"

# Standard combined format
LogFormat=1

# DNS Lookup is the enemy of speed. 
# Setting this to 1 forces a reverse DNS lookup for every IP.
# It will make the update process 90% slower. Keep it 0 for production.
DNSLookup=0

# Directory where data files are stored
DirData="/var/lib/awstats"
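
With the config saved, run one manual update to confirm AWStats can parse your log, then schedule regular updates. The script path below is the usual EPEL location; Debian puts it in /usr/lib/cgi-bin/, so adjust as needed:

# First manual run -- parsed data lands in DirData (/var/lib/awstats)
perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yoursite.com -update

# Hourly updates via cron, e.g. in /etc/cron.d/awstats
0 * * * * root /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yoursite.com -update >/dev/null 2>&1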

Step 4: The "War Story" – Handling Log Rotation

I recently audited a Magento installation where analytics data kept disappearing. The culprit? logrotate.

When Linux rotates logs (usually at 4:00 AM via cron), it renames access.log to access.log.1 and creates a fresh empty file. If AWStats runs at 5:00 AM, it scans the new, empty file and misses the traffic from 00:00 to 04:00.

The Fix: Force AWStats to update before rotation happens. Edit /etc/logrotate.d/httpd (or nginx) and add a prerotate script.

/var/log/httpd/*log {
    missingok
    notifempty
    sharedscripts
    delaycompress
    # ADD THIS BLOCK: parse the old log before it gets renamed
    prerotate
        /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -update -config=yoursite.com > /dev/null 2>&1
    endscript
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
    endscript
}

This ensures every byte of data is parsed before the file is archived.
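
To verify the hook without waiting for the nightly rotation, run logrotate in debug mode; it prints what it would do (including the prerotate script) without touching the files:

logrotate -d /etc/logrotate.d/httpd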

The I/O Bottleneck: Why Hardware Matters

Here is the reality of parsing logs: it is an I/O punisher. AWStats is a Perl script that reads text files line-by-line. If you have a 2GB log file, your CPU is mostly waiting on the hard drive to deliver data.

On traditional spinning HDD VPS hosting, I have seen the wa (Wait I/O) metric in top spike to 40% during log analysis. This degrades the performance of your MySQL database and web server, causing latency for actual users.
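
You can watch this happen yourself: kick off an AWStats update in one shell and monitor the disks in another (iostat ships with the sysstat package):

# Per-device utilisation and await times, refreshed every 5 seconds
iostat -dx 5

# Or the classic overview -- the 'wa' column is I/O wait
vmstat 5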

This is where infrastructure choice becomes architectural strategy. We built CoolVDS on high-performance storage arrays specifically to eliminate this "I/O steal." While others oversell standard SATA drives, our VDS instances utilize enterprise-grade SSD caching and RAID-10 SAS configurations that handle random read/write operations significantly faster.

If you are running `grep` or AWStats on a CoolVDS instance, you will notice the parse time drops from minutes to seconds. You don't need to schedule analytics for the middle of the night; you can run them mid-day without impacting your site's Time-to-First-Byte (TTFB).

Securing the Output

AWStats generates static HTML or runs as a CGI script. The CGI method is vulnerable if not secured. Do not leave your stats open to the public.

Add basic auth protection in your Apache config:

<Directory "/usr/share/awstats/wwwroot">
    AuthType Basic
    AuthName "Restricted Access"
    AuthUserFile /etc/awstats/htpasswd
    Require valid-user
</Directory>

Create the user:

htpasswd -c /etc/awstats/htpasswd admin
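
If you would rather avoid CGI altogether, AWStats ships a helper that renders the report to static HTML on a schedule. A rough sketch using the EPEL paths (Debian users should adjust both paths), writing into a directory you protect with the same basic auth rules:

perl /usr/share/awstats/tools/awstats_buildstaticpages.pl \
    -config=yoursite.com -update \
    -awstatsprog=/usr/share/awstats/wwwroot/cgi-bin/awstats.pl \
    -dir=/var/www/html/stats

Static pages remove the CGI attack surface entirely; the trade-off is that the report is only as fresh as your cron schedule.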

Conclusion

Log analysis is not just about counting hits; it's about forensic visibility into your infrastructure. By keeping this process internal, you satisfy European privacy directives and gain granular insight that JavaScript trackers miss.

However, effective analysis requires hardware that can keep up with the read/write demand. Don't let your monitoring tools slow down your production environment. If you are tired of watching `iowait` kill your server performance, it is time to upgrade.

Deploy a CoolVDS High-IO instance today and process your logs at the speed of business.