
Beyond Google Analytics: Mastering Server Log Analysis with AWStats on Linux

The Truth is in the Access Logs

Most webmasters deploy a JavaScript snippet and call it a day. They rely on Google Analytics to tell them who is visiting their site. But as any battle-hardened sysadmin knows, client-side scripts lie. They don't track the bots scraping your content. They don't record the 404 errors bleeding your SEO ranking. And they certainly don't tell you who is hotlinking your images and stealing your bandwidth.

To see what is actually happening on your metal, you need to parse the raw Apache access_log. In 2010, the gold standard for this is still AWStats. It’s Perl-based, it’s comprehensive, and unlike Webalizer, it actually produces reports you can show to a client without being embarrassed.

However, log analysis is heavy on I/O. If you parse a 2GB log file on a cheap, oversold VPS, you will drive up I/O wait (CPU time spent idle while waiting on the disk) and degrade your web server's performance. This is why architecture matters.

Step 1: The Setup on CentOS 5

We assume you are running a standard LAMP stack on CentOS 5 or RHEL 5. First, enable the EPEL repository if you haven't already, as AWStats isn't in the base repo.

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm
yum install awstats

Once installed, avoid the default configuration wizard. It rarely gets the paths right for virtual hosts. Instead, copy the model config:

cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.com.conf

Step 2: Configuration & The LogFormat Trap

Open your new config file. The most critical setting is LogFormat. If this doesn't match your Apache configuration exactly, AWStats will parse nothing and return zero results.

In your httpd.conf, ensure you are using the 'combined' format:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

Then, in your AWStats config:

LogFormat=1
LogFile="/var/log/httpd/domains/yourdomain.com-access_log"
SiteDomain="yourdomain.com"

Pro Tip: Set DNSLookup=0. Doing reverse DNS lookups on every IP address during log processing will kill your processing time and likely get your IP rate-limited by upstream DNS providers. Keep it off unless you absolutely need hostname resolution.
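
Because a LogFormat mismatch fails silently, it can be worth sampling the log before the first AWStats run. The following is a minimal sketch, not part of AWStats, assuming Python 3 is available (the stock CentOS 5 interpreter is older). The names COMBINED_RE, parse_combined and match_ratio are made up for illustration, and the regex is only an approximation of the 'combined' format shown above.

```python
import re

# Approximate regex for Apache's "combined" LogFormat:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" '
    r'"(?P<agent>(?:[^"\\]|\\.)*)"\s*$'
)


def parse_combined(line):
    """Return a dict of fields, or None if the line is not in combined format."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None


def match_ratio(lines):
    """Return the fraction of non-empty lines that parse as combined format."""
    checked = matched = 0
    for line in lines:
        if not line.strip():
            continue
        checked += 1
        if parse_combined(line) is not None:
            matched += 1
    return matched / checked if checked else 0.0
```

Feeding the first few hundred lines of the access log to match_ratio gives a rough signal: a value near 1.0 suggests LogFormat=1 is the right choice, while a value near 0.0 suggests the vhost is still logging in 'common' or a custom format.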

The Hidden Cost: I/O Wait and Virtualization

Here is where the hardware reality hits. When AWStats runs—usually via a nightly cron job—it reads massive text files. On a shared hosting environment or a budget VPS using OpenVZ with oversold disks, this operation can cause "noisy neighbor" issues. Your disk I/O saturates, and your MySQL database (which also needs disk I/O) locks up.
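
To check whether a log run really pushes the box into I/O wait, one option is to sample the aggregate "cpu" line of /proc/stat before and after the job. This is a rough sketch assuming Python 3 and a Linux kernel that exposes the usual user, nice, system, idle, iowait, irq, softirq, steal ordering. The helper names are hypothetical.

```python
def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat, which holds the aggregate CPU counters."""
    with open(path) as handle:
        return handle.readline()


def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line into a list of integer tick counts."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_percent(before_line, after_line):
    """Return the percentage of CPU time spent in iowait between two samples."""
    before = cpu_times(before_line)
    after = cpu_times(after_line)
    # Use only the first eight counters; newer kernels append guest fields
    # that are already included in 'user' and would be double counted.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0 or len(deltas) < 5:
        return 0.0
    # Field order: user nice system idle iowait irq softirq steal
    return 100.0 * deltas[4] / total

# Usage idea: call read_cpu_line() before and after the AWStats run,
# then pass both lines to iowait_percent().
```

A figure that stays high for the whole run is a hint that the disk, not the CPU, is the bottleneck.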

We designed CoolVDS to solve this specific bottleneck. Unlike budget providers squeezing hundreds of containers onto a single SATA drive, we utilize high-performance 15k RPM SAS RAID-10 arrays with dedicated slices. When you crunch 5GB of logs on a CoolVDS instance, you aren't fighting 50 other users for the disk heads. The operation finishes in seconds, not minutes.

Securing the Data (Norwegian Context)

Operating in Norway requires strict adherence to the Personopplysningsloven (Personal Data Act). IP addresses are considered personally identifiable information (PII) by Datatilsynet. If you are generating public stats pages, you are exposing user data.

Always restrict access to the AWStats directory. Do not leave it open to the world. Use an .htaccess file:

AuthType Basic
AuthName "Restricted Access"
AuthUserFile /usr/local/apache/passwd/passwords
Require valid-user

Furthermore, consider setting AllowAccessFromWebToAuthenticatedUsersOnly=1 in the AWStats config so that the CGI refuses any request the web server has not authenticated, which keeps the reports from being scraped.

Automation

Don't run updates manually. Add this to your crontab to update every 6 hours:

0 */6 * * * /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update > /dev/null
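
If the scheduled update still competes with live traffic, one possible mitigation is to launch it at reduced priority. The sketch below is an assumption-laden illustration, not an AWStats feature: it assumes Python 3, that the nice and ionice utilities are installed, and that the disk uses an I/O scheduler (such as CFQ) that honours the idle class. The function names are hypothetical. The awstats.pl path is the one from the crontab line above.

```python
import subprocess

AWSTATS = "/usr/share/awstats/wwwroot/cgi-bin/awstats.pl"


def build_update_command(config, low_priority=True):
    """Return the argv list for an AWStats update, optionally de-prioritised."""
    command = [AWSTATS, "-config=" + config, "-update"]
    if low_priority:
        # nice lowers CPU priority; ionice -c3 requests the idle I/O class.
        command = ["nice", "-n", "19", "ionice", "-c3"] + command
    return command


def run_update(config):
    """Run the update, discard its stdout, and return the exit status."""
    return subprocess.call(build_update_command(config), stdout=subprocess.DEVNULL)
```

Cron would then call a small script that invokes run_update("yourdomain.com") instead of calling awstats.pl directly. The trade-off is a slower report in exchange for less pressure on Apache and MySQL.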

Why Accuracy Costs Power

Running log analysis locally gives you data that Google Analytics misses: bandwidth theft. By checking the "File Type" report, you can see whether bandwidth for your .jpg or .avi files is skyrocketing, a sign that someone may be hotlinking your media.
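
The File Type report shows that media bandwidth is up, but not who is responsible. As a rough complement, the sketch below totals media bytes per external referring host. It assumes Python 3 and that the request path, referer and byte count have already been extracted from each log line. The function name, the extension list and the tuple layout are illustrative choices, not anything AWStats provides.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Extensions treated as hotlinkable media in this sketch.
MEDIA_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".avi")


def hotlink_bytes(records, own_domain):
    """Sum bytes served for media files, grouped by external referring host.

    `records` is an iterable of (request_path, referer, bytes_sent) tuples.
    """
    totals = defaultdict(int)
    for path, referer, size in records:
        # Ignore any query string when checking the file extension.
        if not path.lower().split("?", 1)[0].endswith(MEDIA_EXTENSIONS):
            continue
        host = (urlparse(referer).hostname or "").lower()
        # Empty or "-" referers are direct requests, not provable hotlinks.
        if not host:
            continue
        # Requests referred by our own site (or its subdomains) are legitimate.
        if host == own_domain or host.endswith("." + own_domain):
            continue
        totals[host] += size
    return dict(totals)
```

Sorting the result by value puts the heaviest external referrers at the top, which is usually enough to decide whether a referer-based block is worth adding.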

But accuracy requires raw power. You need a server that handles the parse load without interrupting the Apache processes serving live visitors. If your current host chokes every time you run a log audit, it's time to upgrade infrastructure.

Stop guessing about your traffic. Get a CoolVDS instance with dedicated RAM and high-speed SAS storage today, and start treating your server logs with the respect they deserve.