Stop Trusting JavaScript: Deep Server Log Forensics with AWStats on CentOS 6

I recently audited a high-traffic e-commerce cluster serving the Nordic market. The marketing director was panicking because Google Analytics showed a 15% drop in conversions, yet the database showed sales were steady. The culprit? A new ad-blocker update and heavy mobile latency dropping JavaScript tags before they could fire.

If you rely solely on client-side tracking, you are flying blind. Your server logs (access_log and error_log on CentOS) are the only source of truth. They capture the bots, the scrapers, the hot-linkers, and the users with JavaScript disabled.

In this guide, we aren't just installing a stats viewer. We are building a forensic traffic analysis engine using AWStats 7.0 on CentOS 6.3. We will configure it to parse gigabytes of logs without bringing your CPU to its knees, ensuring you stay compliant with the Norwegian Personal Data Act (Personopplysningsloven).

Why AWStats Still Beats the Cloud

In 2012, the trend is moving toward cloud analytics, but keeping data local is critical for latency and legality. When you export user IPs to US-based servers, you navigate the murky waters of the Safe Harbor framework. Hosting in Norway removes that headache.

Feature            | JavaScript Trackers (GA)              | Server-Side (AWStats)
Data Accuracy      | Dependent on client browser execution | 100% of server requests
Bot Detection      | Poor (often ignored)                  | Excellent (User-Agent analysis)
Bandwidth Tracking | Impossible                            | Precise byte-level tracking
Data Ownership     | Third-party owned                     | You own 100% of the data

Step 1: The Environment & Prerequisites

Parsing text files is I/O heavy. If you run this on a legacy VPS with noisy neighbors and spinning hard drives, your iowait will spike, and your web server will stutter. This is where hardware architecture matters.

Pro Tip: On a standard SATA drive, analyzing a 2GB log file can take minutes. On CoolVDS instances, which utilize enterprise-grade SSD arrays and KVM virtualization, we see parse times drop by nearly 60%. Don't analyze logs on the same disk spindle serving your database.

We assume you are running CentOS 6.x with the EPEL repository enabled.

yum install awstats perl-libwww-perl.noarch

Step 2: Configuring Apache for Forensic Logging

By default, Apache's combined log format is decent, but we can do better. We need to ensure we are capturing the true IP address, especially if you are behind a reverse proxy like Varnish or Nginx (which is becoming standard for high-performance setups).

Edit your /etc/httpd/conf/httpd.conf:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# If you use a load balancer or Varnish, log X-Forwarded-For instead of %h,
# then reference this format by name (proxy_combined) in your CustomLog directive
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxy_combined
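
To sanity-check what the combined format actually captures, a quick awk pass over the log can total requests and bytes per client. This is a minimal sketch, assuming the field order of the combined format above and a single address in the first field; the function name summarize_log and the sample log lines are made up for illustration.

```shell
# Hypothetical helper: per-client request and byte totals from a
# combined-format access log read on stdin.
summarize_log() {
    awk '
        {
            ip = $1                            # %h, or X-Forwarded-For in proxy_combined
            req[ip]++
            if ($10 != "-") bytes[ip] += $10   # %b logs "-" when no body was sent
        }
        END {
            for (ip in req) printf "%s %d %d\n", ip, req[ip], bytes[ip]
        }
    ' | sort
}

# Made-up sample lines in combined format
sample='203.0.113.7 - - [10/Oct/2012:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
203.0.113.7 - - [10/Oct/2012:13:55:37 +0200] "GET /logo.png HTTP/1.1" 304 - "http://example.com/" "Mozilla/5.0"
198.51.100.9 - - [10/Oct/2012:13:55:40 +0200] "GET /feed HTTP/1.1" 200 512 "-" "Googlebot/2.1"'

# Prints one line per client: address, request count, total bytes
printf '%s\n' "$sample" | summarize_log
```

Note that X-Forwarded-For can carry several comma-separated addresses when a request passes through more than one proxy; this sketch does not handle that case, so the field positions would shift.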

Step 3: Tuning the AWStats Configuration

Copy the model config to a domain-specific file:

cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.com.conf

Now, open awstats.yourdomain.com.conf. We need to change a few critical parameters to handle high loads and ensure we filter out internal monitoring traffic (like Nagios or Zabbix).

# Point to your actual log file
LogFile="/var/log/httpd/access_log"

# 1 = Apache combined log format
LogFormat=1

# DNS Lookup. TURN THIS OFF for performance. 
# Doing reverse DNS lookups on every IP will destroy your parse speed.
DNSLookup=0

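# Filter out your own IP and monitoring tools.
# Entries are space-separated and plain values are matched exactly;
# wrap patterns in REGEX[...]
SkipHosts="127.0.0.1 192.168.1.50"
SkipUserAgents="REGEX[uptimebot|nagios|zabbix]"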

# Storage directory for the compiled stats
DirData="/var/lib/awstats"
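
If you manage several virtual hosts, scripting these edits keeps the per-domain files consistent. The following is a rough sketch only: set_awstats_option is a hypothetical helper, not something AWStats ships, and the starting values in the demo file are invented. It drops any existing uncommented line for the directive and appends the new value.

```shell
# Hypothetical helper: set KEY=VALUE in an AWStats-style config file.
# Existing uncommented KEY= lines are removed, then the new one is appended.
set_awstats_option() {
    conf=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    # grep -v exits non-zero when nothing is left to print; that is fine here
    grep -v "^${key}=" "$conf" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the original file keeps its owner and mode
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Demo against a throwaway file with invented starting values
conf=$(mktemp)
printf '%s\n' 'LogFormat=4' 'DNSLookup=2' > "$conf"

set_awstats_option "$conf" DNSLookup 0
set_awstats_option "$conf" LogFormat 1
set_awstats_option "$conf" DirData '"/var/lib/awstats"'

cat "$conf"
```

Pointing the helper at /etc/awstats/awstats.yourdomain.com.conf instead of the temporary file would apply the same settings to the real config; take a backup first, since the helper rewrites the file.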

Step 4: Secure Execution and Automation

Avoid running the update script as root where you can, though on CentOS the default log permissions often force your hand. Either way, schedule the update from cron during off-peak hours (e.g., 04:00 Oslo time, assuming the server clock is set to local time).

0 4 * * * /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update > /dev/null
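
On a busy box, a large log can take long enough to parse that it is worth guarding against overlapping runs and lowering the job's priority. Here is one possible wrapper, sketched under the assumption that flock (from util-linux) and nice are available; run_exclusive_lowprio and the lock file path are hypothetical names chosen for this example.

```shell
# Hypothetical wrapper (not part of AWStats): run a command at the lowest
# CPU priority, and skip the run if a previous one still holds the lock.
run_exclusive_lowprio() {
    lockfile=$1
    shift
    (
        # fd 9 is opened on the lock file by the redirection below
        if ! flock -n 9; then
            echo "previous run still active, skipping" >&2
            exit 0
        fi
        nice -n 19 "$@"
    ) 9>"$lockfile"
}

# Demo with a harmless command; in cron, the awstats.pl call would go here.
run_exclusive_lowprio "${TMPDIR:-/tmp}/awstats-demo.lock" echo "update would run here"
```

In a real deployment the echo would be replaced by the full awstats.pl command from the cron line above. Prefixing the command with ionice -c3 may also help on schedulers that honor I/O classes, but that depends on your kernel configuration.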

To view the stats, you must secure the directory. Do not leave your stats open to the public; it leaks your traffic data and backend architecture.

In /etc/httpd/conf.d/awstats.conf:

<Directory /usr/share/awstats/wwwroot>
    Options None
    AllowOverride None
    Order allow,deny
    Allow from 127.0.0.1
    # Allow from your office static IP
    Allow from 84.234.xx.xx 
    AuthType Basic
    AuthName "Restricted Access"
    AuthUserFile /etc/awstats/htpasswd
    Require valid-user
</Directory>

The Storage Bottleneck: Why IOPS Matter

When AWStats parses a 5GB log file, it performs massive sequential reads of the log and random writes to its data files. On shared hosting or budget VPS providers using mechanical SAS drives in RAID 5/6, this operation causes I/O wait: your CPU sits idle, waiting for the disk to deliver data.

While your server is waiting on logs, it isn't serving PHP requests. Your site slows down.
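
To put a number on that, you can compare two samples of the aggregate cpu line in /proc/stat and see what share of the elapsed CPU time went to iowait. This is a minimal sketch, assuming the usual Linux column order (user, nice, system, idle, iowait, irq, softirq, steal); iowait_pct is a hypothetical helper and the sample values are invented.

```shell
# Hypothetical helper: percentage of CPU time spent in iowait between
# two samples of the aggregate "cpu" line from /proc/stat.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Columns 2..9 are user, nice, system, idle, iowait, irq, softirq, steal
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            t[NR] = total
            w[NR] = $6          # iowait column
        }
        END {
            dt = t[2] - t[1]
            if (dt > 0) printf "%.1f\n", 100 * (w[2] - w[1]) / dt
            else print "0.0"
        }'
}

# Invented samples: 1000 ticks elapsed, 330 of them in iowait
before="cpu  100 0 50 800 50 0 0 0 0"
after="cpu  150 0 70 1400 380 0 0 0 0"
iowait_pct "$before" "$after"    # prints 33.0

# Live use would look like:
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   iowait_pct "$a" "$b"
```

Taking the two live samples while an AWStats update is running, and again when the box is idle, gives a rough before-and-after comparison.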

This is why at CoolVDS, we deploy high-performance SSD storage in RAID-10 configurations. SSDs sustain far more IOPS (Input/Output Operations Per Second) than mechanical drives, so log parsing finishes much sooner and without impacting live web server performance.

Data Sovereignty in the Nordics

A final note for my fellow admins in Norway: Datatilsynet (the Data Inspectorate) is becoming increasingly strict about IP address storage. If you log full IP addresses, you are processing PII (Personally Identifiable Information). Configure log rotation to delete raw logs after 30 days, or use AWStats' geoip plugin (enabled through the LoadPlugin directive) cautiously to aggregate data without retaining raw IPs indefinitely.
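
The 30-day retention rule can be sketched as a small cleanup function. This is an illustration only: purge_old_logs is a hypothetical helper, the access_log* name pattern is an assumption based on the log path used earlier, and the demo runs in a throwaway directory. It relies on GNU find and GNU touch, which CentOS provides.

```shell
# Hypothetical helper: delete access logs in DIR last modified more than
# DAYS days ago, printing each file as it is removed.
purge_old_logs() {
    dir=$1
    days=$2
    find "$dir" -maxdepth 1 -type f -name 'access_log*' -mtime +"$days" -print -delete
}

# Demo in a temporary directory with one stale and one fresh file
demo=$(mktemp -d)
touch -d '45 days ago' "$demo/access_log-old"
touch "$demo/access_log"

purge_old_logs "$demo" 30
ls "$demo"
```

In practice logrotate can enforce the same limit through its own settings (for example the maxage directive). Whichever route you take, make sure the AWStats update has already processed a log before it is deleted.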

Keeping your data within Norwegian borders—on a provider like CoolVDS with low latency to the NIX (Norwegian Internet Exchange)—is not just about speed; it's about control.

Ready to analyze traffic without the lag? Don't let slow I/O kill your SEO. Deploy a test instance on CoolVDS in 55 seconds and experience the difference high-performance storage makes.