Stop Trusting JavaScript: The Hard Truth About Server-Side Log Analysis with AWStats

If you are relying solely on client-side tracking pixels to measure your server's traffic, you are operating in the dark. I recently audited a high-traffic e-commerce portal based in Oslo where the marketing department claimed they had 50,000 unique visitors a day. The server logs told a different story: 72,000 uniques. Where did the discrepancy come from? Mobile browsers with limited JavaScript support, privacy-conscious users running NoScript, and corporate firewalls stripping tracking beacons.

In the world of Server Administration, the access log is the only source of truth. But raw logs are heavy. Parsing gigabytes of text requires serious I/O throughput and a robust analyzer. That is where AWStats (Advanced Web Statistics) enters the picture. It is old school, written in Perl, and absolutely ruthless in its accuracy.
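As a rough cross-check against what a client-side tracker reports, you can count distinct client addresses straight from the log. This is a minimal sketch, not how AWStats computes its own figures. The function name count_unique_hosts is made up for illustration, and it assumes the client address is the first field on each line, as in Apache's common and combined formats.

```shell
#!/bin/sh
# Count distinct client addresses in an access log.
# Assumption: the first whitespace-separated field is the client host (%h).
count_unique_hosts() {
    awk '{ print $1 }' "$1" | sort -u | wc -l | tr -d ' '
}

# Example (the path is an assumption; adjust it for your distribution):
# count_unique_hosts /var/log/httpd/access_log
```

The number covers the whole file, so run it against a single day's log if you want a figure comparable to a daily report.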

The Privacy Imperative: Why Norway Needs Local Logs

Here in Norway, Datatilsynet (the Data Protection Authority) is becoming increasingly strict about how we handle user data. Under the Personal Data Act (Personopplysningsloven), relying on third-party US-based analytics services creates a gray area regarding data ownership. When you process logs locally on your own VPS, you retain full sovereignty over that data. You aren't shipping IP addresses across the Atlantic; you are parsing them right here on the server.

Prerequisites and The I/O Bottleneck

Before we touch the terminal, a warning: Log analysis is I/O intensive. If you are running a standard VPS on a spinning HDD with noisy neighbors, running a parser over a 5GB access_log causes the load average to spike, creating "iowait" that slows down your database. This is why we architect CoolVDS instances with pure SSD storage and KVM virtualization. We isolate resources so your background tasks don't kill your frontend performance.
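Whatever disk you are on, one way to soften the hit is to run the parser at the lowest CPU and disk priority. The sketch below assumes a Linux system where nice is present and ionice (from util-linux) may or may not be. low_priority is a hypothetical helper name. The idle I/O class is only honoured by some I/O schedulers, so treat this as a mitigation rather than a cure.

```shell
#!/bin/sh
# Run a command with the lowest CPU priority and, where supported,
# the idle I/O scheduling class.
low_priority() {
    # Probe ionice first: it may be missing, or the kernel may refuse the idle class.
    if command -v ionice >/dev/null 2>&1 && ionice -c3 true 2>/dev/null; then
        ionice -c3 nice -n 19 "$@"
    else
        nice -n 19 "$@"
    fi
}

# Example, using the AWStats path from the CentOS package:
# low_priority /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -update -config=yourdomain.com
```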

For this guide, I assume you are running CentOS 6.2 or Debian 6 (Squeeze) with Apache 2.2.

Step 1: Installation

On CentOS/RHEL, AWStats is available in the EPEL repository. Don't compile from source unless you enjoy dependency hell.

# Install EPEL repository if not present
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-5.noarch.rpm

# Install AWStats
yum install awstats

On Debian/Ubuntu 12.04 LTS:

apt-get update
apt-get install awstats

Step 2: Configuring Apache for Accuracy

Default Apache logging is often insufficient. We need the "Combined" format to see User Agents and Referrers. Check your httpd.conf or apache2.conf:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/httpd/access_log combined

Pro Tip: If you are behind a reverse proxy (like Varnish or Nginx), the %h variable will log the proxy's address (127.0.0.1 when the proxy runs on the same host) instead of the visitor's. You must install mod_rpaf or change the log format to use %{X-Forwarded-For}i to capture the real client IP. Without this, your geo-location data will be useless.
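A quick way to tell whether a proxy is masking your visitors is to measure how much of the log is attributed to the loopback address. The helper below is only a sketch. loopback_share is a hypothetical name, and it assumes the client address is the first field on each line.

```shell
#!/bin/sh
# Print the percentage (rounded down) of log lines whose client field is loopback.
loopback_share() {
    awk '$1 == "127.0.0.1" || $1 == "::1" { n++ }
         END { if (NR > 0) printf "%d\n", (n * 100) / NR; else print 0 }' "$1"
}

# Example (path is an assumption):
# loopback_share /var/log/httpd/access_log
```

A result close to 100 on a public site strongly suggests that every request is being logged with the proxy's address.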

Step 3: Configuring awstats.conf

Navigate to /etc/awstats/. You should create a specific configuration file for your domain. Copy the model file:

cp awstats.model.conf awstats.yourdomain.com.conf
vi awstats.yourdomain.com.conf

Here are the critical directives you must change. Do not leave these default:

# Path to your log file. Ensure the user 'apache' or 'www-data' can read it.
LogFile="/var/log/httpd/access_log"

# '1' for Apache combined logs
LogType=W
LogFormat=1

# Your domain
SiteDomain="yourdomain.com"
HostAliases="www.yourdomain.com localhost 127.0.0.1"

# DNS Lookup. WARNING: Setting this to 1 kills performance on slow disks.
# On CoolVDS SSD instances, you can leave this on, but cache it.
DNSLookup=1

# Location of the database files
DirData="/var/lib/awstats"
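Most first-run failures come down to a log file that cannot be read or a data directory that cannot be written. Here is a small pre-flight check you could run as the same user that will perform the update. Both function names are hypothetical, and the parser only handles simple Name=value or Name="value" lines, which is an assumption about how your config is written.

```shell
#!/bin/sh
# Print the value of a directive (e.g. LogFile) from an AWStats config file.
# $1 = directive name, $2 = config file. Commented-out lines are ignored.
awstats_directive() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\"\{0,1\}\([^\"]*\)\"\{0,1\}[[:space:]]*\$/\1/p" "$2" | tail -n 1
}

# Verify that LogFile is readable and DirData is a writable directory.
check_awstats_paths() {
    conf=$1
    log=$(awstats_directive LogFile "$conf")
    data=$(awstats_directive DirData "$conf")
    [ -r "$log" ] || { echo "cannot read LogFile: $log"; return 1; }
    [ -d "$data" ] && [ -w "$data" ] || { echo "DirData not writable: $data"; return 1; }
    echo "ok"
}

# Example:
# check_awstats_paths /etc/awstats/awstats.yourdomain.com.conf
```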

Step 4: The Initial Parse

Now, we generate the statistics database. Run this from the command line first to verify permissions.

/usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -update -config=yourdomain.com

If you see output like Lines in file: 254000... Found 0 dropped..., you are golden. If you see huge numbers of dropped lines, your LogFormat does not match the AWStats configuration.
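If the dropped count is high, it helps to know how many lines in the file are not in combined format at all. The check below is an approximation under stated assumptions: count_malformed_lines is a hypothetical name, and the pattern assumes no spaces in the user field and no escaped quotes inside the request, referrer or user agent, so a few legitimate lines may be counted as malformed.

```shell
#!/bin/sh
# Count lines that do NOT look like Apache combined format.
count_malformed_lines() {
    # grep exits non-zero when the count is 0, so mask the status.
    grep -Evc '^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} ([0-9]+|-) "[^"]*" "[^"]*"$' "$1" || true
}

# Example (path is an assumption):
# count_malformed_lines /var/log/httpd/access_log
```

A count close to the total number of lines usually means the log is still being written in the common format, without the referrer and user agent fields.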

Step 5: Automation via Cron

Statistics are useless if they aren't fresh. We need to update the database regularly. Edit your crontab:

vi /etc/cron.d/awstats

Add the following line to update every hour:

0 * * * * root /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -update -config=yourdomain.com > /dev/null
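On a large log, one update may still be running when cron fires the next one. A common guard is a lock that makes the second run skip itself. The wrapper below is a sketch using an atomic mkdir as the lock. run_exclusive and the lock path are hypothetical, and a run that is killed hard will leave the lock directory behind for you to remove by hand.

```shell
#!/bin/sh
# Run a command only if no other run currently holds the lock directory.
# $1 = lock directory, remaining arguments = command to run.
run_exclusive() {
    lockdir=$1
    shift
    # mkdir is atomic: it succeeds for exactly one caller.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another run holds $lockdir; skipping" >&2
        return 0
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return $status
}

# Example, saved in a script that the cron entry calls instead of awstats.pl directly:
# run_exclusive /var/run/awstats-update.lock \
#     /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -update -config=yourdomain.com
```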

Analyzing the Data: What to Look For

Once you access the web interface (usually at http://yourdomain.com/awstats/awstats.pl?config=yourdomain.com), look immediately at the "Robots/Spiders visitors" section. This is often where your bandwidth bill goes.

In a recent deployment for a Norwegian media site, we found that a specific Chinese scraper was consuming 40% of their bandwidth. Because this traffic didn't execute JavaScript, Google Analytics never saw it. AWStats flagged the IP range immediately. We blocked the range in iptables and dropped server load by half.
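You can confirm a suspect like this from the raw log before touching the firewall. The sketch below sums the response size per client address. top_bandwidth_hosts is a hypothetical name, and it assumes the combined format with a three-token request line, which puts the byte count in field 10. The address range in the iptables example is a documentation placeholder, not the range from that deployment.

```shell
#!/bin/sh
# List the client addresses that received the most bytes.
# $1 = log file, $2 = how many hosts to show (default 10).
top_bandwidth_hosts() {
    # A "-" byte count (no body sent) is treated as 0 by awk.
    awk '{ bytes[$1] += $10 }
         END { for (h in bytes) printf "%.0f %s\n", bytes[h], h }' "$1" \
        | sort -rn | head -n "${2:-10}"
}

# Example (path is an assumption):
# top_bandwidth_hosts /var/log/httpd/access_log 5
#
# Once you are sure about a range, block it, for example:
# iptables -I INPUT -s 203.0.113.0/24 -j DROP
```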

Performance: The CoolVDS Advantage

Parsing logs is a read-heavy operation. Writing the database is write-heavy. On a traditional VPS with shared spinning disks (SATA), running an update on a log file larger than 1GB can cause the server to hang, leading to 503 errors for your actual users.

This is a fundamental physics problem with mechanical hard drives. The seek times cannot keep up with thousands of random reads.

| Feature | Generic VPS (SATA) | CoolVDS (SSD) |
|---|---|---|
| Random IOPS | ~75-150 | ~50,000+ |
| Log Parse Time (1GB) | 4-5 minutes | < 45 seconds |
| Impact on Web Server | High (IO Wait) | Negligible |

At CoolVDS, we utilize enterprise-grade SSD storage arrays. This means you can run aggressive log analysis every 10 minutes without your users noticing a millisecond of latency. Accuracy shouldn't come at the cost of uptime.

Securing the Interface

Finally, do not leave your AWStats interface open to the public. It exposes your directory structure and traffic patterns. Secure it with an .htaccess file:

AuthName "Admin Access Only"
AuthType Basic
AuthUserFile /etc/awstats/htpasswd.users
require valid-user
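The AuthUserFile has to exist before Apache can authenticate anyone. Apache's htpasswd utility creates it, but its -c flag silently overwrites an existing file, so a small guard is worth having. create_htpasswd and the user name admin are hypothetical. The sketch assumes htpasswd is on your PATH.

```shell
#!/bin/sh
# Create a new password file with a single user, refusing to clobber an existing one.
# $1 = password file, $2 = user name. htpasswd prompts for the password.
create_htpasswd() {
    file=$1
    user=$2
    if [ -e "$file" ]; then
        echo "refusing to overwrite $file; add users with: htpasswd $file <user>" >&2
        return 1
    fi
    htpasswd -c "$file" "$user"
}

# Example:
# create_htpasswd /etc/awstats/htpasswd.users admin
```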

Data sovereignty and server performance go hand in hand. By processing your logs on a robust, local platform, you satisfy the requirements of the Norwegian Data Protection Authority and gain granular insight into your infrastructure.

Don't let slow I/O blind you to what is actually happening on your server. Spin up a high-performance SSD instance on CoolVDS today and start seeing the whole picture.