Stop Trusting JavaScript: The Hard Truth About Server-Side Log Analysis with AWStats
If you are relying solely on client-side tracking pixels to measure your server's traffic, you are operating in the dark. I recently audited a high-traffic e-commerce portal based in Oslo where the marketing department claimed they had 50,000 unique visitors a day. The server logs told a different story: 72,000 uniques. Where did the discrepancy come from? Mobile browsers with limited JavaScript support, privacy-conscious users running NoScript, and corporate firewalls stripping tracking beacons.
In the world of Server Administration, the access log is the only source of truth. But raw logs are heavy. Parsing gigabytes of text requires serious I/O throughput and a robust analyzer. That is where AWStats (Advanced Web Statistics) enters the picture. It is old school, written in Perl, and absolutely ruthless in its accuracy.
The Privacy Imperative: Why Norway Needs Local Logs
Here in Norway, Datatilsynet (the Data Protection Authority) is becoming increasingly strict about how we handle user data. Under the Personal Data Act (Personopplysningsloven), relying on third-party US-based analytics services creates a gray area regarding data ownership. When you process logs locally on your own VPS, you retain full sovereignty over that data. You aren't shipping IP addresses across the Atlantic; you are parsing them right here on the server.
Prerequisites and The I/O Bottleneck
Before we touch the terminal, a warning: log analysis is I/O intensive. If you are running a standard VPS on a spinning HDD with noisy neighbors, running a parser over a 5GB access_log saturates the disk. Processes pile up in "iowait", the load average spikes, and your database slows down with it. This is why we architect CoolVDS instances with pure SSD storage and KVM virtualization. We isolate resources so your background tasks don't kill your frontend performance.
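To put a number on that "iowait" before and during a parse, you can compare two samples of the aggregate cpu line in /proc/stat. The snippet below is a minimal sketch for Linux only; the function name iowait_percent is made up for this article, and the five-second interval in the usage comment is an arbitrary choice.

```shell
#!/bin/sh
# iowait_percent BEFORE AFTER
# BEFORE and AFTER are two samples of the aggregate "cpu" line from /proc/stat.
# Prints the percentage of CPU time spent waiting on I/O between the two samples.
# (Hypothetical helper, not part of AWStats or any package.)
iowait_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2-9: user nice system idle iowait irq softirq steal
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            t[NR] = total
            w[NR] = $6              # field 6 is the iowait counter
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# Usage on a live system (run it while the log parser is working):
#   before=$(grep "^cpu " /proc/stat); sleep 5; after=$(grep "^cpu " /proc/stat)
#   iowait_percent "$before" "$after"
```

If the figure jumps into double digits every time the parser runs, the disk is the bottleneck, not the CPU.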
For this guide, I assume you are running CentOS 6.2 or Debian 6 (Squeeze) with Apache 2.2.
Step 1: Installation
On CentOS/RHEL, AWStats is available in the EPEL repository. Don't compile from source unless you enjoy dependency hell.
# Install EPEL repository if not present
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-5.noarch.rpm
# Install AWStats
yum install awstats
On Debian 6 or Ubuntu 12.04 LTS:
apt-get update
apt-get install awstats
Step 2: Configuring Apache for Accuracy
Default Apache logging is often insufficient. We need the "Combined" format to see User Agents and Referrers. Check your httpd.conf or apache2.conf:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/httpd/access_log combined
Pro Tip: If you are behind a reverse proxy (like Varnish or Nginx), the %h variable will show the proxy's address (127.0.0.1 when the proxy runs on the same machine). You must install mod_rpaf or change the log format to use %{X-Forwarded-For}i to capture the real client IP. Without this, your geo-location data will be useless.
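A quick way to tell whether an existing log is already affected is to measure how many lines carry a loopback address in the client field. This is only a sketch: loopback_share is a hypothetical helper name, and it assumes the client address is the first whitespace-separated field, as it is in the combined format.

```shell
#!/bin/sh
# loopback_share LOGFILE
# Prints the percentage (rounded down) of log lines whose first field,
# the client address, is a loopback address.
# (Hypothetical helper for illustration.)
loopback_share() {
    awk '
        $1 == "127.0.0.1" || $1 == "::1" { masked++ }
        END {
            if (NR == 0) { print 0; exit }
            printf "%d\n", 100 * masked / NR
        }' "$1"
}

# Usage, with the log path used elsewhere in this guide:
#   loopback_share /var/log/httpd/access_log
```

A result close to 100 means nearly every request was logged with the proxy's address instead of the visitor's.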
Step 3: Configuring awstats.conf
Navigate to /etc/awstats/. You should create a specific configuration file for your domain. Copy the model file:
cp awstats.model.conf awstats.yourdomain.com.conf
vi awstats.yourdomain.com.conf
Here are the critical directives you must change. Do not leave these default:
# Path to your log file. Ensure the user 'apache' or 'www-data' can read it.
LogFile="/var/log/httpd/access_log"
# 'W' marks this as a web server log
LogType=W
# '1' selects the Apache combined log format
LogFormat=1
# Your domain
SiteDomain="yourdomain.com"
HostAliases="www.yourdomain.com localhost 127.0.0.1"
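# DNS Lookup. WARNING: Setting this to 1 slows updates dramatically, because every
# new IP triggers a reverse DNS query; the cost is network latency as much as disk I/O.
# On CoolVDS SSD instances, you can leave this on, but make sure lookups are cached.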
DNSLookup=1
# Location of the database files
DirData="/var/lib/awstats"
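If you manage more than one site, the same edits can be scripted instead of made by hand in vi. The following is a rough sketch, assuming each directive appears uncommented at the start of a line in the model file. make_awstats_conf is a name invented here, and the function only touches the four directives shown.

```shell
#!/bin/sh
# make_awstats_conf MODEL DOMAIN LOGFILE
# Writes a copy of MODEL to stdout with LogFile, LogFormat, SiteDomain and
# HostAliases rewritten for DOMAIN. All other lines pass through untouched.
# Assumes DOMAIN and LOGFILE contain no "|" or "&" characters (sed specials).
# (Hypothetical helper for illustration.)
make_awstats_conf() {
    model=$1
    domain=$2
    logfile=$3
    sed \
        -e "s|^LogFile=.*|LogFile=\"$logfile\"|" \
        -e "s|^LogFormat=.*|LogFormat=1|" \
        -e "s|^SiteDomain=.*|SiteDomain=\"$domain\"|" \
        -e "s|^HostAliases=.*|HostAliases=\"www.$domain localhost 127.0.0.1\"|" \
        "$model"
}

# Usage, run from /etc/awstats/:
#   make_awstats_conf awstats.model.conf yourdomain.com /var/log/httpd/access_log \
#       > awstats.yourdomain.com.conf
```

Review the generated file before the first update run; a directive that is commented out in your model file will not be picked up by these patterns.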
Step 4: The Initial Parse
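Now, we generate the statistics database. Run this from the command line first to verify permissions. The path below is the EPEL layout; the Debian package installs the script as /usr/lib/cgi-bin/awstats.pl.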
/usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -update -config=yourdomain.com
If you see output like Lines in file: 254000... Found 0 dropped..., you are golden. If you see huge numbers of dropped or corrupted records, the LogFormat in your AWStats configuration does not match what Apache is actually writing.
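Before blaming AWStats, you can check the log itself. The sketch below counts lines that do not look like the combined format from Step 2. The regular expression is my own approximation of that format, not the parser AWStats uses, and count_malformed is a hypothetical name.

```shell
#!/bin/sh
# Approximate ERE for one Apache "combined" line:
# host ident user [time] "request" status bytes "referer" "user-agent"
# Quoted fields may contain backslash-escaped characters such as \".
COMBINED_RE='^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "([^"\\]|\\.)*" [0-9]{3} ([0-9]+|-) "([^"\\]|\\.)*" "([^"\\]|\\.)*"$'

# count_malformed LOGFILE
# Prints how many lines do NOT match the pattern above.
# (Hypothetical helper for illustration.)
count_malformed() {
    # grep -c exits with status 1 when the count is 0, so treat 1 as success.
    grep -Evc "$COMBINED_RE" "$1" || [ $? -eq 1 ]
}

# Usage:
#   count_malformed /var/log/httpd/access_log
```

A handful of mismatches is normal on a busy site. If nearly every line fails, the CustomLog directive is probably still writing the "common" format.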
Step 5: Automation via Cron
Statistics are useless if they aren't fresh. We need to update the database regularly. Edit your crontab:
vi /etc/cron.d/awstats
Add the following line to update every hour:
0 * * * * root /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -update -config=yourdomain.com > /dev/null
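One thing the cron line does not guard against is overlap: if a run takes longer than the interval, two updates end up competing for the same log and database. A simple guard is a lock directory, since mkdir either creates it or fails atomically. This is a sketch only; run_exclusive and the lock path are assumptions of mine, and a stale lock left behind by a killed run has to be removed by hand.

```shell
#!/bin/sh
# run_exclusive LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR can be created; otherwise skips the run and
# returns a non-zero status. The lock is removed when COMMAND finishes.
# (Hypothetical helper for illustration.)
run_exclusive() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "run_exclusive: $lockdir is held by another run, skipping" >&2
        return 75
    fi
    status=0
    "$@" || status=$?       # keep the exit status of the wrapped command
    rmdir "$lockdir"
    return "$status"
}

# Usage inside a wrapper script called from cron (lock path is an assumption):
#   run_exclusive /var/lock/awstats.lock \
#       /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl \
#       -update -config=yourdomain.com > /dev/null
```

On systems with util-linux, flock(1) does the same job and releases the lock automatically if the process dies.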
Analyzing the Data: What to Look For
Once you access the web interface (usually at http://yourdomain.com/awstats/awstats.pl?config=yourdomain.com), look immediately at the "Robots/Spiders visitors" section. This is often where your bandwidth bill goes.
In a recent deployment for a Norwegian media site, we found that a specific Chinese scraper was consuming 40% of their bandwidth. Because this traffic didn't execute JavaScript, Google Analytics never saw it. AWStats flagged the IP range immediately. We blocked the range in iptables and cut server load in half.
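You can cross-check what the report shows straight from the raw log. The sketch below totals bytes sent per client address and lists the heaviest consumers. It assumes the combined format with a well-formed three-part request line, so the byte count is the tenth whitespace-separated field. top_talkers is a name made up for this example.

```shell
#!/bin/sh
# top_talkers LOGFILE [N]
# Prints the N client addresses (default 10) that received the most bytes,
# one per line as "BYTES ADDRESS", largest first.
# (Hypothetical helper for illustration.)
top_talkers() {
    awk '
        $10 ~ /^[0-9]+$/ { bytes[$1] += $10 }   # skip "-" (no body sent)
        END { for (ip in bytes) printf "%.0f %s\n", bytes[ip], ip }
    ' "$1" | sort -rn | head -n "${2:-10}"
}

# Usage:
#   top_talkers /var/log/httpd/access_log 20
#
# Blocking an offending range afterwards (203.0.113.0/24 is a documentation
# range used here as a placeholder, not the scraper from the story above):
#   iptables -I INPUT -s 203.0.113.0/24 -j DROP
```

Lines with malformed request strings shift the field positions and are silently skipped, so treat the totals as a lower bound.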
Performance: The CoolVDS Advantage
Parsing logs is a read-heavy operation. Writing the database is write-heavy. On a traditional VPS with shared spinning disks (SATA), running an update on a log file larger than 1GB can cause the server to hang, leading to 503 errors for your actual users.
This is a fundamental physics problem with mechanical hard drives. The seek times cannot keep up with thousands of random reads.
| Feature | Generic VPS (SATA) | CoolVDS (SSD) |
|---|---|---|
| Random IOPS | ~75-150 | ~50,000+ |
| Log Parse Time (1GB) | 4-5 minutes | < 45 seconds |
| Impact on Web Server | High (IO Wait) | Negligible |
At CoolVDS, we utilize enterprise-grade SSD storage arrays. This means you can run aggressive log analysis every 10 minutes without your users noticing a millisecond of latency. Accuracy shouldn't come at the cost of uptime.
Securing the Interface
Finally, do not leave your AWStats interface open to the public. It exposes your directory structure and traffic patterns. Secure it with an .htaccess file:
AuthName "Admin Access Only"
AuthType Basic
AuthUserFile /etc/awstats/htpasswd.users
require valid-user
Data sovereignty and server performance go hand in hand. By processing your logs on a robust, local platform, you satisfy the requirements of the Norwegian Data Protection Authority and gain granular insight into your infrastructure.
Don't let slow I/O blind you to what is actually happening on your server. Spin up a high-performance SSD instance on CoolVDS today and start seeing the whole picture.