Stop Trusting JavaScript: Deep Server Log Analysis with AWStats on Linux
If you are relying solely on Google Analytics to tell you what is happening on your servers, you are flying blind. I've seen it time and time again: a Marketing Manager claims traffic is down, but the Systems Administrator sees the load average on the server spiking to 15. Who is right?
The sysadmin is always right. Why? Because client-side JavaScript trackers get blocked. Between NoScript plugins, corporate firewalls, and users disabling JavaScript, you are likely missing 10% to 20% of your actual traffic data. Furthermore, Google Analytics won't tell you about the 404 errors bleeding your bandwidth, or the hotlinking from that one forum in Russia that is hammering your image assets.
To see the matrix, you have to look at the raw access logs. Today, we are going old school, because old school still works. We are setting up AWStats (Advanced Web Statistics) to parse Apache and Nginx logs. This isn't just about counting hits; it's about forensics, bandwidth management, and keeping your data right here in Norway.
The Architecture of Log Analysis
Log analysis is I/O heavy. When you run a parser against a 2GB access log, your disk heads are going to scream. In a shared hosting environment, this process can get your account suspended for "resource abuse." This is why serious admins move to a VPS (Virtual Private Server).
In my recent deployment for a media client in Oslo, we migrated them from shared hosting to a KVM-based instance on CoolVDS. The difference was night and day. On traditional spinning SATA drives (7.2k RPM), generating the monthly report took 45 minutes. On the CoolVDS Enterprise SSD tier, it took 3 minutes. When you are parsing millions of lines of text, disk I/O performance is the metric that matters most.
Step 1: Installation (CentOS 6 & Debian 6)
Let's get our hands dirty. We assume you have root access via SSH. If you are still using FTP for everything, stop reading and learn SSH keys first.
For RHEL/CentOS 6 (requires EPEL repository):
# Install EPEL if you haven't already
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
# Install AWStats
yum install awstats
For Debian 6 (Squeeze) / Ubuntu 12.04 LTS:
apt-get update
apt-get install awstats
Step 2: Configuring the Web Server Logging
AWStats is a Perl script that reads log files. If your log format is messy, your data will be garbage. Most default Apache configs are decent, but let's ensure we are using the combined format to capture User Agents and Referrers.
Check your httpd.conf (CentOS) or apache2.conf (Debian/Ubuntu, where the logs live under /var/log/apache2/ instead of /var/log/httpd/):
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/httpd/access_log combined
Pro Tip: If you are running Nginx as a reverse proxy in front of Apache (a common setup for high-performance sites), make sure you install mod_rpaf on Apache. Without it, Apache sees all traffic coming from 127.0.0.1, and your AWStats geography report will say all your users live inside your server.
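Before pointing AWStats at a log, it is worth checking that the lines really are in combined format and that the client column is not just your own proxy. The following is a minimal sketch, not part of AWStats: check_combined and loopback_share are hypothetical helper names, and the pattern is deliberately strict, so request lines containing escaped quotes will be counted as "other".

```shell
#!/bin/sh
# Sketch only: check_combined and loopback_share are hypothetical helpers.

# check_combined LOG
# Count lines that look like NCSA combined versus everything else.
check_combined() {
    awk '
        # host ident user [date zone] "request" status bytes "referer" "agent"
        /^[^ ]+ [^ ]+ [^ ]+ \[[^ ]+ [^ ]+\] "[^"]*" [0-9]+ ([0-9]+|-) "[^"]*" "[^"]*"$/ { ok++; next }
        { bad++ }
        END { printf "combined=%d other=%d\n", ok, bad }
    ' "$1"
}

# loopback_share LOG
# Report how many requests appear to come from 127.0.0.1. A high share
# behind a reverse proxy suggests the real client IP is not being logged.
loopback_share() {
    awk '
        $1 == "127.0.0.1" { lo++ }
        END { if (NR) printf "%d of %d requests from 127.0.0.1\n", lo, NR }
    ' "$1"
}

# Usage (path taken from the CustomLog line above):
#   check_combined /var/log/httpd/access_log
#   loopback_share /var/log/httpd/access_log
```

If "other" is large, fix the LogFormat before importing anything; AWStats will otherwise drop or misread those records.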
Step 3: Configuring AWStats
Copy the default config file to a new file for your domain. (The path below is the CentOS one; Debian/Ubuntu ships its template as /etc/awstats/awstats.conf.)
cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.coolvds-demo.no.conf
vi /etc/awstats/awstats.coolvds-demo.no.conf
You need to change these critical parameters:
# Path to your log file (Ensure permissions allow reading)
LogFile="/var/log/httpd/access_log"
# The type of log. W is for web server logs.
LogType=W
# Log format. 1 is Apache Combined (NCSA Combined)
LogFormat=1
# Your domain name
SiteDomain="coolvds-demo.no"
# Aliases (www, etc)
HostAliases="www.coolvds-demo.no localhost 127.0.0.1"
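A quick way to confirm the edits took, and that the log is actually readable, is to pull the directives back out of the file. This is a rough sketch under a few assumptions: the helper names are made up for illustration, LogFile is assumed to be a plain quoted path (not a piped command), and the check should be run as the same user that will run the update.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical, not AWStats tools.

# show_awstats_settings CONF
# Print the directives edited in this step, skipping commented-out lines.
show_awstats_settings() {
    grep -E '^(LogFile|LogType|LogFormat|SiteDomain|HostAliases)=' "$1"
}

# logfile_readable CONF
# Succeed only if the first LogFile="..." entry names a readable file.
# Assumes a plain path; a piped LogFile command will fail this check.
logfile_readable() {
    log=$(sed -n 's/^LogFile="\(.*\)"[[:space:]]*$/\1/p' "$1" | head -n 1)
    [ -n "$log" ] && [ -r "$log" ]
}

# Usage:
#   show_awstats_settings /etc/awstats/awstats.coolvds-demo.no.conf
#   logfile_readable /etc/awstats/awstats.coolvds-demo.no.conf || echo "cannot read log"
```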
Step 4: The Security Question (Datatilsynet)
Here is the part the "Cloud" evangelists forget. When you use Google Analytics, you are sending data about Norwegian users to servers in the US. Under the Personal Data Act (Personopplysningsloven) and the limitations of the Safe Harbor agreement, this can be a legal grey area for sensitive industries like healthcare or finance.
By hosting your own analytics on a server physically located in Oslo or a nearby European datacenter—like the infrastructure CoolVDS provides—you retain full data sovereignty. You aren't leaking IP addresses to third parties. For local compliance, this "old school" method is actually the most future-proof strategy we have.
Step 5: Automating the Update
Statistics are useless if they aren't up to date. We need to parse the logs regularly. Do not run this every minute; it consumes CPU. Once an hour is standard.
Create a cron job using crontab -e:
# Update stats every hour at the 10th minute
# (CentOS path shown; Debian/Ubuntu installs the script as /usr/lib/cgi-bin/awstats.pl)
10 * * * * /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=coolvds-demo.no -update > /dev/null
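On a big log, one update can still be running when the next hour's job fires. A simple guard, sketched here as an assumption rather than anything AWStats provides, is a lock directory: mkdir is atomic, so only one run can hold it. The name run_exclusive and the lock path are hypothetical, and a job killed mid-run will leave a stale lock you have to remove by hand.

```shell
#!/bin/sh
# Sketch only: run_exclusive and the lock path are hypothetical.

# run_exclusive LOCKDIR COMMAND [ARGS...]
# Run COMMAND only if LOCKDIR can be created; skip if another run holds it.
run_exclusive() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75   # arbitrary "try again later" status
    fi
    status=0
    "$@" || status=$?
    rmdir "$lockdir"
    return "$status"
}

# Usage, from a wrapper script that cron calls instead of awstats.pl directly:
#   run_exclusive /var/lock/awstats-update.lock \
#       /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl \
#       -config=coolvds-demo.no -update > /dev/null
```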
Analyzing the Output
Once the update runs, access your reports at http://your-server-ip/awstats/awstats.pl?config=coolvds-demo.no. You may need to add an Apache Alias or ScriptAlias so the AWStats cgi-bin directory is reachable; when you do, protect it with HTTP authentication backed by an .htpasswd file. You don't want your competitors seeing your traffic data.
What to look for:
- HTTP Status Codes: Look for 404s. These are broken links you need to fix to improve your SEO.
- Bandwidth Robbers: Check the "File type" report. If .iso or .zip files are consuming 80% of your bandwidth, you might need to offload those to a CDN or upgrade your CoolVDS plan to include more transfer.
- Bot Traffic: You will be shocked at how many scrapers hit your site. AWStats identifies them. You can then block their IP ranges in iptables or nginx.conf.
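If you want a second opinion on the status-code and bandwidth numbers, the same figures can be approximated straight from the raw log. This is only a sketch: the function names are hypothetical, and it assumes combined-format lines where whitespace splitting puts the URL in field 7, the status in field 9 and the byte count in field 10 (malformed request lines will skew the counts).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; assumes combined log format.

# status_counts LOG: requests per HTTP status code, busiest first.
status_counts() {
    awk '{ n[$9]++ } END { for (s in n) print n[s], s }' "$1" | sort -rn
}

# top_404 LOG: the ten most requested URLs that returned 404.
top_404() {
    awk '$9 == 404 { n[$7]++ } END { for (u in n) print n[u], u }' "$1" |
        sort -rn | head -n 10
}

# bytes_by_ext LOG: total bytes sent per file extension, largest first.
bytes_by_ext() {
    awk '$10 ~ /^[0-9]+$/ {
        path = $7
        sub(/\?.*/, "", path)                  # drop the query string
        n = split(path, seg, "/")              # keep the last path segment
        file = seg[n]
        ext = "(none)"
        if (match(file, /\.[A-Za-z0-9]+$/))
            ext = tolower(substr(file, RSTART + 1))
        bytes[ext] += $10
    }
    # %.0f keeps large totals out of scientific notation
    END { for (e in bytes) printf "%.0f %s\n", bytes[e], e }' "$1" | sort -rn
}

# Usage:
#   status_counts /var/log/httpd/access_log
#   top_404 /var/log/httpd/access_log
#   bytes_by_ext /var/log/httpd/access_log | head
```

The numbers will not match AWStats exactly, since AWStats filters robots and applies its own rules for what counts as a page, but a large disagreement is a hint that something in the configuration is off.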
Performance Considerations
Parsing text logs is a single-threaded, disk-intensive operation. Running an AWStats update on a 5GB log file can cause "IO Wait" (iowait) to spike, making your actual website sluggish for visitors. On a standard VPS with oversold resources it gets worse, because other tenants are competing for the same disks. That contention is known as the "Noisy Neighbor" effect in virtualization.
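Whatever the hardware, you can at least ask the kernel to put the parser at the back of the queue. The sketch below is an assumption on my part, not something AWStats does for you: run_low_priority is a hypothetical wrapper that uses ionice (from util-linux) when it is available and working, and falls back to plain nice otherwise. How much the idle I/O class helps depends on the I/O scheduler in use.

```shell
#!/bin/sh
# Sketch only: run_low_priority is a hypothetical wrapper name.

# run_low_priority COMMAND [ARGS...]
# Run COMMAND at the lowest CPU priority and, where supported,
# in the idle I/O class so it only gets disk time nobody else wants.
run_low_priority() {
    # Probe ionice first: it may be missing or not permitted on this host.
    if command -v ionice >/dev/null 2>&1 && ionice -c3 true 2>/dev/null; then
        ionice -c3 nice -n 19 "$@"
    else
        nice -n 19 "$@"
    fi
}

# Usage:
#   run_low_priority /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl \
#       -config=coolvds-demo.no -update
```

The update will take longer this way, which is the point: visitors get the disk first, the report gets whatever is left.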
This is where infrastructure choice matters. We utilize CoolVDS for our log servers because they use strict KVM virtualization (Kernel-based Virtual Machine) which guarantees resource isolation. Unlike OpenVZ, where a neighbor's log rotation could kill your database performance, KVM ensures your I/O is yours. Combined with their high-performance SSD arrays, log rotation is virtually instantaneous.
Don't let your monitoring tools kill your production environment. Isolate your resources, keep your data in Norway, and trust the logs, not the JavaScript.
Ready to take control of your data? Deploy a KVM VPS with pure SSD storage on CoolVDS today and start seeing what's really happening on your network.