Stop Guessing: True Traffic Analysis with AWStats
Let’s be honest: those JavaScript-based trackers everyone is pasting into their footers are lying to you. Google Analytics is fine for marketing trends, but if you are a Systems Administrator responsible for capacity planning, it is useless. It misses bots, it misses users with NoScript, it misses hotlinked images, and it tells you absolutely nothing about bandwidth theft.
To know what is actually hitting your network interface, you need to parse the raw access logs. In 2010, the gold standard for this remains AWStats. It’s powerful, it’s Perl-based, and if you aren't careful, it will chew up your CPU cycles faster than a fork bomb.
The Discrepancy: Client-Side vs. Server-Side
I recently audited a media server hosting heavy video assets for a client in Oslo. Their marketing dashboard claimed 2,000 visits a day. The server load was spiking as if they had 20,000. Why? Because search crawlers and direct file leechers don't execute JavaScript.
Here is the reality of the stack:
| Feature | Google Analytics (JS) | AWStats (Server Logs) |
|---|---|---|
| Tracks Bots/Crawlers | No | Yes (Crucial for capacity planning) |
| Bandwidth Usage | No | Yes (Byte-level precision) |
| Error Codes (404, 500) | No | Yes |
| Performance Impact | Client-side latency | Server-side CPU/Disk I/O |
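
You can sanity-check the server-side column before AWStats is even installed. The following is a rough sketch, not a replacement for AWStats: it assumes Apache's combined log format, the function name summarize_log is made up for this example, and the crawler match is a crude substring heuristic on the User-Agent field. Requests containing escaped double quotes will confuse the field split.

```shell
#!/bin/sh
# summarize_log FILE
# Hypothetical helper: totals bytes served, counts crawler-like user agents,
# and counts hits per HTTP status code in a combined-format access log.
summarize_log() {
    awk '
        {
            # Split on double quotes: q[2] is the request line,
            # q[3] is " status bytes ", q[6] is the user agent.
            n = split($0, q, "\"")
            if (n < 3) next
            split(q[3], sb, " ")
            status[sb[1]]++
            # The size field is "-" when no body was sent.
            if (sb[2] ~ /^[0-9]+$/) bytes += sb[2]
            if (n >= 6 && tolower(q[6]) ~ /bot|crawler|spider/) crawlers++
        }
        END {
            printf "total_bytes %.0f\n", bytes
            printf "crawler_like %d\n", crawlers
            for (s in status) printf "status_%s %d\n", s, status[s]
        }
    ' "$1"
}
```

Run it as summarize_log /var/log/httpd/access_log and compare the crawler_like count against what your JavaScript tracker reports.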
Installation and Configuration
Whether you are running CentOS 5.4 or Debian Lenny, getting AWStats running is straightforward, but the configuration separates the amateurs from the pros.
1. Installation
On Red Hat/CentOS-based systems (on CentOS 5 the package comes from a third-party repository such as EPEL, so enable that first):
yum install awstats
On Debian/Ubuntu:
apt-get install awstats
2. The Critical Config
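The default configuration is usually located at /etc/awstats/awstats.conf (some packages ship a template such as awstats.model.conf instead). The most important directive is LogFile, which must point at the access log you want parsed. On a standard Apache 2.2 setup on CentOS it looks like this (Debian keeps the log at /var/log/apache2/access.log):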
LogFile="/var/log/httpd/access_log"
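Pro Tip: If you are managing multiple virtual hosts, do not use one giant log file. Give each VirtualHost block in your httpd.conf its own CustomLog directive so every site writes to its own file. AWStats then only reads the lines that belong to the site it is reporting on, which makes parsing significantly faster.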
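Once each virtual host has its own log, each one also needs its own AWStats config, named awstats.<name>.conf and selected with -config=<name>. A minimal sketch of stamping those out from a template follows. The function name make_site_conf and the example domain are made up for illustration, and it assumes the template already contains LogFile= and SiteDomain= lines and that neither value contains a | or & character.

```shell
#!/bin/sh
# make_site_conf TEMPLATE DOMAIN LOGFILE OUTDIR
# Hypothetical helper: copies a template config and rewrites the
# LogFile and SiteDomain directives for one virtual host.
make_site_conf() {
    template=$1
    domain=$2
    logfile=$3
    outdir=$4
    sed -e "s|^LogFile=.*|LogFile=\"$logfile\"|" \
        -e "s|^SiteDomain=.*|SiteDomain=\"$domain\"|" \
        "$template" > "$outdir/awstats.$domain.conf"
}
```

For example, make_site_conf /etc/awstats/awstats.conf example.com /var/log/httpd/example.com-access_log /etc/awstats writes /etc/awstats/awstats.example.com.conf, which you can then process with awstats.pl -config=example.com -update.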
3. Handling the "Update" Load
AWStats is written in Perl. It parses text files line-by-line. On a high-traffic site, running the update script can cause I/O wait times to skyrocket. This is where your choice of hosting architecture matters.
Many budget VPS providers use OpenVZ containerization. In those environments, "burst RAM" is a myth, and disk I/O is shared aggressively. When you run an AWStats update on a 500MB log file, you might choke the neighbor's database, or worse, they choke you. At CoolVDS, we utilize Xen hypervisors to ensure hardware isolation. When you execute a Perl script, you get the dedicated CPU cycles you paid for, ensuring your log analysis doesn't take down your web server.
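Whatever the platform, you can soften the hit by running the update at the lowest CPU and I/O priority. The wrapper below is a sketch under assumptions: run_low_priority is a name invented here, ionice may be missing or refused on older kernels and in some containers (in which case the wrapper falls back to nice alone), and the idle I/O class only has an effect with an I/O scheduler that honours it, such as CFQ.

```shell
#!/bin/sh
# run_low_priority COMMAND [ARGS...]
# Hypothetical helper: runs a command at the lowest CPU priority and,
# where ionice works, in the idle I/O scheduling class.
run_low_priority() {
    # Probe first: ionice can exist but still be refused by the kernel.
    if command -v ionice >/dev/null 2>&1 && ionice -c3 true 2>/dev/null; then
        ionice -c3 nice -n 19 "$@"
    else
        nice -n 19 "$@"
    fi
}
```

A typical call would be run_low_priority /usr/share/awstats/tools/awstats_updateall.pl now. The wrapper returns the exit status of the command it ran.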
Automation via Cron
Don't run updates manually. Set up a cron job to process logs during off-peak hours (usually 03:00 or 04:00 CET for Norwegian audiences).
0 3 * * * /usr/share/awstats/tools/awstats_updateall.pl now > /dev/null
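On a large log, one update can still be running when the next cron run fires. One way to guard against overlap is a lock directory, sketched below. The name with_lock and the exit code 75 are choices made for this example, and a run that is killed hard will leave a stale lock behind that you have to remove by hand.

```shell
#!/bin/sh
# with_lock LOCKDIR COMMAND [ARGS...]
# Hypothetical helper: runs the command only if LOCKDIR can be created.
# mkdir is atomic, so two cron runs cannot both acquire the lock.
with_lock() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "lock $lockdir is held, skipping this run" >&2
        return 75
    fi
    rc=0
    "$@" || rc=$?
    rmdir "$lockdir"
    return "$rc"
}
```

Save it in a script that cron calls instead of the bare update command, for example with_lock /var/run/awstats-update.lock /usr/share/awstats/tools/awstats_updateall.pl now.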
Privacy and The Datatilsynet
Operating in Norway means respecting the Personopplysningsloven (Personal Data Act). IP addresses are considered personal data. If you are storing raw logs, you have a responsibility to secure them.
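You should enable the GeoIP plugin in AWStats so that reports show countries instead of long lists of raw addresses for public viewing. This does not remove IPs from the raw logs themselves, so those still need to be secured and rotated. Ensure your awstats.conf includes:
LoadPlugin="geoip GEOIP_STANDARD /pathto/GeoIP.dat"
Replace /pathto/GeoIP.dat with the location of your MaxMind GeoIP country database. The plugin also requires the Geo::IP Perl module. It allows you to see that 80% of your traffic is coming from Norway without violating user trust. City-level detail, such as Oslo versus Bergen, needs the separate geoip_city_maxmind plugin and the MaxMind city database.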
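If you keep rotated logs around after AWStats has processed them, one common way to reduce what you store is to zero the last octet of each client address. This is only a sketch: mask_ipv4 is a name invented here, it handles IPv4 addresses at the start of each line and nothing else, and whether truncation is sufficient for your obligations is a question for your own legal review.

```shell
#!/bin/sh
# mask_ipv4 [FILE...]
# Hypothetical helper: replaces the last octet of a leading IPv4 address
# with 0 and leaves the rest of each log line untouched.
mask_ipv4() {
    sed 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\} /\1.0 /' "$@"
}
```

Run it on archived files only, for example mask_ipv4 access_log.1 > access_log.1.masked, because masking before the AWStats update would distort unique-visitor counts.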
Infrastructure Matters
Log analysis is I/O intensive. Reading gigabytes of text files requires high throughput storage. While standard SATA drives are common, the industry is shifting toward faster SAS RAID-10 arrays for enterprise workloads. If your disk queue length is constantly high during log rotation, your site speed suffers.
System Architect Note: Never store your logs on the same partition as your root OS if you can avoid it. If logs fill the disk, services start failing and the system can become unresponsive. Mount a separate /var/log partition. On CoolVDS instances, we recommend allocating specific block storage for logs to keep your root filesystem clean and responsive.
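A separate partition only helps if you notice when it fills up. Here is a small sketch of a check you could call from cron; the function names and the threshold are illustrative, and it assumes the POSIX df -P output layout, where the fifth column of the second line is the used percentage.

```shell
#!/bin/sh
# usage_pct PATH
# Prints the used percentage (digits only) of the filesystem holding PATH.
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_log_space PATH LIMIT
# Hypothetical helper: warns on stderr and returns 1 when usage is at or
# above LIMIT percent, otherwise returns 0.
check_log_space() {
    path=$1
    limit=$2
    pct=$(usage_pct "$path")
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $path is ${pct}% full" >&2
        return 1
    fi
}
```

For example, check_log_space /var/log 85 returns non-zero once the log partition passes 85%, which leaves cron to mail you the warning.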
The Verdict
You cannot optimize what you do not measure. JavaScript trackers give you demographics; AWStats gives you infrastructure reality. Install it, configure the cron job, and ensure your hosting platform has the dedicated I/O throughput to handle the parsing.
Need a sandbox to test your Perl configurations without risking your production node? Spin up a Xen-based instance on CoolVDS today. Experience low latency connectivity within the Nordic region and the stability of true hardware virtualization.