The Truth is in the Logs: Why Client-Side Tracking Isn't Enough
I had a client last week claim their new marketing campaign was a bust. "Google Analytics shows flat traffic," they insisted. I logged into their box, ran a quick wc -l on the Apache access logs, and found their traffic had actually tripled. The culprit? Corporate firewalls and users disabling JavaScript. If you rely solely on a snippet of JS to tell you who is visiting your site, you are flying blind.
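That sanity check takes seconds at a shell. The log lines below are invented for illustration; point the same commands at your real `/var/log/httpd/access_log` and compare the number to what your JS counter claims.

```shell
# Three fabricated combined-format log lines for demonstration
cat > /tmp/sample_access.log <<'EOF'
1.2.3.4 - - [10/Mar/2011:10:00:00 +0100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
5.6.7.8 - - [10/Mar/2011:10:00:01 +0100] "GET /robots.txt HTTP/1.1" 200 24 "-" "Googlebot/2.1"
1.2.3.4 - - [10/Mar/2011:10:00:02 +0100] "GET /a.png HTTP/1.1" 200 900 "-" "Mozilla/5.0"
EOF

# Every hit the server actually saw
wc -l < /tmp/sample_access.log        # -> 3

# Crude bot filter for a closer-to-human count
grep -viE 'bot|crawler|spider' /tmp/sample_access.log | wc -l   # -> 2
```

Even the crude filter will show you traffic that client-side tracking never reports.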
In 2011, relying on third-party counters is professional negligence. You need to own your data. You need to parse the raw logs. That means AWStats.
But here is the catch: Parsing gigabytes of text files requires serious I/O performance. If you try this on a budget VPS with oversold hard drives, your server load will spike, your MySQL query cache will get evicted, and your site will crawl. This is where the underlying hardware matters.
The Architecture of "Truth"
AWStats (Advanced Web Statistics) works by analyzing the raw server log files. Unlike client-side scripts, it sees everything: bots, image hotlinking, bandwidth thieves, and users with NoScript enabled.
To set this up correctly on a CoolVDS instance running CentOS 5.5, we need to bypass the default package manager which often lags behind, and grab AWStats 7.0 directly to ensure we have the latest browser user-agent definitions.
Step 1: The Pre-Flight Check
First, ensure your Apache httpd.conf is actually logging what we need. The default "Common" log format is useless for detailed analysis. We need "Combined".
CustomLog logs/access_log combined
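The `combined` nickname must itself be defined in httpd.conf. The stock Apache configuration ships with this definition, so verify it is present and not commented out:

```apache
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
```

The Referer and User-Agent fields at the end are exactly what "Common" lacks, and they are what AWStats needs for referrer and browser reports.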
If you are running a high-traffic site, disk I/O becomes your bottleneck. Writing to logs for every single image request is heavy. On CoolVDS, we utilize Enterprise RAID-10 SAS arrays (and are rolling out early SSD tiers for select clients) to ensure that log writes don't block your application's database reads. If you are on a host with a single SATA drive, consider disabling logging for images to save your disk heads.
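Disabling image logging is a two-line change with mod_setenvif. Note the trade-off: AWStats will no longer see bandwidth consumed by static files, so skip this if you care about hotlinking reports.

```apache
# Tag static-asset requests, then exclude them from the access log
SetEnvIf Request_URI "\.(gif|jpe?g|png|ico|css|js)$" dontlog
CustomLog logs/access_log combined env=!dontlog
```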
Configuring awstats.conf for Performance
The default configuration file is a mile long. Here are the parameters that actually matter for a production environment in Norway.
1. Handling DNS Lookups
This is the number one cause of slow stats generation. AWStats tries to resolve every IP address to a hostname.
DNSLookup=0
Unless you have a local caching DNS server with low latency, turn this off. It can turn a 5-minute cron job into a 4-hour ordeal. If you absolutely need to know if a visitor is from Telenor or NextGenTel, use the GeoIP plugin instead of DNS lookups.
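Enabling the GeoIP plugin is a single line in awstats.conf (it requires the Geo::IP Perl module and a MaxMind database; the path below is an assumption, adjust to wherever your GeoIP.dat actually lives):

```
LoadPlugin="geoip GEOIP_STANDARD /usr/local/share/GeoIP/GeoIP.dat"
```

Country-level resolution from a local database file is orders of magnitude faster than a reverse DNS lookup per IP.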
2. The LogFormat
Ensure this matches your Apache config exactly. A mismatch here means zero data.
LogFormat=1
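`LogFormat=1` is AWStats shorthand for the Apache "combined" layout. If your log format deviates (extra fields, custom ordering), spell it out with AWStats tags instead; written explicitly, the combined format is:

```
LogFormat="%host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot"
```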
The "War Story": The Cron Job That Killed MySQL
I once saw a sysadmin schedule the AWStats update script to run every hour at the top of the hour: 0 * * * *. The problem? Every other cron job on the server (log rotation, backups) was also running at the top of the hour. The load average skyrocketed to 50, and the database locked up.
Pro Tip: offset your heavy tasks. Run your stats update at 17 minutes past the hour. Give your CPU room to breathe.
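A staggered crontab might look like this (the times are illustrative, and `backup.sh` is a hypothetical placeholder for whatever your backup job actually is; the point is that the minute fields differ):

```
17 * * * * /usr/bin/perl /usr/local/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite.com -update >/dev/null
33 3 * * * /usr/sbin/logrotate /etc/logrotate.conf
51 4 * * 0 /usr/local/bin/backup.sh
```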
Data Sovereignty and The Norwegian Context
There is a growing concern regarding where data lives. While the US Safe Harbor framework exists, many Norwegian entities—especially those answering to Datatilsynet—prefer keeping traffic logs within national borders.
When you use Google Analytics, that data sits on servers in California or Dublin. When you use AWStats on a Norwegian VPS, that data sits on a disk in Oslo. You maintain full ownership. For clients in the healthcare or public sector, this distinction is not just technical; it's legal.
Automating the Analysis
Let's set up the automation. We don't want to run this manually.
# crontab -e
17 * * * * /usr/bin/perl /usr/local/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite.com -update >/dev/null
Notice anything missing? If your server is already struggling, prepend nice -n 19 to the command. On a CoolVDS slice, however, our Xen hypervisor guarantees your CPU cycles, so "CPU stealing" from noisy neighbors is rarely the issue it is on budget OpenVZ providers.
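The deprioritized version of the same cron entry looks like this. The `ionice -c3` (idle I/O class) addition only takes effect with the CFQ scheduler, which is the CentOS 5 default:

```
17 * * * * nice -n 19 ionice -c3 /usr/bin/perl /usr/local/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite.com -update >/dev/null
```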
Why Hardware Matters for Log Parsing
Log analysis is one long sequential read, and on a standard hard drive it competes directly with your application's random I/O. Parsing a 2GB log file means streaming every byte off the platters while Apache and MySQL fight for the same disk heads.
| Storage Type | Time to Parse 1GB Log | Impact on Web Server |
|---|---|---|
| Standard SATA (7.2k) | ~4-5 Minutes | High (Site may lag) |
| CoolVDS SAS 15k / SSD | ~45 Seconds | Negligible |
Time is money. If you need real-time stats to adjust a running ad campaign, you can't wait for a slow parser.
Stop relying on third-party scripts that get blocked by plugins. Get raw, unfiltered access to your traffic data. Check your current iowait levels—if they are creeping up during log rotation, it's time to migrate.
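Checking iowait requires nothing fancy. On any Linux box, the sixth field of the aggregate `cpu` line in /proc/stat is cumulative iowait time; for a live percentage, use `iostat -c` from the sysstat package or watch the %wa column in top.

```shell
# Cumulative iowait jiffies since boot (field 6 of the aggregate "cpu" line)
awk '/^cpu /{printf "iowait jiffies: %s\n", $6}' /proc/stat

# Live view, refreshing every 5 seconds (requires sysstat):
#   iostat -c 5
```

Run the awk one-liner before and after your log-rotation window; a large jump means the disks are your bottleneck.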
Need a server that can chew through logs without choking? Deploy a high-performance instance on CoolVDS today.