The Truth is in the /var/log
Let’s be honest for a second. That shiny JavaScript analytics dashboard the marketing team loves? It’s lying to you. Between users running NoScript, mobile browsers with limited JS support, and the sheer volume of bot traffic hitting your NIC, client-side tracking gives you maybe 70% of the picture. To a sysadmin, 70% accuracy isn't data; it's a guess.
I recently inherited a "high-traffic" e-commerce setup running on a budget shared host. The client claimed their site was slow, yet their analytics showed a drop in visitors. One look at tail -f /var/log/httpd/access_log revealed the truth: a massive scraper bot was hammering the catalogue pages, bypassing the JS tracker entirely but consuming 90% of the Apache workers. The server was melting, but the dashboard said it was a slow day.
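Watching tail -f scroll past tells you something is wrong; a quick count tells you who is responsible. The following is a minimal sketch, not a finished tool: top_clients is a hypothetical helper name, and it assumes the client IP is the first whitespace-separated field of each line, as in Apache's common and combined log formats.

```shell
#!/bin/sh
# top_clients FILE [N]
# Print the N busiest client IPs in an access log as "COUNT IP",
# busiest first (N defaults to 10).
# Assumption: the client IP is the first field of every log line.
top_clients() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$1" |
        sort -k1,1nr -k2,2 |
        head -n "${2:-10}"
}

# Usage (log path as used in this article; adjust for your distro):
#   top_clients /var/log/httpd/access_log 5
```

A single address far ahead of the rest is the kind of pattern a scraper leaves behind. A JavaScript tracker never sees it, because the bot never runs the script.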
This is why we need AWStats. It parses the raw server logs. It doesn't care if the client has JavaScript enabled. It captures every GET, POST, and HEAD request that Apache logs. Here is how to set it up properly in a CentOS 5 environment, and why your underlying disk I/O matters more than you think.
Prerequisites and Installation
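We are assuming you are running a clean CentOS 5.5 instance; the commands and paths below are the CentOS ones. On Ubuntu 10.04 LTS the package is installed with apt-get install awstats instead, and the config, script, and log paths differ, so adjust accordingly. You need root access because we are touching global config files. If you are still on shared hosting, stop reading and upgrade to a VPS. You cannot optimize what you cannot configure.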
yum install awstats
If it's not in your repo, grab the RPM from the EPEL repository. Once installed, we need a per-site config file based on the model file. Copy the template:
cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.com.conf
Configuration: The Vital Flags
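Open your new config file. Don't just leave the defaults; that's how you get useless data. Three things need checking. The file name must match the -config value you pass later (awstats.yourdomain.com.conf for -config=yourdomain.com). SiteDomain must be set to your site's hostname. And the LogFile path must match your Apache configuration, with Apache writing the combined log format, which is what LogFormat=1 expects.
LogFile="/var/log/httpd/access_log"
SiteDomain="yourdomain.com"
LogType=W
LogFormat=1
DNSLookup=1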
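LogFormat=1 tells AWStats to expect Apache's combined format, so it is worth confirming that the log actually looks like that before the first update. The following is a rough sketch under stated assumptions: count_non_combined is a hypothetical helper, and its regular expression is a simplified approximation of the combined format (it does not handle escaped quotes inside a field).

```shell
#!/bin/sh
# count_non_combined FILE
# Print how many lines in FILE do NOT look like Apache combined format:
#   host ident user [date] "request" status bytes "referer" "user-agent"
# Assumption: a simplified pattern, good enough for a sanity check only.
count_non_combined() {
    pattern='^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} ([0-9]+|-) "[^"]*" "[^"]*"$'
    # grep -c exits with status 1 when the count is 0; treat that as success.
    grep -Evc "$pattern" "$1" || [ $? -eq 1 ]
}

# Usage (log path as used in this article):
#   count_non_combined /var/log/httpd/access_log
```

A result of 0 suggests the log matches. A large number usually means Apache is writing the common format (no referer or user-agent fields), in which case either the CustomLog line or the LogFormat value needs to change.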
Pro Tip: Enable DNSLookup=1 with care. It resolves IP addresses to hostnames (so you see .telenor.net instead of raw IPs), but every reverse lookup adds latency to the update run, and on a slow network the parser can appear to hang. If updates stall, set DNSLookup=0 and live with raw IPs. This is where hosting in Norway matters. CoolVDS instances peering directly at NIX (Norwegian Internet Exchange) resolve local hostnames significantly faster than servers routed through Frankfurt.
The I/O Bottleneck
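Here is the part most tutorials skip. AWStats is a Perl script that reads massive text files, which makes it an I/O-heavy (Input/Output) operation. If you have a 2GB log file that has never been processed, the script has to read every single byte to generate your report. Later -update runs pick up the new entries, but on a busy site that is still a lot of reading.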
On a standard shared host or a budget VPS using OpenVZ with oversold resources, your disk I/O is capped. When you run the update command:
perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update
...you might see iowait (the %wa figure) spike in top. If your host is using slow 7.2k SATA drives in a crowded RAID array, this parse could take 20 minutes. During that time, your web server performance degrades because the disk heads are thrashing.
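If you would rather have a number than a glance at top, the kernel's counters in /proc/stat can be sampled before and after the update. This is a minimal, Linux-only sketch: cpu_line and iowait_pct are hypothetical helper names, and the calculation assumes the standard field order of the aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
#!/bin/sh
# cpu_line [FILE]: print the aggregate "cpu" line (default: /proc/stat).
cpu_line() {
    grep '^cpu ' "${1:-/proc/stat}"
}

# iowait_pct BEFORE AFTER
# Given two "cpu" lines, print the whole-number percentage of CPU time
# spent in iowait between the two samples.
# Assumption: iowait is the 6th field, and fields 2..9 are summed as the total.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= NF && i <= 9; i++) t1 += $i; w1 = $6 }
        NR == 2 { for (i = 2; i <= NF && i <= 9; i++) t2 += $i; w2 = $6
                  d = t2 - t1
                  print ((d > 0) ? int(100 * (w2 - w1) / d) : 0) }'
}

# Usage around an update run (command as used in this article):
#   before=$(cpu_line)
#   perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update
#   after=$(cpu_line)
#   iowait_pct "$before" "$after"
```

The figure is an average over the whole run across all CPUs, so it understates short spikes, but it is enough to compare one host against another.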
This is why we architect CoolVDS differently. We use enterprise-grade SAS 15k RPM drives in RAID-10. We don't oversell the I/O. When you run a log analysis on our Xen-based nodes, the dedicated disk throughput means that 2GB log file gets chewed through in seconds, not minutes. You get the data without the downtime.
Automation and Security
Don't run this manually. Add it to your crontab to update every hour:
0 * * * * perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update > /dev/null
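On a slow disk, one update can still be running when the next one is due. A small wrapper can skip a run in that case. The following is only a sketch under stated assumptions: run_locked is a hypothetical helper, the lock directory path is an example, and it relies on mkdir being atomic rather than on any AWStats feature.

```shell
#!/bin/sh
# run_locked LOCKDIR COMMAND [ARGS...]
# Run COMMAND only if LOCKDIR can be created; remove it afterwards.
# Returns COMMAND's exit status, or 75 (an arbitrary non-zero value
# chosen here to mean "skipped") when another run holds the lock.
run_locked() {
    lock=$1
    shift
    if ! mkdir "$lock" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75
    fi
    "$@"
    status=$?
    rmdir "$lock"
    return "$status"
}

# Usage (lock path is a hypothetical example):
#   run_locked /var/run/awstats-update.lock \
#       perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update
```

Saved as a script and called from the crontab entry in place of the bare perl command, this keeps slow runs from piling up. One caveat: if the process is killed mid-run, the lock directory is left behind and has to be removed by hand.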
A Note on Norwegian Compliance
If you are hosting data for Norwegian users, remember the Personal Data Act (Personopplysningsloven). IP addresses can be considered personal data. Datatilsynet is strict about where this data lives.
By hosting on a CoolVDS server physically located in Oslo, you simplify your compliance position under the EU Directive 95/46/EC. You know exactly where the physical hard drives are spinning. Try getting that guarantee from a US-based cloud giant.
Conclusion
Logs are the only source of truth in system administration. They tell you when you are under attack, which images are bandwidth hogs, and exactly where your latency originates. But analyzing them requires hardware that can keep up with the read operations.
Stop settling for "estimated" traffic data. Deploy a high-performance VPS that can handle the heavy lifting.
Need raw power for log parsing? Spin up a CoolVDS Xen instance in Oslo today.