Stop Guessing: The Truth is in Your access_log
Let’s be honest. Relying solely on Google Analytics gives you a skewed view of reality. Between users disabling JavaScript, the rising popularity of NoScript extensions, and the sheer volume of bot traffic hitting your servers (most bots never execute tracking scripts), client-side tracking can miss a sizable share of the picture, by some estimates up to 20%. If you are serious about capacity planning, you need to look at the metal.
For systems administrators managing high-traffic portals in Norway, the server log is the only source of truth. But raw logs are unreadable. That is where AWStats (Advanced Web Statistics) comes in. Unlike Webalizer, which looks like it belongs in the 90s, AWStats offers decent visualization of bandwidth usage, bot crawls, and HTTP error codes.
However, parsing a 2 GB log file with a Perl script is not a lightweight task. In this guide, we will configure AWStats for Apache and explain why your hosting architecture, specifically disk I/O, determines whether this analysis gives you insights or crashes your server.
The Setup: AWStats on Linux
Whether you are running CentOS 5.5 or Ubuntu 10.04 LTS, the installation is straightforward, but the configuration is where most admins fail.
1. Installation
On RHEL/CentOS systems (assuming you have the EPEL repository enabled):
yum install awstats
On Debian/Ubuntu systems:
apt-get install awstats
2. The Crucial Configuration
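The magic happens in /etc/awstats/awstats.conf (depending on the package, the file may instead be named per site, following the awstats.yourdomain.conf pattern). The most common mistake is a mismatch between the Apache `LogFormat` and the AWStats `LogFormat`.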
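Ensure your Apache configuration (usually httpd.conf on RHEL/CentOS, apache2.conf or the virtual host file on Debian/Ubuntu) defines the combined format, and that your `CustomLog` directive actually uses it: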
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
Then, match it in your AWStats config:
LogFormat=1
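Before trusting the first report, it can help to confirm that the log on disk really is in combined format. The following is a minimal sketch in Python, not part of AWStats: the regular expression, the function names, and the sample lines are all assumptions made for illustration. It reports what fraction of lines fail to parse as combined format.

```python
import re

# Regex mirroring Apache's "combined" LogFormat:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" '
    r'"(?P<agent>(?:[^"\\]|\\.)*)"$'
)


def parse_combined(line):
    """Return a dict of fields if the line is in combined format, else None."""
    match = COMBINED_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def mismatch_ratio(lines):
    """Fraction of non-empty lines that do not parse as combined format."""
    checked = [ln for ln in lines if ln.strip()]
    if not checked:
        return 0.0
    bad = sum(1 for ln in checked if parse_combined(ln) is None)
    return bad / len(checked)


# Illustrative sample lines (documentation addresses, not real traffic).
SAMPLE = (
    '192.0.2.10 - - [12/Oct/2010:03:00:01 +0200] "GET /index.html HTTP/1.1" '
    '200 5120 "http://example.com/" "Mozilla/5.0 (X11; Linux)"'
)
# "common" format: no referer or user-agent fields, so it should be rejected.
COMMON_ONLY = '192.0.2.10 - - [12/Oct/2010:03:00:01 +0200] "GET / HTTP/1.1" 200 5120'
```

To check a real file, pass an open file handle to `mismatch_ratio`. A ratio close to 1.0 suggests the virtual host is logging in a format other than combined, in which case LogFormat=1 will not match it.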
Pro Tip: If you are serving a Norwegian audience, consider enabling DNSLookup=1. This resolves IP addresses to hostnames, so you can see how much traffic originates from Norwegian ISPs such as Telenor or NextGenTel versus foreign bots. Note that reverse lookups slow processing down significantly, which is why a high-performance VPS is non-negotiable.
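To see why reverse lookups are expensive and why caching them matters, here is a rough Python sketch of the idea. It is not how AWStats implements DNSLookup. The function names and the use of a hostname suffix such as ".no" as a stand-in for "Norwegian ISP" are assumptions for illustration only.

```python
import socket
from collections import Counter


def reverse_lookup(ip):
    """Resolve an IP to a hostname; return None when the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def tally_by_suffix(ips, suffixes, resolver=reverse_lookup):
    """Count hits per hostname suffix, resolving each distinct IP only once."""
    cache = {}
    counts = Counter()
    for ip in ips:
        if ip not in cache:
            # The slow part: one network round trip per previously unseen IP.
            cache[ip] = resolver(ip)
        host = cache[ip]
        if host is None:
            label = "unresolved"
        else:
            label = "other"
            for suffix in suffixes:
                if host.lower().endswith(suffix):
                    label = suffix
                    break
        counts[label] += 1
    return counts
```

Calling `tally_by_suffix(ips, [".no"])` on the client addresses from a log gives a coarse split between hosts under .no, other resolved hosts, and addresses with no reverse record. The `resolver` parameter exists so the function can be exercised with a fake lookup instead of live DNS.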
The Hidden Cost: CPU and I/O Overhead
Here is the war story. I recently audited a client's setup hosting a busy e-commerce site. They were on a standard shared hosting plan. Every night at 03:00, their site would become unresponsive for 20 minutes. Why?
The cron job running awstats_updateall.pl was parsing the day's logs. AWStats is written in Perl. It is effective, but it is CPU-hungry and disk-intensive. On a shared host, where you are competing for I/O operations per second (IOPS) with 500 neighbors, reading massive text files kills performance.
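One software-side mitigation, where the host allows it, is to run the nightly update at the lowest CPU and disk priority so it yields to the web server. The sketch below builds such a crontab line in Python. The script path is an assumption (packages install awstats_updateall.pl in different locations), the helper names are hypothetical, and the idle class of `ionice` only has an effect with I/O schedulers that support it, such as CFQ.

```python
import shlex


def low_priority_command(update_cmd, niceness=19, idle_io=True):
    """Prefix a command with nice (CPU) and optionally ionice (disk)."""
    if not 0 <= niceness <= 19:
        raise ValueError("niceness must be between 0 and 19")
    prefix = ["nice", "-n", str(niceness)]
    if idle_io:
        # ionice class 3 is "idle": disk time only when nothing else wants it.
        prefix += ["ionice", "-c", "3"]
    return prefix + list(update_cmd)


def crontab_line(minute, hour, command):
    """Format a daily crontab entry, shell-quoting each argument."""
    quoted = " ".join(shlex.quote(part) for part in command)
    return "{} {} * * * {}".format(minute, hour, quoted)


# Assumed install path; check where your package puts the script.
UPDATE_CMD = ["/usr/share/awstats/tools/awstats_updateall.pl", "now"]

if __name__ == "__main__":
    print(crontab_line(0, 3, low_priority_command(UPDATE_CMD)))
```

This does not make the parse cheaper. It only changes who waits, and on a shared plan you may not be permitted to adjust priorities at all.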
The Hardware Solution
Software optimization can only go so far. If you are parsing logs for a site with 100,000+ hits a day, the bottleneck is the hard drive.
This is where CoolVDS differs from the budget providers. We use enterprise-grade RAID arrays built on fast 15k RPM SAS drives and are currently rolling out SSD storage options. When you run a heavy `grep` or an AWStats update on a CoolVDS instance, you aren't waiting on a spinning platter shared by hundreds of users. You get dedicated throughput.
| Feature | Shared Hosting | CoolVDS (VPS) |
|---|---|---|
| Log Access | Restricted / Delayed | Root Access (Real-time) |
| Script Execution | Timeouts (30s limit) | Unlimited execution time |
| Disk I/O | Choked by neighbors | Dedicated/High Priority |
Compliance and the "Datatilsynet" Factor
Operating in Norway brings legal responsibilities. Under the Personopplysningsloven (the Norwegian Personal Data Act), you are responsible for the data you collect. IP addresses are generally treated as personal data. When you use external US-based analytics tools, you are exporting that data.
By keeping your logs on a local VPS in Norway, you maintain data sovereignty. You know exactly where the physical server resides. CoolVDS data centers are located in Norway, which helps with compliance and also gives you ultra-low latency to the NIX (Norwegian Internet Exchange).
Securing the Stats
Unless you restrict access, your AWStats reports may be publicly visible. Do not let competitors see your traffic sources. Lock it down in your Apache configuration:
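# Path is distribution-specific; adjust to where your package installs AWStats
<Directory /usr/share/awstats/wwwroot>
    # Apache 2.2 syntax: deny everyone, then allow the listed addresses
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
    # Replace YOUR_OFFICE_IP with your office's address
    Allow from YOUR_OFFICE_IP
    # Create the password file with: htpasswd -c /etc/awstats/htpasswd USERNAME
    AuthType Basic
    AuthName "Restricted Access"
    AuthUserFile /etc/awstats/htpasswd
    Require valid-user
    # Require BOTH an allowed address and a valid login (the Apache 2.2 default)
    Satisfy All
</Directory>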
Final Thoughts
Data is your most valuable asset, but only if you can process it efficiently. Don't let a heavy Perl script take down your web server. Move to an environment that respects the need for raw compute power.
If you are tired of "Resource Limit Exceeded" errors when analyzing your own data, it's time to upgrade. Deploy a high-performance instance on CoolVDS today and see what your logs have been trying to tell you.