JavaScript Trackers Are Lying to You
Let's be honest. Relying solely on Google Analytics or other JavaScript-based counters is a rookie mistake. They miss users with NoScript enabled, they can't see the bots scraping your content, and they are completely blind to bandwidth theft from hotlinking. If you are running a serious setup, you need to look at the raw metal. You need to parse the server logs.
In 2009, bandwidth isn't free. If you are hosting a media-heavy site in Norway, you know that upstream providers charge a premium for excessive throughput. I've seen servers grind to a halt not because of legitimate traffic, but because a Chinese botnet decided to spider a calendar script that generates an endless chain of URLs. Client-side scripts never saw it happen. The server load average hit 20.0, and the admin had no idea why.
This is where AWStats (Advanced Web Statistics) comes in. It doesn't guess. It reads the Apache `access_log` line by line. It tells you exactly who is hitting your VPS, what status code they got, and how much data you sent them.
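To make that concrete, here is a rough shell sketch of the kind of per-line tallying involved. It is not how AWStats works internally, just an illustration of what the raw log already tells you. It assumes the usual Apache layout, where whitespace-separated field 9 is the status code and field 10 is the byte count; malformed request lines can shift those positions. The function name `status_summary` is made up for this example.

```shell
# Hypothetical helper: hits and bytes sent per HTTP status code.
# Assumes Apache common/combined layout: field 9 = status, field 10 = bytes.
# Apache writes "-" in the byte field when no body was sent (e.g. a 304).
status_summary() {
    awk '
        { hits[$9]++ }
        $10 != "-" { bytes[$9] += $10 }
        END {
            # Output columns: status, hits, total bytes
            for (s in hits) printf "%s %d %.0f\n", s, hits[s], bytes[s]
        }
    ' "$@" | sort
}

# Usage (adjust the path for your distro):
#   status_summary /var/log/httpd/access_log
```

Each output line reads status, hits, bytes. A large pile of bytes under 200 with very few distinct visitors is usually the first hint that something other than humans is eating your transfer.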
Prerequisites and The "Combined" Log Format
Don't assume your Apache installation on CentOS 5 or Debian Lenny logs everything AWStats needs. Some configurations still use the `common` log format, which lacks the User-Agent and Referer headers. Without these, AWStats is blind to where traffic is coming from and what software is generating it.
First, verify your `httpd.conf` or `apache2.conf`. You need to ensure you are using the combined format:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/httpd/access_log combined
Once you reload Apache, your logs will start capturing the data necessary for deep analysis. Now, let's get AWStats running.
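Before moving on, it is worth confirming what your log actually contains. The check below is only a heuristic: it counts double-quoted sections per line, on the assumption that a combined-format line has three (request, Referer, User-Agent) while a common-format line has one. The function name `count_log_formats` is hypothetical.

```shell
# Hypothetical helper: guess how many lines are in combined format.
# Splitting a line on '"' yields at least 7 pieces when it contains
# three quoted sections (request, Referer, User-Agent).
count_log_formats() {
    awk -F'"' '
        NF >= 7 { combined++; next }
        { other++ }
        END { printf "combined=%d other=%d\n", combined, other }
    ' "$@"
}

# Usage: check only recent entries, since older lines may predate the change.
#   tail -n 1000 /var/log/httpd/access_log | count_log_formats
```

If `other` is still climbing after the reload, some virtual host is probably overriding the `CustomLog` line with its own format.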
Installation and Configuration on a VPS
Installing AWStats is straightforward via the EPEL repository or apt, but the configuration requires a surgeon's touch. Do not just rely on the defaults if you have high traffic.
yum install awstats
vi /etc/awstats/awstats.yourdomain.conf
Here is where the magic happens. You need to point the `LogFile` directive at your Apache access log and set `LogFormat=1`, which tells AWStats to expect the combined format. If you are using logrotate (which you should be), ensure AWStats processes the logs before they are compressed and archived.
Pro Tip: Processing a 2GB log file is I/O intensive. On shared hosting, this script will time out or get killed by the process watcher. This is why we recommend CoolVDS. Our underlying storage runs on enterprise-grade 15k RPM SAS drives in RAID-10 arrays. The high disk I/O throughput means AWStats can parse millions of lines in seconds without causing IOWAIT spikes that slow down your HTTP service.
The GeoIP Factor
AWStats can resolve IPs to hostnames via reverse DNS. Make sure this is OFF (`DNSLookup=0`). A reverse lookup for every new IP address will kill your processing speed and makes every update run depend on the network. Instead, use the MaxMind GeoIP plugin. It does local lookups against a binary database to determine if that visitor is from Norway, Germany, or the US.
In your config file, uncomment:
LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"
This keeps your processing strictly local. No network calls per visitor.
Compliance and The Norwegian Context
Hosting in Norway brings specific advantages and responsibilities. Under the Personopplysningsloven (Personal Data Act), you are responsible for the data you collect. Server logs contain IP addresses, which Datatilsynet considers personal data in many contexts.
By running AWStats locally on your own VPS, rather than sending data to a third-party US server, you maintain tighter control over this data. You aren't shipping your users' browsing habits across the Atlantic. Furthermore, hosting physically in Oslo means your latency to the NIX (Norwegian Internet Exchange) is practically zero. When you are SSH'ing in to tail logs in real time, that responsiveness matters.
Analyzing the Data: What to Look For
Once AWStats is generating reports, look at these three metrics immediately:
- HTTP Status 404: A high number of 404s usually indicates a broken link on an external site or a bot probing for vulnerabilities (like scanning for `phpmyadmin` or `wp-login.php`). Fail2Ban can be configured to read these logs and ban the IPs automatically.
- Bandwidth Grabbers: Sort by "Bandwidth" rather than "Hits". You might find that a single image is being hotlinked on a high-traffic forum, draining your monthly transfer quota. Block these via `.htaccess`.
- Robots/Spiders: AWStats separates humans from robots. If your robot traffic exceeds human traffic significantly, you need to adjust your `robots.txt` or start blocking aggressive User-Agents.
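When you want a second opinion on the report, the same three questions can be roughed out straight from the shell. The sketch below assumes combined-format logs and crude field splitting (field 1 = client IP, field 7 = request path, field 9 = status, field 10 = bytes, and the User-Agent as the sixth quote-delimited piece). The function names are invented for this example, and the last one only ranks raw User-Agent strings; it does not classify robots the way AWStats does.

```shell
# Hypothetical helpers; each prints "count value" lines, largest first.

# Client IPs that triggered the most 404 responses.
top_404_clients() {
    awk '$9 == 404 { n[$1]++ } END { for (ip in n) print n[ip], ip }' "$@" |
        sort -rn | head -n 10
}

# Request paths that cost the most bytes (sort by bandwidth, not hits).
top_bandwidth_paths() {
    awk '$10 != "-" { b[$7] += $10 }
         END { for (p in b) printf "%.0f %s\n", b[p], p }' "$@" |
        sort -rn | head -n 10
}

# Most frequent User-Agent strings; eyeball these for aggressive crawlers.
top_user_agents() {
    awk -F'"' 'NF >= 7 { n[$6]++ } END { for (ua in n) print n[ua], ua }' "$@" |
        sort -rn | head -n 10
}

# Usage (adjust the path for your distro):
#   top_404_clients /var/log/httpd/access_log
#   top_bandwidth_paths /var/log/httpd/access_log
#   top_user_agents /var/log/httpd/access_log
```

An IP at the top of the 404 list that is requesting `phpmyadmin` paths is a ban candidate, and a single image at the top of the bandwidth list is the one to check for hotlinking.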
The Infrastructure Reality
Log analysis is heavy lifting. It requires memory and disk speed. Standard shared hosting accounts will often suspend you for running heavy Perl scripts like AWStats on large log files. You need dedicated resources.
This is the fundamental difference with CoolVDS. We provide Xen-based virtualization. This means your RAM is reserved, and your swap is yours. When you run a heavy analysis job, you aren't fighting for CPU cycles with 500 other customers. You get the raw power you need to audit your infrastructure without taking your site offline.
Stop guessing what is happening on your server. Enable the logs, configure the parser, and take control of your traffic.