The Truth is in the Logs: Why Google Analytics Isn't Enough
If you are relying solely on JavaScript-based trackers like Google Analytics to make infrastructure decisions, you are flying blind. I recently audited a high-traffic media site in Oslo where the marketing team claimed they had 50,000 unique visitors a day. The server load averages, however, suggested something entirely different.
We dove into the /var/log/httpd/access_log and found the reality: 85,000 uniques. The discrepancy? Corporate firewalls stripping tracking pixels, users with NoScript extensions, and aggressive bot traffic that JS trackers never see. When you are tuning Apache MaxClients or configuring MySQL buffers, "marketing numbers" are useless. You need raw, server-side truth.
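You can see this kind of gap yourself by counting distinct client IPs straight from the log. The Python sketch below is not how AWStats works internally. It assumes one distinct IP equals one visitor, which is roughly how AWStats counts unique visitors. It also uses a hypothetical, deliberately short list of User-Agent markers to split out bots.

```python
import re

# Hypothetical, illustrative bot markers; real bot lists are much longer.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

# Client IP is the first field; the User-Agent is the last quoted field
# of an Apache "combined" log line.
LINE_RE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')

def count_uniques(lines):
    """Return (human_ips, bot_ips) as sets of distinct client IPs."""
    humans, bots = set(), set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip malformed or non-combined lines
        ip, agent = m.group(1), m.group(2).lower()
        if any(marker in agent for marker in BOT_MARKERS):
            bots.add(ip)
        else:
            humans.add(ip)
    return humans, bots

sample = [
    '192.0.2.10 - - [10/Oct/2011:13:55:36 +0200] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '192.0.2.10 - - [10/Oct/2011:13:55:40 +0200] "GET /a.css HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '198.51.100.7 - - [10/Oct/2011:13:56:01 +0200] "GET /robots.txt HTTP/1.1" 200 24 "-" "Googlebot/2.1"',
]
humans, bots = count_uniques(sample)
print(len(humans), len(bots))  # 1 1
```

Passing an open file object instead of a list works too, because files iterate line by line.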
Enter AWStats. It's not new and it's not flashy, but it parses the raw request log that Apache actually writes to disk. Here is how to set it up correctly without killing your server's performance.
The I/O Bottleneck: Why Log Parsing Kills a Cheap VPS
Before we touch the config, we need to address the hardware. Parsing a 5GB log file is an I/O punisher. On a typical oversold OpenVZ container, initiating a heavy AWStats update process can spike your I/O wait (%wa in top) to 40%+, causing your actual web application to hang.
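The %wa figure that top shows comes from the aggregate cpu line in /proc/stat. If you want to measure it around an AWStats run without watching top, here is a minimal Python sketch. It assumes a Linux guest that exposes /proc/stat.

```python
def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' counters from /proc/stat as ints."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
    raise RuntimeError("no aggregate cpu line found")

def iowait_percent(before, after):
    """%wa between two samples: share of elapsed CPU ticks spent in iowait.

    Fields are user, nice, system, idle, iowait, irq, softirq, steal, ...
    Guest time is already counted inside user, so only the first eight
    fields are summed (older kernels report fewer).
    """
    delta = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(delta)
    return 100.0 * delta[4] / total if total else 0.0
```

To use it, take one sample with read_cpu_line(), start the AWStats update, take a second sample a few seconds later, and pass both to iowait_percent(). Values that stay in the tens of percent suggest the parse is competing with your web stack for disk.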
This is where architecture matters. At CoolVDS, we use Xen virtualization with strict resource isolation. We don't overcommit disk throughput. If you are running high-traffic sites, you should be looking at our Enterprise SSD tier. In 2011, spinning rust (even SAS 15k) struggles to read massive log files while simultaneously writing database transactions. SSDs can cut this parse time dramatically.
Step 1: Configuring Apache for AWStats
Standard Common Log Format (CLF) is okay, but we want more data. Ensure your Apache configuration (usually in /etc/httpd/conf/httpd.conf on CentOS) uses the Combined format to capture User-Agents and Referrers.
```apache
# Combined format: Common Log Format plus the Referer and User-Agent headers
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog logs/access_log combined
```
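Each directive in that LogFormat becomes one field on disk. The Python regex below is a sketch, not part of AWStats, that maps those fields back to names. It is a quick way to confirm that your log really is in combined format before you point AWStats at it. AWStats itself expects this layout when its own LogFormat setting is 1.

```python
import re

# One capture group per directive in the "combined" LogFormat:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Parse one combined-format line into a dict, or return None."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # %b logs "-" rather than 0 when no response body was sent
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    entry["status"] = int(entry["status"])
    return entry

line = ('203.0.113.5 - - [10/Oct/2011:13:55:36 +0200] "GET /img/logo.png HTTP/1.1" '
        '200 4523 "http://example.com/" "Mozilla/5.0"')
e = parse_combined(line)
print(e["host"], e["status"], e["bytes"], e["agent"])
# 203.0.113.5 200 4523 Mozilla/5.0
```

If parse_combined() returns None for most of your lines, Apache is probably still writing the Common Log Format.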
Pro Tip: If a reverse proxy or load balancer (such as Varnish or Nginx) runs on the same machine in front of Apache, the %h field will log 127.0.0.1 for every request. Install mod_rpaf so that Apache takes the real client IP from the X-Forwarded-For header your proxy sets. Otherwise, AWStats will think all your traffic is coming from localhost.
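The idea behind the fix is this: trust X-Forwarded-For only when the connection comes from your own proxy, then take the rightmost address in that header that is not one of your proxies. The Python sketch below illustrates that logic. It is not mod_rpaf's source, and TRUSTED_PROXIES is a hypothetical setting.

```python
# Hypothetical: Varnish or Nginx on the same box connects from localhost.
TRUSTED_PROXIES = {"127.0.0.1"}

def client_ip(remote_addr, x_forwarded_for=None, trusted=TRUSTED_PROXIES):
    """Return the address the access log should record for this request."""
    # Only honor the header when the connection came from our own proxy;
    # otherwise any client could forge X-Forwarded-For.
    if remote_addr not in trusted or not x_forwarded_for:
        return remote_addr
    # Walk right to left: entries on the right were appended by proxies
    # we control, and the first untrusted one is the real client.
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):
        if hop not in trusted:
            return hop
    return remote_addr
```

Taking the rightmost untrusted hop matters because anything to its left was supplied by the client and can be forged. For example, client_ip("127.0.0.1", "203.0.113.9, 198.51.100.23") yields "198.51.100.23".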
Step 2: The AWStats Configuration
Install AWStats via yum (from the EPEL repository) or apt-get. The configuration file typically lives in /etc/awstats/awstats.yourdomain.com.conf; the part between "awstats." and ".conf" is the name you later pass to -config.
There is one setting you must change immediately to avoid slowing down your site:
```
# DO NOT enable DNSLookup for real-time processing
DNSLookup=0
```
If you set DNSLookup=1, the Perl script will attempt a reverse DNS lookup for every unique IP address in your log file. Each lookup is a network round trip to your resolver, and addresses without PTR records have to wait out the DNS timeout. Processing slows to a crawl, and the script can time out. Keep it off. If you need country stats, use the GeoIP plugin instead; it does local lookups against a binary database file.
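The GeoIP approach is fast because each lookup is a local search through sorted IP ranges, with no network round trip. The Python sketch below shows that idea using bisect over a hypothetical three-row table. The addresses come from documentation ranges and the countries are made up. The real GeoIP database uses its own binary format and is far larger.

```python
import bisect
import ipaddress

# Hypothetical sample data: (first_ip, last_ip, country code).
RANGES = sorted([
    ("192.0.2.0", "192.0.2.255", "NO"),
    ("198.51.100.0", "198.51.100.255", "SE"),
    ("203.0.113.0", "203.0.113.255", "DE"),
], key=lambda r: int(ipaddress.IPv4Address(r[0])))

# Start addresses as integers, in the same sorted order as RANGES.
STARTS = [int(ipaddress.IPv4Address(r[0])) for r in RANGES]

def country_for(ip):
    """Look up an IPv4 address locally; no DNS or network call involved."""
    n = int(ipaddress.IPv4Address(ip))
    i = bisect.bisect_right(STARTS, n) - 1  # last range starting <= n
    if i >= 0 and n <= int(ipaddress.IPv4Address(RANGES[i][1])):
        return RANGES[i][2]
    return None  # address not covered by any range
```

Each lookup costs a binary search over the sorted starts, so even millions of log lines stay CPU-bound instead of waiting on a resolver.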
Step 3: Automation and Performance
Do not run the update script through the web browser (CGI). It creates a timeout risk and exposes your stats generation to the web. Run it via cron, and keep AllowToUpdateStatsFromBrowser=0 in the AWStats config.
```
# root's crontab (crontab -e): update the stats every night at 03:00
0 3 * * * /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update > /dev/null
```
Running this at 3:00 AM local time (CET) ensures you aren't fighting for resources during peak traffic hours.
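For extra insurance that a slow 03:00 run never overlaps the next one, and that it runs at a lower scheduling priority than Apache, you can wrap the command. This Python wrapper is a hypothetical helper, not part of AWStats. The lock path and niceness value are assumptions you should adjust.

```python
import fcntl
import os
import subprocess

# Command from the crontab entry above; adjust the path for your distro.
AWSTATS_CMD = ["/usr/bin/perl", "/usr/share/awstats/wwwroot/cgi-bin/awstats.pl",
               "-config=yourdomain.com", "-update"]

def run_locked(cmd, lock_path="/var/run/awstats-update.lock", niceness=10):
    """Run cmd at lower priority unless a previous run still holds the lock.

    Returns the command's exit code, or None if another run is in progress.
    """
    with open(lock_path, "w") as lock:
        try:
            # Non-blocking exclusive lock: give up instead of queueing.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # os.nice runs in the child just before exec, so only the
        # AWStats process is deprioritized. The lock is released when
        # the file is closed at the end of the with-block.
        return subprocess.call(cmd, preexec_fn=lambda: os.nice(niceness))
```

Point the crontab entry at a small script that calls run_locked(AWSTATS_CMD). If the previous night's parse is still running, the new run exits immediately instead of adding a second reader to the disk.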
Warning: Ensure your logrotate configuration handles the logs correctly. If Apache rotates the log while AWStats is reading it, you might lose data. Either run the update from a prerotate hook in /etc/logrotate.d/httpd, so AWStats finishes reading the current log just before it is rotated, or point AWStats at the archived file after rotation.
Data Sovereignty: The Norwegian Context
There is a legal angle here too. Under the Norwegian Personal Data Act (Personopplysningsloven) and the guidelines from Datatilsynet, you are responsible for the IP addresses (personopplysninger) you collect.
When you use third-party US-hosted analytics, you are exporting user data across borders. By hosting your own analytics on a CoolVDS server physically located in our Oslo datacenter, you maintain full control. Your logs never leave Norwegian jurisdiction. This is becoming a critical selling point for enterprise clients and government agencies in the Nordics.
Summary: Speed and Control
| Feature | Client-Side (JS) | Server-Side (AWStats) |
|---|---|---|
| Data Source | Browser execution | Raw Server Logs |
| Bot Detection | Poor | Excellent |
| Bandwidth Tracking | Impossible | Precise |
| Resource Usage | Client CPU | Server Disk I/O |
AWStats gives you the granularity that marketing tools miss. It tells you exactly which images are draining your bandwidth and which IP addresses are probing for vulnerabilities, which is useful input when you write DDoS protection rules.
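AWStats builds these reports for you, but the underlying aggregation is simple. As an illustration only, the Python sketch below sums bytes sent per URL and counts 404 responses per client IP. Treating a burst of 404s from one address as vulnerability probing is a heuristic assumption, not an AWStats rule.

```python
import re
from collections import Counter

# Leading combined-format fields: client IP, request line, status, bytes.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "([^"]*)" (\d{3}) (\d+|-)')

def summarize(lines, top=5):
    """Return (top URLs by bytes sent, top IPs by number of 404 responses)."""
    bytes_by_url = Counter()
    not_found_by_ip = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        ip, request, status, size = m.groups()
        parts = request.split()  # e.g. ["GET", "/path", "HTTP/1.1"]
        url = parts[1] if len(parts) >= 2 else request
        if size != "-":
            bytes_by_url[url] += int(size)
        if status == "404":
            not_found_by_ip[ip] += 1
    return bytes_by_url.most_common(top), not_found_by_ip.most_common(top)
```

Passing open("/var/log/httpd/access_log") streams the file line by line, so memory grows with the number of distinct URLs and IPs rather than with the log size.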
However, parsing millions of lines of text requires serious disk throughput. Don't let a slow server obscure your data. Deploy a high-performance, SSD-accelerated instance on CoolVDS today and see what's actually happening on your network.