Stop Trusting JavaScript: The Hard Truth About Server-Side Analytics with AWStats
If you are relying solely on Google Analytics to tell you what is happening on your servers, you are flying blind. I said it. In 2011, between NoScript plugins, corporate firewalls stripping referrers, and the flood of search bot traffic (bots never execute your tracking script), JavaScript-based tracking shows you maybe 70% of reality.
To know who is actually hammering your bandwidth, and to catch the hot-linkers draining your resources, you need to go to the source: the Apache access logs. But staring at raw text files is for novices. You need AWStats.
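To get a feel for what that raw source contains, here is a rough sketch of the kind of question you want the logs to answer: which outside sites embed your images, and how many bytes they cost you. It assumes Apache "combined" log lines (LogFormat=1 in AWStats terms), only counts .jpg, .jpeg, .png and .gif requests, and treats any referrer outside your domain and its subdomains as a hot-linker. The helper name `hotlinkers` is made up for illustration.

```shell
# Hypothetical helper: bytes of images served to each external referring host.
# Assumes Apache "combined" log lines.
# Usage: hotlinkers yourdomain.com /var/log/httpd/access_log
hotlinkers() {
  awk -F'"' -v dom="$1" '
    BEGIN { dom = tolower(dom) }
    {
      # Split on double quotes: $2 is the request line,
      # $3 holds " status bytes ", $4 is the referrer.
      split($2, req, " "); path = req[2]; sub(/\?.*/, "", path)
      split($3, res, " "); bytes = res[2]
      ref = tolower($4)
      if (tolower(path) !~ /\.(jpe?g|png|gif)$/) next        # images only
      if (ref !~ /^https?:\/\//) next                        # skip "-" (no referrer)
      sub(/^https?:\/\//, "", ref); sub(/[\/:].*/, "", ref)  # keep only the host name
      if (ref == dom) next                                   # your own site
      if (length(ref) > length(dom) && substr(ref, length(ref) - length(dom)) == "." dom) next
      if (bytes ~ /^[0-9]+$/) sum[ref] += bytes              # "-" means no body was sent
    }
    END { for (h in sum) print h, sum[h] }
  ' "$2" | sort -k2,2nr
}
```

Something like `hotlinkers yourdomain.com /var/log/httpd/access_log | head` lists the worst offenders first. AWStats does this kind of aggregation for you across every dimension at once, which is exactly why it is worth the setup.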
However, there is a catch. Parsing gigabytes of log data is an I/O nightmare. I have seen decent servers grind to a halt because a cron job decided to parse a 4GB access_log file on a shared hosting platter drive. Here is how to do it correctly, keeping Datatilsynet (the Norwegian Data Protection Authority) happy and your CPU idle.
The Gap Between Perception and Logs
Last month, a client came to me claiming their Magento installation was underperforming every night at exactly 03:00. They blamed the PHP memory limit. I looked at the graphs. It wasn't memory; it was I/O wait time.
They were running a default AWStats configuration on an oversold VPS container from a budget provider. When the log analysis kicked in, the disk heads were thrashing so hard the database couldn't write session data. We migrated them to a CoolVDS Xen-based instance with RAID-10 SAS storage. The analysis time dropped from 45 minutes to 3 minutes. Hardware matters.
Step 1: Installation on CentOS 5/6
We are not compiling from source today; we have work to do. Ensure you have the EPEL repository enabled, then grab the package.
yum install awstats
Once installed, you need to configure Apache to serve the reports. By default, the config allows access from localhost only (a sane security default). Let's open that up to your management IP range in /etc/httpd/conf.d/awstats.conf. Keep comments on their own line: Apache does not accept a comment after a directive on the same line.
<Directory "/usr/share/awstats/wwwroot">
    Options None
    AllowOverride None
    Order allow,deny
    # Your office IP
    Allow from 123.45.67.89
</Directory>
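A syntax error in that file can keep Apache from coming back up, so it is worth validating the configuration before reloading. A minimal sketch, assuming the stock CentOS `apachectl` and `service` commands; the function name `safe_reload_httpd` is hypothetical.

```shell
# Hypothetical helper: only reload Apache if the configuration parses cleanly.
safe_reload_httpd() {
  if apachectl configtest; then
    service httpd reload
  else
    echo "httpd configuration test failed; not reloading" >&2
    return 1
  fi
}
```

A reload (rather than a restart) lets Apache pick up the new access rules without dropping connections that are already being served.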
Step 2: The Configuration Strategy
Copy the model config to your domain-specific file:
cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.com.conf
Here are the critical changes you must make in that file. Do not ignore these if you care about accuracy.
- LogFile: Point this to your actual Apache log, usually /var/log/httpd/access_log.
- LogFormat: Set this to 1 (Apache Combined). This captures User Agents and Referrers.
- DNSLookup: Set this to 0 unless you have a local caching DNS server. Doing reverse DNS lookups on every IP in your log file will destroy your performance faster than a fork bomb.
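If you would rather script those edits than make them by hand, here is a minimal sketch using GNU `sed -i` (standard on CentOS). The helper name `tune_awstats_conf` is hypothetical. It also sets `SiteDomain`, which the list above does not mention but which AWStats requires: `awstats.pl` stops with an error if it is left empty. Only uncommented lines that start with the directive name are touched.

```shell
# Hypothetical helper: apply LogFile, LogFormat, DNSLookup and SiteDomain in place.
# Usage: tune_awstats_conf /etc/awstats/awstats.yourdomain.com.conf yourdomain.com /var/log/httpd/access_log
tune_awstats_conf() {
  local conf="$1" domain="$2" log="$3"
  sed -i \
    -e "s|^LogFile=.*|LogFile=\"$log\"|" \
    -e 's|^LogFormat=.*|LogFormat=1|' \
    -e 's|^DNSLookup=.*|DNSLookup=0|' \
    -e "s|^SiteDomain=.*|SiteDomain=\"$domain\"|" \
    "$conf"
}
```

Commented-out examples such as `#DNSLookup=1` are left alone because the patterns are anchored at the start of the line.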
Pro Tip: If you are hosting multiple sites, do not log them all to a single file. Configure Apache VirtualHosts to write to /var/log/httpd/domain_access.log. It makes parsing faster and keeps data isolated.
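With one log per site, each site also needs its own AWStats config. A sketch, assuming every VirtualHost logs with something like `CustomLog /var/log/httpd/yourdomain.com_access.log combined`, i.e. reading "domain" in the path above as the site's name. `make_site_configs` is a hypothetical helper that copies the model config once per domain and points it at that domain's log.

```shell
# Hypothetical helper: one AWStats config per site, each reading its own log.
# Usage: make_site_configs /etc/awstats/awstats.model.conf /etc/awstats /var/log/httpd example.com example.org
make_site_configs() {
  local model="$1" confdir="$2" logdir="$3"; shift 3
  local domain
  for domain in "$@"; do
    # Log file name follows the <domain>_access.log convention assumed above.
    sed -e "s|^LogFile=.*|LogFile=\"$logdir/${domain}_access.log\"|" \
        -e "s|^SiteDomain=.*|SiteDomain=\"$domain\"|" \
        "$model" > "$confdir/awstats.$domain.conf"
  done
}
```

You would still set LogFormat and DNSLookup as described above, either in the model before copying or in each generated file.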
Step 3: Handling High Load & Automation
The standard way to update stats is running the Perl script manually or via cron:
/usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update
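When that command runs from cron on a busy box, you can at least ask the kernel to give it the lowest CPU and I/O priority. A sketch of a wrapper, assuming the EPEL paths from Step 1 and one awstats.NAME.conf per site in /etc/awstats; the function names are hypothetical. Note that the idle I/O class set by `ionice -c3` only takes effect with the CFQ I/O scheduler, which is the CentOS 5/6 default.

```shell
# Sketch of a low-priority AWStats updater for cron (paths assume the EPEL package).
AWSTATS_PL=/usr/share/awstats/wwwroot/cgi-bin/awstats.pl

# /etc/awstats/awstats.yourdomain.com.conf -> yourdomain.com
awstats_config_name() {
  local base=${1##*/}
  base=${base#awstats.}
  printf '%s\n' "${base%.conf}"
}

# Update every config in the directory except the model template.
update_all_awstats() {
  local confdir=${1:-/etc/awstats} conf name
  for conf in "$confdir"/awstats.*.conf; do
    [ -e "$conf" ] || continue              # the glob matched nothing
    name=$(awstats_config_name "$conf")
    if [ "$name" = model ]; then continue; fi
    # nice -n 19: lowest CPU priority; ionice -c3: idle I/O class (CFQ only)
    nice -n 19 ionice -c3 "$AWSTATS_PL" -config="$name" -update
  done
}
```

To use it, save it as a script (for example the hypothetical /usr/local/sbin/awstats-update-all), add a final line calling `update_all_awstats`, and schedule it in /etc/crontab with something like `15 4 * * * root /usr/local/sbin/awstats-update-all`. The run takes longer at idle priority, but MySQL and Apache get the disk first.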
If you have high-traffic sites (100k+ hits/day), this process eats CPU cycles and memory. On OpenVZ containers (common in cheap hosting), it can push you past your privvmpages limit; every refused memory allocation increments the "failcnt" counter for that resource. This is where the architecture of your VPS matters.
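On an OpenVZ container you can check whether this is happening: /proc/user_beancounters lists each resource with its held, maxheld, barrier and limit values, and a final failcnt column that counts refused allocations. A minimal sketch (the helper name `show_failcnt` is hypothetical) that prints only resources with a non-zero failcnt; reading the real file inside a container usually requires root.

```shell
# Hypothetical helper: list OpenVZ resources whose failcnt (last column) is non-zero.
# Reads /proc/user_beancounters unless another file is given.
show_failcnt() {
  # Data rows have at least 6 fields; the first row also carries a "uid:" prefix,
  # so the resource name is always 5 fields before the last one.
  awk 'NF >= 6 && $NF ~ /^[0-9]+$/ && $NF > 0 { print $(NF-5), $NF }' \
    "${1:-/proc/user_beancounters}"
}
```

Run it before and after an AWStats update: if the privvmpages count climbs, the container's memory limit is what is biting, not AWStats itself.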
At CoolVDS, we prioritize guaranteed I/O throughput. Because we use Xen virtualization, your memory is hard-allocated, not burstable shared memory that disappears when you need it most. When AWStats works through a 500MB log file and builds its in-memory tables of unique visitors, a CoolVDS instance doesn't swap. It just works.
Privacy and Local Context (The Norwegian Angle)
We operate in Europe. While the Data Retention Directive is a hot topic, you must respect the Personopplysningsloven (Personal Data Act). IP addresses are considered personally identifiable information in many contexts.
If you are analyzing logs for security (DDoS protection, firewall tuning), you are generally in the clear. But for general analytics, consider enabling the AllowAccessFromWebToAuthenticatedUsersOnly directive in AWStats so the reports are only shown to users your web server has authenticated. Datatilsynet does not look kindly on open directories exposing user IP habits.
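A sketch of switching that on in a per-domain config. The helper name `restrict_awstats_to_users` is hypothetical; the two directives it sets, AllowAccessFromWebToAuthenticatedUsersOnly and AllowAccessFromWebToFollowingAuthenticatedUsers, are standard AWStats options. AWStats only checks the user name the web server passes to it, so Apache must still perform the login itself (for example Basic authentication with an htpasswd file on the AWStats location); the directive alone does not prompt for a password.

```shell
# Hypothetical helper: only serve reports to the listed, web-server-authenticated users.
# Usage: restrict_awstats_to_users /etc/awstats/awstats.yourdomain.com.conf admin ops
restrict_awstats_to_users() {
  local conf="$1"; shift
  # "$*" joins the user names with spaces, the list format AWStats expects.
  sed -i \
    -e 's|^AllowAccessFromWebToAuthenticatedUsersOnly=.*|AllowAccessFromWebToAuthenticatedUsersOnly=1|' \
    -e "s|^AllowAccessFromWebToFollowingAuthenticatedUsers=.*|AllowAccessFromWebToFollowingAuthenticatedUsers=\"$*\"|" \
    "$conf"
}
```

Leaving the user list empty while the first directive is 1 lets any authenticated user through, which may be enough if your htpasswd file only contains staff accounts.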
The Verdict
AWStats is powerful, but it is heavy. It reveals the bots that Google Analytics misses and helps you identify bandwidth theft. But it requires a server that can handle the heavy lifting of text processing.
Don't let log analysis slow down your MySQL queries. Deploy your infrastructure on a platform built for heavy workloads.
Need a server that doesn't choke on log rotation? Spin up a CoolVDS High-Performance instance in Oslo today.