Stop Trusting Client-Side Scripts: True Server Log Analysis with AWStats on Linux

The Truth Hiding in Your access_log

If you are relying solely on JavaScript tags to understand your server's traffic, you are flying blind. I've seen it a dozen times: a marketing director claims traffic is down, but the server load says we are pushing 50Mbps constant throughput. The culprit? Hotlinking, scrapers, or users with NoScript enabled. Client-side trackers miss all of this.

In the Norwegian hosting market, where precision and data sovereignty are paramount, relying on third-party US-based trackers is also a liability. With the current climate surrounding the EU Data Protection Directive and the strict enforcement by Datatilsynet, keeping your traffic data on local iron isn't just smart; it's necessary hygiene.

Today, we are going to implement AWStats 7.0 on a CentOS 5 box. We will configure it to parse Apache logs efficiently, secure the Perl scripts (which are notorious for vulnerabilities), and ensure the processing doesn't tank your CPU during peak hours.

Prerequisites and The I/O Bottleneck

Log analysis is I/O heavy. When you run the update script against a 4GB log file, your disk heads are going to scream. On budget VPS providers overselling their SANs, this process can cause iowait to spike above 40%, effectively DoS-ing your own web server.

This is why at CoolVDS, we isolate disk I/O. Whether you are on our high-performance RAID-10 SAS clusters or our new experimental SSD tiers in the Oslo datacenter, we ensure your read operations don't starve your web server's write operations.

Step 1: Installation via EPEL

Don't compile from source unless you enjoy dependency hell. The EPEL (Extra Packages for Enterprise Linux) repository for RHEL/CentOS 5 is stable.

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm
yum install awstats

Step 2: Apache Configuration

AWStats needs a specific log format to extract the most value (user agents, referrers). Open your httpd.conf—usually located at /etc/httpd/conf/httpd.conf—and ensure you are using the combined LogFormat.

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog logs/access_log combined

Pro Tip: If you are hosting multiple vhosts, do not log them all to a single file. Splitting them now saves you hours of grep pain later. Define a separate CustomLog directive inside each <VirtualHost> block.
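Before pointing AWStats at a log, it can be worth checking that the file really is in combined format. The helper below is a rough sketch, not part of AWStats: the function name count_non_combined is made up for this example, and the pattern is a heuristic that will miscount requests containing escaped quotes.

```shell
#!/bin/sh
# count_non_combined (hypothetical helper): read log lines on stdin and print
# how many do NOT look like Apache "combined" entries.
count_non_combined() {
    # Expected shape: host ident user [time] "request" status bytes "referer" "user-agent"
    # grep -v -c counts the lines that fail to match; "|| true" hides grep's
    # non-zero exit status when that count is 0.
    grep -E -v -c '^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} ([0-9]+|-) "[^"]*" "[^"]*"$' || true
}

# Example: count_non_combined < /var/log/httpd/access_log
```

A result of 0 suggests every line carries the referrer and user agent fields. A large number usually means the log is still being written in the common format.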

Step 3: Configuring AWStats

Copy the model config file to a new file named after your domain.

cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.coolvds-demo.no.conf
vi /etc/awstats/awstats.coolvds-demo.no.conf

You need to change three critical parameters:

  1. LogFile: Point this to your Apache log. LogFile="/var/log/httpd/access_log"
  2. SiteDomain: Your main domain. SiteDomain="coolvds-demo.no"
  3. DirData: Where the processed stats are stored. DirData="/var/lib/awstats"

Enabling reverse DNS lookups (DNSLookup=1) lets AWStats resolve visitor hostnames, which gives you better geographical data: you can see how much traffic comes from Norway versus the rest of Europe. The cost is significantly slower parsing, so turn it off if your host has slow DNS resolvers. (Note: CoolVDS local resolvers in Oslo are optimized for this exact query load.)
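If you manage more than a couple of sites, editing each file in vi gets old quickly. Here is a minimal sketch that scripts the same changes with sed. The function name set_awstats_param is our own invention, and it assumes each key already exists as an uncommented line at the start of a line, and that the value contains no "|" or "&" characters.

```shell
#!/bin/sh
# set_awstats_param FILE KEY VALUE (hypothetical helper)
# Rewrites an existing uncommented "KEY=..." line in FILE to "KEY=VALUE".
# VALUE is written verbatim, so pass the double quotes yourself where needed.
set_awstats_param() {
    file=$1 key=$2 value=$3
    # '|' is the sed delimiter so that paths containing '/' need no escaping.
    # -i (in-place editing) is a GNU sed option.
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
}

# Example, using the values from the list above:
# conf=/etc/awstats/awstats.coolvds-demo.no.conf
# set_awstats_param "$conf" LogFile    '"/var/log/httpd/access_log"'
# set_awstats_param "$conf" SiteDomain '"coolvds-demo.no"'
# set_awstats_param "$conf" DirData    '"/var/lib/awstats"'
```

Commented-out lines are left alone because the pattern is anchored to the start of the line.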

Step 4: The Cron Job (Automation)

Stats are useless if they are old. We need to parse logs every hour. However, avoid running this at exactly the top of the hour when everyone else's cron jobs trigger. Let's pick a weird time, like 17 minutes past.

17 * * * * /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=coolvds-demo.no -update > /dev/null
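If an hourly run ever takes longer than an hour, or competes with Apache for the disk, a small wrapper can help. What follows is a sketch under stated assumptions, not an official AWStats script: the function name and the AWSTATS_CMD, LOCK_DIR and PRIO_PREFIX variables are invented for this example, ionice must be installed, and the idle I/O class only has an effect with the CFQ scheduler.

```shell
#!/bin/sh
# run_awstats_update CONFIG (hypothetical wrapper)
# Runs the AWStats update at low CPU and I/O priority, skipping the run if
# the previous one has not finished yet.
run_awstats_update() {
    config=$1
    # All three variables are illustrative and can be overridden from the environment.
    cmd=${AWSTATS_CMD:-/usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl}
    lock=${LOCK_DIR:-/var/lock/awstats-update.lock}
    prio=${PRIO_PREFIX-nice -n 19 ionice -c3}

    # mkdir is atomic: if the directory already exists, another run holds the lock.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "previous update still running, skipping" >&2
        return 1
    fi

    # Word splitting of $prio and $cmd is intentional here.
    $prio $cmd -config="$config" -update
    rc=$?

    rmdir "$lock"
    return $rc
}

# Example: run_awstats_update coolvds-demo.no > /dev/null
```

Saved as a script that ends with the example call, this could replace the long command in the crontab line above. One caveat: if the process is killed mid-run, the lock directory stays behind and has to be removed by hand.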

Step 5: Security Hardening

The awstats.pl script is a CGI script. Historically, it has been a vector for remote code execution. Never leave the AWStats interface open to the public internet. You don't want competitors seeing your keywords, and you don't want script kiddies probing for exploits.

Configure the Apache aliases and protect the CGI directory with Basic authentication in /etc/httpd/conf.d/awstats.conf:

Alias /awstatsclasses "/usr/share/awstats/wwwroot/classes/"
Alias /awstatscss "/usr/share/awstats/wwwroot/css/"
Alias /awstatsicons "/usr/share/awstats/wwwroot/icon/"
ScriptAlias /awstats/ "/usr/share/awstats/wwwroot/cgi-bin/"

<Directory "/usr/share/awstats/wwwroot/cgi-bin/">
    DirectoryIndex awstats.pl
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    
    AuthName "Restricted Access"
    AuthType Basic
    AuthUserFile /etc/awstats/htpasswd.users
    Require valid-user
</Directory>

Generate the password file:

htpasswd -c /etc/awstats/htpasswd.users admin
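The password file holds hashes, so it should not be readable by every local user. The sketch below tightens its permissions. The function name harden_htpasswd is hypothetical, and the example assumes Apache runs under the "apache" group (the CentOS default) and that you run it as root.

```shell
#!/bin/sh
# harden_htpasswd FILE GROUP (hypothetical helper)
# Gives GROUP read access to FILE, removes access for everyone else,
# then prints the resulting octal mode.
harden_htpasswd() {
    file=$1 group=$2
    # Change the group first so the web server can still read the file
    # once world access is gone.
    chgrp "$group" "$file" || return 1
    chmod 640 "$file" || return 1
    # stat -c is the GNU coreutils syntax.
    stat -c '%a' "$file"
}

# Example: harden_htpasswd /etc/awstats/htpasswd.users apache
```

If Apache starts returning errors for /awstats/ after this, check which user and group httpd actually runs as.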

Why Local Hosting Matters for Logs

When you analyze logs, you are processing IP addresses. Under the Norwegian Personal Data Act (Personopplysningsloven), IP addresses can be considered personal data. Hosting this data on US servers (under the jurisdiction of the Patriot Act) creates a gray area that many Norwegian CTOs prefer to avoid.

By running your metrics stack on a CoolVDS instance in Norway, you keep the data within the EEA, satisfying the rigorous standards of Datatilsynet. Plus, the latency benefit for your SSH sessions is undeniable. If you are pinging Oslo from London, you are adding 30ms of lag to every keystroke. From inside Norway, the delay is barely noticeable.

Final Thoughts

AWStats gives you the raw truth. It doesn't care if the user has JavaScript disabled. It doesn't care if the user is a Googlebot. It analyzes the raw server requests. Just make sure your infrastructure can handle the I/O load of parsing gigabytes of text files daily.

Ready to crunch data without the I/O wait? Deploy a CoolVDS instance in our Oslo facility today and keep your logs local, fast, and secure.