Stop Trusting JavaScript: High-Performance Server Log Analysis with AWStats
Let's be honest: Google Analytics is lying to you. It's not malicious, but it is blind. If a user has NoScript enabled, if a bot is hammering your login page, or if a competitor is hotlinking your images, your fancy JavaScript dashboard shows absolutely nothing. As a system administrator, you cannot manage what you cannot measure.
I recently audited a high-traffic e-commerce site hosted in Oslo. Their marketing team claimed traffic was stable, yet the server's load average was spiking to 15.00 every night. Why? An aggressive scraper from an offshore IP range was crawling their catalog. Google Analytics showed zero hits. The raw Apache logs showed thousands of requests per second.
Today, we are going back to basics. We are deploying AWStats on a LAMP stack. We will configure it to parse gigabytes of logs without choking your CPU, and we will discuss why hosting this data in Norway matters for compliance with Datatilsynet.
Why Server-Side Logging is Mandatory
Client-side analytics (JS) and Server-side analytics (Logs) serve different masters. You need AWStats because:
- Bandwidth Theft: Detect who is hotlinking your heavy assets.
- Status Codes: See 404 errors and 500 server faults that never load the JS tracker.
- Bot Traffic: Distinguish between Googlebot (good) and site-scrapers (bad).
- Privacy & Compliance: Under the Norwegian Personal Data Act (Personopplysningsloven), you are responsible for the IP addresses stored in your logs. Keeping this data on a local server rather than exporting it to a US cloud ensures tighter control.
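Before AWStats is even installed, the raw log can answer most of the questions in the list above. Here is a minimal sketch using awk. The function names are hypothetical, and the field positions assume the Apache combined format with a well-formed request line ($1 is the client IP, $7 the path, $9 the status code, $11 the quoted referer), so malformed requests will land in the wrong buckets.

```shell
# Sketch: quick triage of an Apache "combined" access log with awk.
# Function names are illustrative; field positions assume the combined format.

# Top N client IPs by request count (candidates for scrapers or abusive bots).
top_ips() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$1" |
        sort -rn | head -n "${2:-10}"
}

# Request count per HTTP status code (surfaces 404s and 5xx errors).
status_counts() {
    awk '{ codes[$9]++ } END { for (c in codes) print c, codes[c] }' "$1" | sort
}

# Referers for image requests that do not contain your own domain (possible hotlinking).
hotlink_referers() {
    awk -v dom="$2" '
        $7 ~ /\.(jpg|jpeg|png|gif)$/ && $11 != "\"-\"" && index($11, dom) == 0 { refs[$11]++ }
        END { for (r in refs) print refs[r], r }' "$1" | sort -rn
}
```

Typical usage would be `top_ips /var/log/apache2/access.log 20` or `hotlink_referers /var/log/apache2/access.log yourdomain.com`. This is a spot check, not a replacement for the reports AWStats builds.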
Step 1: Installation on Ubuntu 12.04 LTS
While CentOS 6 is a rock-solid choice for enterprise, many of you are running Ubuntu 12.04 LTS (Precise Pangolin) for its newer package repositories. Let's get the packages installed.
sudo apt-get update
sudo apt-get install awstats libgeo-ip-perl
The libgeo-ip-perl module is critical. Without it, you won't be able to resolve IP addresses to countries, rendering your geographic analysis useless.
Step 2: Configuring Apache for Analysis
AWStats parses your log files. If your Apache configuration is set to a minimal logging level to save disk space, AWStats will starve. We need the "Combined" log format.
Open your /etc/apache2/apache2.conf or your specific vhost file:
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/apache2/access.log combined
Pro Tip: If you are running a high-traffic site, log rotation is not optional. Ensure logrotate is configured to rotate these files daily, or your disk usage will hit 100% within weeks. I've seen servers crash because /var/log filled up the root partition. Don't be that guy.
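To see what your rotation policy actually is, you can read the frequency keyword out of the logrotate config. A small sketch follows; rotation_frequency is a hypothetical helper, not part of logrotate, and the path in the usage note is the one the Ubuntu apache2 package normally installs, which you should verify on your system.

```shell
# Sketch: report which rotation frequency a logrotate config file requests.
# rotation_frequency is a hypothetical helper name, not part of logrotate.
rotation_frequency() {
    awk '$1 ~ /^(daily|weekly|monthly|yearly)$/ { print $1 }' "$1"
}
```

Run it as `rotation_frequency /etc/logrotate.d/apache2`. If it does not print `daily`, edit the file accordingly, then use `logrotate -d /etc/logrotate.d/apache2` for a dry run that shows what would be rotated without touching anything.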
Step 3: Configuring AWStats
Copy the default configuration to a new file for your domain:
sudo cp /etc/awstats/awstats.conf /etc/awstats/awstats.yourdomain.com.conf
sudo nano /etc/awstats/awstats.yourdomain.com.conf
You need to change these specific directives:
# The path to your log file
LogFile="/var/log/apache2/access.log"
# The type of log. W is for Web logs.
LogType=W
# The format. 1 is for Apache Combined (NCSA combined)
LogFormat=1
# Your domain name
SiteDomain="yourdomain.com"
HostAliases="www.yourdomain.com localhost 127.0.0.1"
# DNS Lookup. WARNING: Setting this to 1 kills performance.
# Keep it at 0 and use the GeoIP plugin instead.
DNSLookup=0
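The stock config file is long and mostly comments, so it is easy to edit a commented-out copy of a directive by mistake. The sketch below prints only the active (uncommented) values of the directives discussed above. show_awstats_directives is a hypothetical helper name. It also lists LoadPlugin lines on the assumption that the GeoIP plugin is switched on through a LoadPlugin directive, which typically ships commented out and needs the path to your GeoIP database.

```shell
# Sketch: print the active (uncommented) values of the key AWStats directives.
# show_awstats_directives is a hypothetical helper name.
show_awstats_directives() {
    grep -E '^[[:space:]]*(LogFile|LogType|LogFormat|SiteDomain|HostAliases|DNSLookup|LoadPlugin)=' "$1"
}
```

Run it as `show_awstats_directives /etc/awstats/awstats.yourdomain.com.conf`. If a directive you edited does not appear, you probably changed a commented line.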
The I/O Bottleneck
Here is the reality check. When AWStats parses a 2GB log file, it is an I/O-intensive operation. It reads thousands of lines per second and writes to its internal database. On a standard VPS with shared mechanical hard drives, this process causes iowait to skyrocket. Your web server will become sluggish while the stats update.
This is where infrastructure choice matters. At CoolVDS, we utilize high-performance storage arrays and strict KVM isolation. Unlike OpenVZ containers where a neighbor's heavy disk usage can kill your performance, our KVM instances offer dedicated resource allocation. If you are parsing logs for a busy site, you need the I/O throughput that CoolVDS provides, or you need to schedule your cron jobs for 4:00 AM.
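Whatever hardware you are on, you can soften the impact of the update described above by running it at the lowest CPU and I/O priority. This is a sketch under assumptions: run_low_priority is a hypothetical wrapper, it falls back to nice alone when ionice is missing or not permitted, and I/O priorities are only honored by some kernel schedulers (CFQ, for example), so treat it as a mitigation rather than a guarantee.

```shell
# Sketch: run a command at the lowest CPU priority and, where supported,
# in the idle I/O class. run_low_priority is a hypothetical helper name.
run_low_priority() {
    if command -v ionice >/dev/null 2>&1 && ionice -c 3 true 2>/dev/null; then
        # Idle I/O class: only gets disk time when nothing else wants it.
        ionice -c 3 nice -n 19 "$@"
    else
        # ionice unavailable: lower CPU priority only.
        nice -n 19 "$@"
    fi
}
```

For example: `run_low_priority /usr/lib/cgi-bin/awstats.pl -config=yourdomain.com -update`.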
Step 4: Scheduling the Updates
Manual updates are for amateurs. Set up a cron job to update your statistics every hour.
# /etc/cron.d/awstats
0 * * * * www-data /usr/lib/cgi-bin/awstats.pl -config=yourdomain.com -update > /dev/null
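Make sure the script runs as a user with read permission on the log file. On Ubuntu the Apache logs are typically readable only by root and the adm group, so either add www-data to adm or run the job as root.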
Step 5: Securing the Interface
The AWStats interface is a Perl script located in your cgi-bin. Do not leave this open to the public. You don't want your competitors seeing your traffic sources.
In your Apache vhost configuration:
<Directory /usr/lib/cgi-bin/>
    <Files "awstats.pl">
        AuthType Basic
        AuthName "AWStats Access"
        AuthUserFile /etc/apache2/.htpasswd
        Require valid-user
    </Files>
</Directory>
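The AuthUserFile above has to exist before Apache can authenticate anyone. A minimal sketch for creating it follows. create_stats_user is a hypothetical helper, and it assumes the htpasswd tool from the apache2-utils package is installed. The -c flag creates the file and overwrites any existing one, which is why the helper only passes it on first use.

```shell
# Sketch: create or extend the password file referenced by AuthUserFile.
# create_stats_user is a hypothetical helper; htpasswd ships with apache2-utils.
create_stats_user() {
    user="$1"
    file="${2:-/etc/apache2/.htpasswd}"
    if [ -f "$file" ]; then
        htpasswd "$file" "$user"       # file exists: add or update the user
    else
        htpasswd -c "$file" "$user"    # first user: -c creates the file
    fi
}
```

Run it as root, for example `create_stats_user statsadmin`, enter the password when prompted, and reload Apache so the vhost change takes effect.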
Privacy and The "Datatilsynet" Factor
Operating in Norway means respecting privacy. IP addresses are considered personal data. If you are storing logs indefinitely, you are technically building a database of user behavior.
You should configure AWStats to obfuscate IP addresses if you do not strictly need them for security auditing. However, for security (detecting DDoS attacks), you often need the full IP. The compromise is a data retention policy: keep full IPs for a limited period and ensure your old logs are purged automatically.
Hosting with a Norwegian provider like CoolVDS simplifies this. Your data stays within legal jurisdiction, ensuring you aren't accidentally subject to the USA PATRIOT Act strictures that apply to US-hosted servers.
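The retention and obfuscation ideas above can be sketched in a few lines of shell. Both helpers are hypothetical and work on the log files themselves, not through any AWStats feature. The 30-day default is an arbitrary example value rather than a legal recommendation, and the IP masking only handles IPv4 addresses at the start of each line.

```shell
# Sketch: two hypothetical helpers for log retention and IP obfuscation.

# Delete rotated access logs older than N days (default 30, an example value).
purge_old_logs() {
    find "$1" -type f -name 'access.log.*' -mtime +"${2:-30}" -exec rm -f {} +
}

# Zero the last octet of a leading IPv4 address on each line read from stdin.
anonymize_ips() {
    sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+)\.[0-9]+ /\1.0 /'
}
```

For example, `purge_old_logs /var/log/apache2 30` could run from a daily cron job, and `anonymize_ips < access.log.1 > access.log.1.anon` could produce a copy for long-term archiving. If you want the AWStats reports themselves to show masked addresses, the masking has to happen before AWStats parses the file.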
Conclusion
Google Analytics gives you trends; AWStats gives you truth. But truth requires resources. Parsing logs is heavy work. If you are tired of your server crawling every time a cron job runs, it's time to look at your underlying hardware.
Stop fighting for disk I/O. Deploy your next monitoring stack on a CoolVDS KVM instance. We offer the low latency and disk throughput required for serious system administration. Spin up a server in Oslo today.