Stop Flying Blind: Deep Server Log Analysis with AWStats on Linux

I recently audited a client's server after a catastrophic outage during a modest marketing campaign. The culprit wasn't the traffic itself; it was the monitoring. They were running a heavy log analysis script on an oversold VPS with shared I/O. When the logs hit 2 GB, I/O wait spiked, the web server choked, and the site went dark. It was a classic case of the "observer effect": the act of measuring the system killed the system.

If you are running a serious deployment, you cannot rely solely on client-side JavaScript trackers like Google Analytics. They miss 404 errors, hotlinking abuse, and the bots probing your site for vulnerable scripts. You need server-side analysis. In 2011, the go-to tool for this is AWStats. It is powerful, but configured poorly on weak hardware, it is a resource hog.

The Truth is in /var/log

Client-side tracking lies. It gets blocked by NoScript, misses mobile browsers with poor JavaScript support, and ignores bandwidth theft. Server logs tell the truth. However, raw Apache logs are a wall of text that nobody can read at volume. AWStats parses them into reports on visitors, visit duration and, most importantly, HTTP error codes.
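To get a feel for what AWStats extracts, here is a minimal sketch that tallies HTTP status codes straight from a log with awk. It assumes the stock Apache combined format, where the status code is the ninth whitespace-separated field; status_counts is a hypothetical helper for quick checks, not part of AWStats.

```shell
# Tally HTTP status codes in an Apache combined-format log.
# Field 9 (whitespace-split) is the status code, e.g.:
# 1.2.3.4 - - [10/Oct/2011:13:55:36 +0200] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
status_counts() {
    # Count each three-digit status, then sort by count (descending), ties by code
    awk '$9 ~ /^[0-9][0-9][0-9]$/ { n[$9]++ }
         END { for (s in n) print n[s], s }' "$1" | sort -k1,1nr -k2,2
}
```

Running status_counts /var/log/httpd/access_log lists each status with its count, busiest first. A large 404 count is exactly the kind of thing a JavaScript tracker never sees.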

Installation on RHEL/CentOS

Assuming you are running CentOS 5.5 or RHEL 6, we will use the RPMforge repository (install its rpmforge-release package first). Do not compile from source unless you enjoy dependency hell.

# yum --enablerepo=rpmforge install awstats

Once AWStats is installed, the configuration file is your next stop. Copy the model file to a config file named for your domain:

# cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.com.conf
# vi /etc/awstats/awstats.yourdomain.com.conf
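The directives you will almost certainly touch are LogFile, LogFormat, SiteDomain and HostAliases. If you manage several domains, a small helper saves a few vi sessions. This is a minimal sketch, assuming GNU sed (for -i) and that each directive appears once, uncommented, in the copied model file. set_directive is a hypothetical helper name, and the log path in the usage comments is the CentOS Apache default, not something AWStats requires.

```shell
# Set KEY=VALUE in an AWStats config file by replacing the existing
# uncommented line for KEY. VALUE is written verbatim, so include the
# double quotes AWStats uses around string values.
set_directive() {
    conf=$1 key=$2 value=$3
    # Escape the characters that are special in a sed replacement: \ | &
    esc=$(printf '%s' "$value" | sed 's/[\\|&]/\\&/g')
    sed -i "s|^${key}=.*|${key}=${esc}|" "$conf"
}

# Example (paths assumed):
#   conf=/etc/awstats/awstats.yourdomain.com.conf
#   set_directive "$conf" LogFile '"/var/log/httpd/access_log"'
#   set_directive "$conf" LogFormat 1
#   set_directive "$conf" SiteDomain '"yourdomain.com"'
#   set_directive "$conf" HostAliases '"www.yourdomain.com localhost 127.0.0.1"'
```

Commented-out lines (starting with #) are left alone because the pattern is anchored at the start of the line.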

Critical Configuration: Performance & Privacy

Here is where generic tutorials fail you: they tell you to just turn it on. In a production environment, particularly here in Norway, where Datatilsynet (the Data Inspectorate) actively enforces the Personal Data Act, you need to be careful with IP addresses.

1. IP Privacy

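Obfuscate IP addresses unless you strictly need them for security auditing. Storing less personal data makes it far easier to stay within European data protection rules. Note that the stock AWStats configuration has no option that masks addresses, so masking has to happen before AWStats reads the log.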

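# In awstats.yourdomain.com.conf
WarningMessages=1
# Reverse DNS is slow on large logs; GeoIP is faster and more accurate
DNSLookup=0
# Adjust the path to wherever your GeoIP.dat database lives
LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"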
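Since AWStats will not mask addresses for you, one approach is to filter the log on the way in. AWStats accepts a command ending in a pipe as its LogFile, so a small filter can zero the last octet of each IPv4 client address before the parser ever sees it. This is a sketch under assumptions: the log is in combined format with the client IP first, anonymize-log is a hypothetical script name, and IPv6 addresses are passed through unmasked.

```shell
# Zero the last octet of the client IPv4 address at the start of each
# combined-format line, e.g. 192.0.2.77 -> 192.0.2.0.
# IPv6 clients and resolved hostnames pass through unchanged.
anonymize_log() {
    sed 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\} /\1.0 /' "$@"
}

# Wrapped in a script (hypothetical /usr/local/bin/anonymize-log), the
# AWStats config can read through it via a trailing pipe:
#   LogFile="/usr/local/bin/anonymize-log /var/log/httpd/access_log |"
```

Expect unique-visitor counts to drop: AWStats counts visitors per host, and everyone in the same /24 now looks like one host.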

2. Log Format and I/O

Ensure your Apache httpd.conf logs in the 'combined' format (for example, CustomLog logs/access_log combined) and that the AWStats config matches it with LogFormat=1. AWStats struggles with custom formats unless you map every field by hand.

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
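A quick way to confirm that an existing log really is in combined format before pointing AWStats at it is to split each line on double quotes and count the pieces. This sketch assumes no escaped quotes inside the request or User-Agent (Apache writes those as \", which would inflate the count and raise a false alarm); count_non_combined is a hypothetical name.

```shell
# Count non-empty lines that do NOT look like Apache "combined" format.
# Split on double quotes, a combined line has 7 fields:
#   prefix | request | " status bytes " | referer | " " | user-agent | (empty)
# A "common" format line (no referer or user-agent) has only 3.
count_non_combined() {
    awk -F'"' 'NF > 0 && NF != 7 { bad++ } END { print bad + 0 }' "$1"
}
```

A result of 0 on a recent log is a good sign; anything else is worth a look before the first AWStats run.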

The Hidden Cost: Disk I/O

This is the part that kills performance. When awstats.pl runs, it reads large text files from disk. On a standard shared hosting plan or a budget VPS, your disk I/O is shared with potentially hundreds of other tenants. If a neighbor decides to compile a kernel while you are parsing logs, your site lags.
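Before blaming (or crediting) the storage, it helps to measure I/O wait while awstats.pl runs. Below is a minimal sketch that reads the aggregate cpu line from /proc/stat twice and reports the share of CPU time spent waiting on I/O; iostat from the sysstat package reports the same %iowait figure if it is installed. The function names are hypothetical, and the sketch assumes the standard Linux /proc/stat field layout.

```shell
# Percentage of CPU time spent in I/O wait between two /proc/stat "cpu" lines.
# Linux field layout: cpu user nice system idle iowait irq softirq steal ...
# Only user..steal are summed; guest time is already included in user/nice.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            last = (NF < 9) ? NF : 9    # fields 2..9 = user..steal
            total = 0
            for (i = 2; i <= last; i++) total += $i
            t[NR] = total; w[NR] = $6   # field 6 = iowait
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# Sample the system-wide "cpu" line twice, $1 seconds apart (default 5).
sample_iowait() {
    first=$(grep '^cpu ' /proc/stat)
    sleep "${1:-5}"
    second=$(grep '^cpu ' /proc/stat)
    iowait_pct "$first" "$second"
}
```

Run sample_iowait 5 in one terminal while an AWStats update runs in another. A figure that climbs sharply and stays there during the run points at the disk rather than the CPU.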

This is why we architect CoolVDS differently. We don't oversell our storage subsystems. When you run a disk-intensive operation like parsing a 5 GB log file, you need sustained throughput. We use high-performance RAID-10 SAS arrays and enterprise SSD caching (on select nodes) to keep your I/O wait negligible.

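Pro Tip: Never run AWStats updates in real time from the browser (the CGI mode). Anyone who can load the report page can then trigger a full log parse, which makes it an easy denial-of-service vector. Keep AllowToUpdateStatsFromBrowser=0 in your config, run updates via cron, and serve static HTML reports.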
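One way to produce those static reports is the awstats_buildstaticpages.pl helper from the AWStats tools directory: it can run the update and then write the report pages as plain HTML. The sketch below assumes that script lives in /usr/share/awstats/tools/ (the location varies by package, so check with rpm -ql awstats) and writes to a hypothetical /var/www/html/stats directory, which you should protect just like the AWStats directory itself.

```shell
# Regenerate static AWStats report pages. All paths are assumptions;
# the tools directory varies by package (check with: rpm -ql awstats).
PERL=${PERL:-/usr/bin/perl}
AWSTATS_PL=/var/www/awstats/awstats.pl
BUILD_STATIC=/usr/share/awstats/tools/awstats_buildstaticpages.pl
OUT_DIR=/var/www/html/stats    # hypothetical; create it first

build_static_reports() {
    # -update refreshes the statistics before the pages are written to -dir
    "$PERL" "$BUILD_STATIC" -config="$1" -update \
        -awstatsprog="$AWSTATS_PL" -dir="$OUT_DIR"
}

# Example: build_static_reports yourdomain.com
```

Calling this from cron instead of the plain -update job gives visitors pre-rendered pages, so loading a report never costs a log parse.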

Setting the Cron Job

Update your stats hourly, not on every page load.

# crontab -e
0 * * * * /usr/bin/perl /var/www/awstats/awstats.pl -config=yourdomain.com -update > /dev/null
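The hourly job can still collide with peak traffic, and a slow run can overlap the next one. Here is a sketch of a wrapper that addresses both, assuming flock and ionice (util-linux) and nice (coreutils) are installed. flock -n skips the run if the previous one still holds the lock; nice lowers CPU priority, and the idle I/O class of ionice only gets disk time when no other process has asked for it recently. The script name and lock-file path are hypothetical.

```shell
#!/bin/sh
# Cron wrapper: run the AWStats update at the lowest CPU and disk priority,
# and skip the run entirely if the previous update is still going.
LOCK=/var/run/awstats-update.lock    # hypothetical lock-file path
PERL=${PERL:-/usr/bin/perl}
AWSTATS_PL=/var/www/awstats/awstats.pl

run_update() {
    # flock -n: give up immediately if another update holds the lock
    # nice -n 19: lowest CPU priority; ionice -c3: idle I/O class
    # (the idle class is honored by the CFQ scheduler, the CentOS 5/6 default)
    flock -n "$LOCK" nice -n 19 ionice -c3 \
        "$PERL" "$AWSTATS_PL" -config="$1" -update
}

# Only act when a config name is passed, e.g. from this crontab entry:
#   0 * * * * /usr/local/bin/awstats-update.sh yourdomain.com > /dev/null
if [ $# -gt 0 ]; then
    run_update "$1"
fi
```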

Securing the Output

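The AWStats interface reveals sensitive data about your directory structure and backend capabilities. Do not leave it open to the public web. Use an .htaccess file to require a password and restrict access to your administration IP. With Apache 2.2's default of Satisfy All, a visitor needs both. The directory must also permit overrides (AllowOverride AuthConfig Limit); otherwise Apache either ignores the file (AllowOverride None) or rejects the directives it is not allowed to override.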

# /var/www/awstats/.htaccess
AuthType Basic
AuthName "Server Statistics"
AuthUserFile /etc/awstats/.htpasswd
Require valid-user
Order deny,allow
Deny from all
Allow from 123.123.123.123
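The AuthUserFile still has to exist. The usual way is interactive: htpasswd -c /etc/awstats/.htpasswd admin (htpasswd ships with the httpd package on CentOS 5 and httpd-tools on CentOS 6). For scripted setups, here is a sketch that appends an entry with an APR1 (Apache MD5) hash generated by openssl passwd, a format Apache 2.2 accepts. add_stats_user is a hypothetical name, and the file argument should be whatever your AuthUserFile points to.

```shell
# Append USER with an APR1 (Apache MD5) password hash to an Apache password file.
# Usage: add_stats_user FILE USER PASSWORD
add_stats_user() {
    file=$1 user=$2 pass=$3
    # -stdin keeps the password off openssl's own command line
    hash=$(printf '%s\n' "$pass" | openssl passwd -apr1 -stdin) || return 1
    printf '%s:%s\n' "$user" "$hash" >> "$file"
}
```

Make sure the Apache user can read the resulting file, and keep it outside the web root, as /etc/awstats already is.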

Why Infrastructure Matters

Analyzing logs effectively requires a balance of CPU power and disk speed. If you are hosting high-traffic sites targeting Norway, latency to NIX (the Norwegian Internet Exchange) is vital, but so is disk latency inside the server itself. A VPS in Norway needs to offer more than just an IP address in Oslo; it needs the hardware muscle to crunch data without stalling your MySQL queries.

If you are tired of your server crawling every time you try to figure out who is visiting your site, it might be time to move away from legacy shared platforms. CoolVDS provides the dedicated resources and DDoS protection necessary to run comprehensive analytics tools without fear of self-inflicted downtime.

Don't guess where your traffic is coming from. Measure it. Just make sure your server can handle the truth.
