The Truth is in the Logs: Deep Analysis with AWStats

Stop Trusting JavaScript: Why You Need Server-Side Analysis

If you are relying solely on Google Analytics to judge your server's health, you are flying blind. I've seen it a hundred times: a marketing manager claims traffic is down, but the load average on the server is spiking through the roof. Why? Because JavaScript tags rarely fire for bots and scrapers, and never fire for hotlinked files or users with NoScript installed.

The only source of truth is the raw access log. In the battle for uptime, AWStats remains the weapon of choice for serious systems administrators who need to parse gigabytes of Apache or Nginx data into actionable intelligence. But running it on an underpowered box is a recipe for disaster.

The Anatomy of a Log File

Before we install anything, look at your httpd.conf. If you are running a standard CentOS 5 stack, you are likely using the Combined Log Format. This is non-negotiable for meaningful analysis.

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

This captures the Referer (who sent them) and the User-Agent (what browser or bot they are using). Without this, you cannot distinguish between a legitimate customer from Oslo and a scraper bot hammering your login page.
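To see why those two fields matter, you can pull them straight out of the raw log before AWStats is even installed. Here is a minimal sketch that counts hits per User-Agent; the function name is my own, and it assumes the combined format above with no escaped quotes inside the request line.

```shell
# Print "count<TAB>user-agent" for the busiest agents in a combined-format log.
# Usage: top_user_agents /var/log/httpd/access_log [how_many]
top_user_agents() {
  # Splitting on double quotes gives:
  # 1=host/date prefix, 2=request line, 3=status and bytes,
  # 4=referer, 5=a single space, 6=user-agent
  awk -F'"' 'NF >= 7 { hits[$6]++ }
             END { for (ua in hits) printf "%d\t%s\n", hits[ua], ua }' "$1" |
    sort -t "$(printf '\t')" -k1,1nr -k2,2 |
    head -n "${2:-10}"
}
```

If one agent string dominates the list and it is not a browser you recognize, you have probably found your scraper.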

Deploying AWStats on RHEL/CentOS

Don't compile from source unless you enjoy dependency hell. Use the EPEL repository.

yum install awstats
cd /etc/awstats
cp awstats.model.conf awstats.yourdomain.conf

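Open that configuration file. The most critical directive is LogFile. If you are rotating logs daily (and you should be, to save disk space), point AWStats at the log that holds the period you want to analyze. LogFile takes a single path, not a shell wildcard; to parse a batch of archived logs, the usual approach is to end the value with a pipe and feed it from the bundled logresolvemerge.pl tool.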
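If you would rather script the edit than open an editor, something like the following works on the copied model file. It is only a sketch: the helper name is hypothetical, it assumes GNU sed, it assumes the directive already exists on its own line, and the path and domain in the usage lines are placeholders.

```shell
# Set KEY="VALUE" on an existing directive line in an AWStats config file.
# Assumes GNU sed and that VALUE contains no "#" or "&" characters.
# Usage: set_awstats_directive CONFIG_FILE KEY VALUE
set_awstats_directive() {
  sed -i "s#^$2=.*#$2=\"$3\"#" "$1"
}

# Example (placeholders, adjust to your own host):
# set_awstats_directive /etc/awstats/awstats.yourdomain.conf LogFile /var/log/httpd/access_log
# set_awstats_directive /etc/awstats/awstats.yourdomain.conf SiteDomain www.yourdomain.example
```

While you are in there, check that LogFormat is set to 1, which is the AWStats value for the Apache combined format shown earlier.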

Pro Tip: Never run log analysis on the live request thread. I once saw a junior admin hook a log parser into a CGI script. It took the site down in 10 minutes. Run your updates via crontab at 3 AM local time.

The "War Story": The Invisible Bandwidth Thief

Last month, a client hosting a media-heavy site on a generic shared host complained about sluggish performance. Their "analytics" showed 500 visitors a day. The server logs told a different story.

I fired up AWStats and sorted the "File type" report by bandwidth. A single .wmv video file was consuming 80% of their bandwidth. Digging into the "Connect to site from" section, we found a forum in Eastern Europe was hotlinking the video directly. We blocked the referer in .htaccess and the server load dropped from 4.0 to 0.2 almost immediately. JavaScript analytics never saw those requests. AWStats did.
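You can run the same hunt by hand against the raw log. The sketch below sums bytes per external referer for one file extension. The function name is mine, and it assumes the combined format with whitespace-separated fields in their usual positions (path in field 7, bytes in field 10, referer in field 11), which breaks if the authenticated user name contains spaces.

```shell
# Sum bytes served per referer for files with a given extension,
# skipping requests with no referer or a referer on our own site.
# Usage: hotlink_referers access_log wmv yourdomain.example
hotlink_referers() {
  awk -v ext="$2" -v own="$3" '
    {
      path = $7; bytes = $10; ref = $11
      gsub(/"/, "", ref)                       # strip the surrounding quotes
      sub(/\?.*/, "", path)                    # ignore any query string
      if (path !~ ("\\." ext "$")) next        # wrong file type
      if (ref == "-" || index(ref, own)) next  # direct hit or our own pages
      if (bytes == "-") bytes = 0
      total[ref] += bytes
    }
    END { for (r in total) printf "%.0f\t%s\n", total[r], r }
  ' "$1" | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

The referer at the top of the output is the one to block first.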

Privacy and The Norwegian Context (Datatilsynet)

Operating in Norway means respecting the Personopplysningsloven (Personal Data Act). While IP addresses are technical data, the Data Protection Authority (Datatilsynet) has strict views on storing user data indefinitely.

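As far as I know, AWStats has no built-in switch to mask addresses, so the last byte of each IP has to be zeroed before AWStats reads the log, for example by pointing LogFile at a filter command that ends in a pipe. A masked /24 still tells you roughly which network the traffic comes from (say, a Telenor or NextGenTel range) without storing the exact address of the user. Country lookups keep working on masked addresses:

# Plugin for country lookup from the IP address (needs the Geo::IPfree Perl module); it does not anonymize anything by itself
LoadPlugin="geoipfree"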
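The masking itself is a one-line filter placed in front of AWStats. A minimal sketch, assuming IPv4 addresses at the start of each line as in the combined format; the function name and any script path you wrap it in are hypothetical.

```shell
# Zero the last octet of the leading IPv4 address on each log line.
# Reads a log on stdin and writes the masked log to stdout.
anonymize_ips() {
  sed 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\} /\1.0 /'
}

# Example: anonymize_ips < /var/log/httpd/access_log > /tmp/access_log.masked
```

Saved as a small script, this can serve as the command in a piped LogFile value, so the full addresses never reach the AWStats data files. IPv6 clients and lines that start with a hostname pass through unchanged, so treat those separately if you log them.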

Compliance isn't just for lawyers; it's part of systems architecture.

The Hardware Reality: Why Virtualization Matters

Here is the ugly truth about log analysis: it is I/O intensive. Reading a 2GB log file requires high sustained read speeds. On a cheap Shared Hosting plan or an oversold OpenVZ container, your I/O wait will skyrocket while the disk thrashes, and on an oversold virtual machine CPU steal time (cycles the hypervisor hands to other guests) climbs with it. Either way, your actual website slows down while you are just trying to read the logs.
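I/O wait and steal time are separate counters, and it helps to know which one is hurting you. Both sit on the first line of /proc/stat. The sketch below turns them into percentages; the function name is my own, it assumes a Linux kernel that reports the steal column, and the numbers are cumulative since boot, not a live reading.

```shell
# Report iowait and steal as a share of total CPU time.
# Expects /proc/stat content on stdin; only the aggregate "cpu" line is used.
# Columns after "cpu": user nice system idle iowait irq softirq steal ...
cpu_wait_breakdown() {
  awk '$1 == "cpu" {
    total = 0
    for (i = 2; i <= NF && i <= 9; i++) total += $i
    if (total > 0)
      printf "iowait=%.1f%% steal=%.1f%%\n", 100 * $6 / total, 100 * $9 / total
  }'
}

# Example: cpu_wait_breakdown < /proc/stat
```

High iowait points at the disk under you; high steal points at the neighbors.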

This is where CoolVDS takes a different approach. We utilize Xen HVM virtualization. This means your RAM is reserved, and your I/O throughput isn't fighting with 500 other neighbors. When you grep a 5GB log file on a CoolVDS instance, you get the raw power of the underlying RAID-10 SAS arrays.

Performance Comparison: Log Parsing

| Environment | 1GB Log Parse Time | Impact on Web Server |
|---|---|---|
| Shared Hosting | Timeout / Failed | High Latency |
| Cheap VPS (OpenVZ) | 45 Seconds | Moderate Jitter |
| CoolVDS (Xen HVM) | 12 Seconds | Zero Impact |

Automating the Process

Once configured, set it and forget it. Add this to your cron:

0 3 * * * /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain -update > /dev/null
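Even at 3 AM it is polite to keep the parser from starving the web server. One option is to wrap the update in a small low-priority launcher. This is a sketch under assumptions: the function name is hypothetical, and ionice is used only if it is installed and the idle class is permitted on your kernel.

```shell
# Run a command at the lowest CPU priority and, where possible,
# in the idle I/O scheduling class.
# Usage: run_low_priority COMMAND [ARGS...]
run_low_priority() {
  if command -v ionice >/dev/null 2>&1 && ionice -c3 true 2>/dev/null; then
    ionice -c3 nice -n 19 "$@"
  else
    nice -n 19 "$@"   # fall back to CPU niceness only
  fi
}

# Example, inside a script called from cron:
# run_low_priority /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain -update
```

The idle I/O class only has an effect with an I/O scheduler that honors priorities, such as CFQ, so check which scheduler your disk uses before counting on it.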

By morning, you'll have a fresh report waiting. You will see the bots, the 404 errors, and the bandwidth drains that Google misses.

If you are tired of wondering why your server is slow, stop guessing. Get raw access to your resources and your logs. Deploy a CoolVDS instance today and see what's actually hitting your network interface.
