Stop Flying Blind: Mastering Server Log Analysis with AWStats on Linux

The Map is Not the Territory

If you are relying solely on JavaScript-based trackers like Google Analytics to understand your infrastructure, you are looking at a mirage. They tell you who visited, but not who tried to break in, which hotlinked images are draining your bandwidth, or why your httpd process keeps spawning child workers until the server chokes. I learned this the hard way last month while debugging a client's e-commerce portal targeting the Oslo market. Their frontend metrics looked fine, but latency was spiking to 4000 ms. The culprit? A massive scraper botnet hammering a non-existent directory. Those requests never loaded a page that ran the JavaScript tag, so the marketing team saw nothing. But /var/log/httpd/access_log told the full, ugly story.

Why AWStats Still Rules in 2010

While Webalizer is faster, it lacks detail. AWStats (Advanced Web Statistics) parses your server logs directly to generate graphical reports. It sees everything: status codes, bandwidth usage by file type and, crucially, the user agents hitting your machine. However, parsing a 5GB text file is a heavy operation that demands sustained disk I/O throughput. This is where many cheap VPS providers fail: they oversell the I/O capacity of their SATA drives.
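
The kind of aggregation AWStats performs can be illustrated with a short Python sketch that tallies status codes, bytes served per file extension and requests per user agent from combined-format lines. This is only an illustration of the idea, not how AWStats itself is implemented; the `summarize` function, the simplified regular expression and the sample lines are all hypothetical.

```python
import re
from collections import Counter
from posixpath import splitext

# Simplified pattern for Apache's "combined" LogFormat
# (%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i").
# It does not handle escaped quotes inside the request or user agent.
COMBINED = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Tally status codes, bytes per file extension and requests per user agent."""
    statuses, bytes_by_ext, agents = Counter(), Counter(), Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        statuses[m.group("status")] += 1
        agents[m.group("agent")] += 1
        parts = m.group("request").split()  # e.g. ["GET", "/img/logo.png", "HTTP/1.1"]
        path = parts[1].split("?", 1)[0] if len(parts) > 1 else ""
        ext = splitext(path)[1].lower() or "(none)"
        size = m.group("bytes")
        bytes_by_ext[ext] += 0 if size == "-" else int(size)  # "-" means no body
    return statuses, bytes_by_ext, agents

# Hypothetical sample: a normal image hit and a scraper probing a missing directory.
SAMPLE = [
    '198.51.100.7 - - [10/Oct/2010:13:55:36 +0200] "GET /img/logo.png HTTP/1.1" '
    '200 2326 "http://www.example.no/" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Oct/2010:13:55:37 +0200] "GET /old/catalog/ HTTP/1.1" '
    '404 209 "-" "ScraperBot/1.0"',
]
statuses, bytes_by_ext, agents = summarize(SAMPLE)
print(statuses["404"])          # 1
print(bytes_by_ext[".png"])     # 2326
print(agents["ScraperBot/1.0"]) # 1
```

On a real log you would pass an open file object instead of SAMPLE; a single user agent dominating the 404 count is exactly the scraper pattern described above.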

On a CoolVDS instance, where we prioritize dedicated I/O resources and utilize enterprise-grade RAID arrays, log parsing doesn't send your load average through the roof. You don't want your monitoring tool to be the reason your site goes down.

Step 1: Configuration Hygiene

Before installing AWStats (yum install awstats on CentOS 5 or apt-get install awstats on Debian Lenny), check your Apache configuration. You need the 'combined' log format to capture the referrer and user-agent fields; the standard 'common' format omits both, which makes it nearly useless for security analysis.

# Inside /etc/httpd/conf/httpd.conf
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog logs/access_log combined
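
To see how that LogFormat string maps onto an actual log line, here is a minimal Python sketch with one named group per directive. It is also a quick way to confirm that your logs really are in combined format: a line in the shorter common format simply fails to match. The regular expression is a simplification written for this example (it does not handle escaped quotes) and `parse_combined` is a hypothetical helper, not part of AWStats.

```python
import re

# One named group per directive in:
# LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
COMBINED = re.compile(
    r'^(?P<host>\S+) '        # %h  remote host (an IP unless HostnameLookups is on)
    r'(?P<ident>\S+) '        # %l  identd user, almost always "-"
    r'(?P<user>\S+) '         # %u  authenticated user or "-"
    r'\[(?P<time>[^\]]+)\] '  # %t  time the request was received
    r'"(?P<request>[^"]*)" '  # %r  first line of the request
    r'(?P<status>\d{3}) '     # %>s final status code
    r'(?P<bytes>\d+|-) '      # %b  response body size, "-" when no bytes were sent
    r'"(?P<referer>[^"]*)" '  # %{Referer}i
    r'"(?P<agent>[^"]*)"$'    # %{User-Agent}i
)

def parse_combined(line):
    """Return a dict of fields, or None if the line is not in combined format."""
    m = COMBINED.match(line.rstrip("\n"))
    return m.groupdict() if m else None

combined_line = ('192.0.2.10 - - [12/Nov/2010:04:00:01 +0100] "GET /shop/ HTTP/1.1" '
                 '200 5120 "http://www.google.no/" "Mozilla/5.0"')
common_line = '192.0.2.10 - - [12/Nov/2010:04:00:01 +0100] "GET /shop/ HTTP/1.1" 200 5120'

print(parse_combined(combined_line)["agent"])  # Mozilla/5.0
print(parse_combined(common_line))             # None
```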

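Once you have verified that the logs rotate correctly (check /etc/logrotate.d/httpd), create the per-domain AWStats configuration file in /etc/awstats/. AWStats looks for a file named awstats.<config>.conf, where <config> is the value passed with -config=, so for -config=mydomain.no the file is /etc/awstats/awstats.mydomain.no.conf. At a minimum, set LogFile to your access log, LogFormat=1 (Apache combined format) and SiteDomain.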

Key Settings for Performance and Privacy

Pro Tip: To comply with the Norwegian Personal Data Act (Personopplysningsloven) and satisfy Datatilsynet's requirements, consider anonymizing IP addresses if you store logs for extended periods. If your legal team requires strict compliance, enable the GeoIP plugin with LoadPlugin="geoipfree" (it needs the Geo::IPfree Perl module). AWStats then resolves visitor countries while parsing, so your country-level reports remain useful even after the raw IP addresses are anonymized or purged.
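
If you do decide to anonymize archived logs, one common approach is to zero the host part of each address before the file goes into long-term storage. The Python sketch below truncates IPv4 addresses to /24 and IPv6 addresses to /48; those prefix lengths are an assumption (a widespread convention, not something prescribed by Personopplysningsloven), whether truncation is sufficient is a question for your legal team, and `anonymize_ip` and `anonymize_line` are hypothetical helpers.

```python
import ipaddress

def anonymize_ip(addr, v4_prefix=24, v6_prefix=48):
    """Zero the host part of an address, e.g. 203.0.113.77 -> 203.0.113.0.

    The default prefix lengths are an assumed convention, not a legal requirement.
    """
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(net.network_address)

def anonymize_line(line):
    """Rewrite only the leading %h field of a log line; the rest is untouched."""
    host, sep, rest = line.partition(" ")
    try:
        return anonymize_ip(host) + sep + rest
    except ValueError:
        # %h is not an IP (e.g. a resolved hostname): left unchanged here,
        # handle it according to your own retention policy.
        return line

print(anonymize_ip("203.0.113.77"))         # 203.0.113.0
print(anonymize_ip("2001:db8:abcd:12::1"))  # 2001:db8:abcd::
```

Because it only rewrites the leading field, the rest of each combined-format line stays intact, so the anonymized archive is still usable for later investigations.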

Manage the DNSLookup setting carefully. DNSLookup=1 enables reverse DNS lookups, which give better reports (resolving .no domains vs .com) but can drastically slow down the update process if your server's DNS resolver is sluggish. On CoolVDS, our local resolvers in the datacenter are optimized for this, but if you are hosting elsewhere, consider DNSLookup=0, or DNSLookup=2, which resolves names only from a static DNS cache file instead of querying the network during the update.
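
Reverse lookups hurt mainly because every uncached address costs a network round trip, and a large log contains far more lines than distinct client IPs. The Python sketch below shows the general idea of resolving each distinct IP once and falling back to the raw address on failure; it is not AWStats' own DNS cache mechanism, and `make_resolver` and `fake_lookup` are hypothetical names. Note that `socket.gethostbyaddr` takes no timeout argument, so a sluggish resolver still stalls on every new address.

```python
import socket
from functools import lru_cache

def make_resolver(lookup=socket.gethostbyaddr, maxsize=100_000):
    """Return resolve(ip), which performs at most one reverse lookup per distinct IP."""
    @lru_cache(maxsize=maxsize)
    def resolve(ip):
        try:
            # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
            return lookup(ip)[0]
        except OSError:
            # socket.herror / socket.gaierror (no PTR record, resolver failure)
            # are OSError subclasses; fall back to the raw IP
            return ip
    return resolve

# Demonstration with a hypothetical stand-in lookup that records each query,
# so the example runs without touching the network.
queries = []

def fake_lookup(ip):
    queries.append(ip)
    return ("host-" + ip.replace(".", "-") + ".example.no", [], [ip])

resolve = make_resolver(fake_lookup)
for ip in ["192.0.2.1", "192.0.2.1", "192.0.2.2", "192.0.2.1"]:
    resolve(ip)
print(len(queries))  # 2: repeated addresses are answered from the cache
```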

Step 2: Securing the Intelligence

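The AWStats interface is a goldmine for competitors: it shows exactly which keywords drive your traffic and which files are most popular. Do not leave it open to the public web; I've seen too many admins leave /awstats/ accessible to the world. Lock it down with HTTP Basic authentication, backed by a password file created with Apache's htpasswd utility (for example, htpasswd -c /etc/awstats/htpasswd.users admin; omit -c when adding further users, because it overwrites an existing file).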

# In your Apache config for the AWStats directory (e.g. /etc/httpd/conf.d/awstats.conf on CentOS)
<Directory /usr/share/awstats/wwwroot>
    AuthName "Server Stats - Authorized Personnel Only"
    AuthType Basic
    AuthUserFile /etc/awstats/htpasswd.users
    Require valid-user
</Directory>

The Hardware Bottleneck: I/O Wait

Here is the reality of parsing logs: it is a read-intensive, sequential operation. If you are on a shared host with 500 other users fighting for the same disk heads, your log update script (usually a cron job running awstats.pl -update) will stall, and you'll see the wa (I/O wait) percentage in top spike.
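
The wa figure in top corresponds to the cumulative CPU counters the kernel exposes in /proc/stat. A minimal Python sketch of the same calculation, assuming a Linux 2.6 kernel with the usual field order (user, nice, system, idle, iowait, ...), lets you record I/O wait from a script while the AWStats update runs; the function names are hypothetical.

```python
import time

def read_cpu_times(path="/proc/stat"):
    """Return the aggregate CPU counters: user, nice, system, idle, iowait, irq, softirq, steal."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                # Later fields (guest, guest_nice) are already included in user/nice,
                # so only the first eight counters are kept.
                return [int(v) for v in line.split()[1:9]]
    raise ValueError("no aggregate 'cpu' line in " + path)

def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples (top's 'wa')."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0  # index 4 is iowait

def sample_iowait(interval=5.0):
    """Take two /proc/stat samples `interval` seconds apart (Linux only)."""
    before = read_cpu_times()
    time.sleep(interval)
    return iowait_percent(before, read_cpu_times())

# Worked example with made-up counter values:
# deltas are user=50, system=50, idle=200, iowait=100, so 100 / 400 = 25%
print(iowait_percent([100, 0, 50, 800, 50, 0, 0, 0],
                     [150, 0, 100, 1000, 150, 0, 0, 0]))  # 25.0
```

Calling sample_iowait() while awstats.pl -update is running, and again during a quiet period, shows how much of the job's time is spent waiting on the disk rather than computing.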

When we built the architecture for CoolVDS, we moved away from the "stuff as many VMs as possible onto a drive" model. We use high-performance RAID-10 SAS arrays and low-latency storage backends. This means that when your cron job fires at 04:00 to parse yesterday's traffic, it finishes in seconds, not hours. For data-heavy applications, raw disk speed is often the single biggest factor in system responsiveness.

Automate and Forget

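Finally, automate the update. Add a cron job that runs every hour so you have near real-time data. The line below uses the system crontab format (/etc/crontab or a file in /etc/cron.d/), which includes a user field; if you add it with crontab -e instead, drop the root column. The path to awstats.pl also varies by distribution: /usr/lib/cgi-bin/ on Debian, /usr/share/awstats/wwwroot/cgi-bin/ with the CentOS package.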

0 * * * * root /usr/lib/cgi-bin/awstats.pl -config=mydomain.no -update > /dev/null
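
Hourly runs stay cheap because AWStats keeps track of the last record it processed and only handles records added since the previous update. The Python sketch below illustrates that incremental idea with a saved byte offset; it is a simplified stand-in, not AWStats' own mechanism, and `read_new_lines` and the state-file path are hypothetical. Its rotation check also hints at a practical pitfall: lines written after the last update but before logrotate moves the file are easy to lose, which is why many setups run one final update just before rotation.

```python
import os
import tempfile

def read_new_lines(log_path, state_path):
    """Return complete lines appended to log_path since the previous call."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    if os.path.getsize(log_path) < offset:
        offset = 0  # file shrank: assume it was rotated (simple heuristic)
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a half-written last line waits for the next run.
    end = data.rfind(b"\n") + 1
    with open(state_path, "w") as f:
        f.write(str(offset + end))
    return data[:end].decode("utf-8", errors="replace").splitlines()

# Demonstration in a throwaway directory.
demo_dir = tempfile.mkdtemp()
log = os.path.join(demo_dir, "access_log")
state = os.path.join(demo_dir, "access_log.offset")
with open(log, "w") as f:
    f.write("line 1\nline 2\n")
first_run = read_new_lines(log, state)
with open(log, "a") as f:
    f.write("line 3\npartial")
second_run = read_new_lines(log, state)
print(first_run, second_run)  # ['line 1', 'line 2'] ['line 3']
```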

Don't let your server be a black box. Install AWStats, secure it, and host it on infrastructure that can handle the I/O load without flinching.
