Stop Grepping: Enterprise Log Analysis with AWStats on CentOS 5

If you are still staring at a scrolling terminal running tail -f /var/log/httpd/access_log hoping to spot a traffic trend, you are wasting billable hours. While command-line tools like awk and grep are fundamental for immediate troubleshooting, they fail to provide the historical context needed for capacity planning or marketing analysis.

In the current hosting landscape of 2009, where every megabyte of RAM counts, offloading analytics to a third-party JavaScript tracker is common, but it can miss as much as 30% of your traffic: bots, scrapers, and users with NoScript enabled. To see the real traffic hitting your metal, you need server-side log analysis. AWStats (Advanced Web Statistics) remains a de facto standard for this, using Perl to parse massive log files into digestible HTML reports.

Pro Tip: Do not run log analysis on the same physical disk as your database. The I/O contention will kill your MySQL query performance. This is why we configure CoolVDS instances with separate virtual block devices for logging partitions whenever possible.

The Hidden Cost of Parsing Logs

Here is the reality check: AWStats is resource-intensive. It is a Perl script that reads text files line by line. On a typical oversold shared hosting account, the host's resource limits will often kill your analysis job before it finishes parsing a 2GB log file. I saw this happen last week with a client running a high-traffic vBulletin forum. Their host terminated the awstats.pl process for "excessive resource usage," leaving them blind during a DDoS attack.

This is where the architecture of your VPS matters. You need guaranteed CPU cycles and high disk throughput. We built CoolVDS on Xen virtualization specifically to avoid this "noisy neighbor" effect. When you run a cron job on our infrastructure, you get the dedicated slice of the CPU you paid for, so your reports are ready by 8:00 AM.

Step 1: Installation on CentOS 5

We will assume you are running a standard LAMP stack on CentOS 5.x. While you can install from source, the EPEL (Extra Packages for Enterprise Linux) repository remains the cleanest method for dependency management.

# Import the EPEL key (this path exists only once the epel-release package is installed)
rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL

# Install AWStats
yum -y install awstats

Step 2: Configuration for Accuracy

The default configuration is rarely sufficient for production environments. You need to create a specific config file for your domain. Copy the model file:

cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.com.conf
vi /etc/awstats/awstats.yourdomain.com.conf

Inside this file, there are three critical directives you must change to match your Apache 2.2 setup:

# Path to your standard Apache log
LogFile="/var/log/httpd/access_log"

# Ensure this matches your LogFormat in httpd.conf
# For standard 'combined' format, use 1
LogFormat=1

# The domain name for the report
SiteDomain="yourdomain.com"
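If you manage several domains, editing each file in vi gets tedious. The following is a minimal sketch of setting a directive non-interactively, assuming each directive sits on its own line starting in the first column, as in the model file. The function name set_directive is hypothetical and not part of AWStats.

```shell
#!/bin/bash
# set_directive FILE KEY VALUE  (hypothetical helper, not an AWStats tool)
# Rewrites every line of FILE that starts with "KEY=" to "KEY=VALUE",
# or appends the line if none exists. VALUE is written verbatim, so
# include the double quotes yourself for string directives.
set_directive() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp) || return 1
  CONF_KEY="$key" CONF_VALUE="$value" awk '
    BEGIN {
      prefix = ENVIRON["CONF_KEY"] "="
      line = prefix ENVIRON["CONF_VALUE"]
    }
    index($0, prefix) == 1 { print line; found = 1; next }
    { print }
    END { if (!found) print line }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Overwrite in place so the original file keeps its owner and mode
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, set_directive /etc/awstats/awstats.yourdomain.com.conf SiteDomain '"yourdomain.com"' would set the report domain. Commented lines are left alone because they start with #, not with the directive name.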

Handling the Nginx Trend

If you are experimenting with Nginx 0.7 as a reverse proxy (a rising trend we are seeing for static content offloading), your Apache logs might only show 127.0.0.1 as the source IP. This renders the AWStats host and country reports useless, because every visitor appears to be the proxy. You must either have Nginx pass the original address in a header such as X-Forwarded-For and install mod_rpaf on Apache to restore the real client IP, or configure AWStats to parse the Nginx logs directly.
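A quick way to tell whether you are affected is to measure how many logged requests claim to come from the loopback address. This is a rough diagnostic sketch, assuming the client address is the first field of each log line. The function name loopback_share is hypothetical.

```shell
#!/bin/bash
# loopback_share (hypothetical helper): reads an access log on stdin and
# prints the percentage of requests whose first field is 127.0.0.1.
loopback_share() {
  awk '
    $1 == "127.0.0.1" { local_hits++ }
    END {
      if (NR == 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * local_hits / NR
    }
  '
}
```

Run it as loopback_share < /var/log/httpd/access_log. A figure close to 100 suggests Apache is logging the proxy rather than your visitors.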

Step 3: Scheduling and Security

AWStats does not update in real time. You must schedule the update via cron. However, running this too frequently on a server with slow 7.2k RPM drives will cause I/O wait spikes. On CoolVDS, where we utilize high-performance SAS RAID arrays (and emerging SSD caching tech), you can safely run this hourly.

Add this to /etc/cron.hourly/awstats:

#!/bin/bash
/usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update > /dev/null

Remember to make it executable: chmod +x /etc/cron.hourly/awstats.
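If a single update ever takes longer than an hour, the next cron run will start on top of it and double the disk load. One common guard is a lock directory, since mkdir is atomic. This is a minimal sketch, not part of AWStats. The function name run_locked and the exit code 75 are assumptions chosen for illustration.

```shell
#!/bin/bash
# run_locked LOCKDIR COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND only if LOCKDIR can be created. Because mkdir either
# creates the directory or fails, two overlapping runs cannot both
# hold the lock. Returns COMMAND's exit status, or 75 if skipped.
run_locked() {
  local lockdir="$1" status=0
  shift
  if ! mkdir "$lockdir" 2>/dev/null; then
    echo "previous run still active, skipping" >&2
    return 75
  fi
  "$@" || status=$?
  rmdir "$lockdir"
  return "$status"
}
```

In the cron script you would wrap the existing perl command, for example run_locked /var/run/awstats.lock followed by the awstats.pl invocation. Note that a job killed mid-run leaves the lock directory behind, so a stale lock has to be removed by hand.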

Norwegian Context: Privacy and Datatilsynet

Operating a VPS in Norway brings specific legal obligations. Under the Personal Data Act (Personopplysningsloven), IP addresses can be considered personal data. If you are storing raw logs for years without anonymization, you may be in violation of local privacy standards overseen by Datatilsynet.

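To reduce this liability, you can keep full addresses out of the reports, for example by masking the last octet of each IP before AWStats parses the log and by reporting on visitor locations rather than individual hosts. The raw logs on disk remain your responsibility to rotate and purge.

# In awstats.yourdomain.com.conf
# Geolocation plugin: reports visitor cities. It does not anonymize IPs by itself.
# It typically also takes a lookup mode and the path to a MaxMind city
# database; see the commented examples in awstats.model.conf.
LoadPlugin="geoip_city_maxmind"

Note: The plugin depends on the Geo::IP Perl module, which in turn needs the GeoIP C library, plus a city database. It maps IPs to locations; it does not remove host addresses from the report on its own.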
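Masking the last octet can be done with a small filter that sits between the log and AWStats. This is a sketch under the assumption that each log line starts with an IPv4 address followed by a space, as in the standard combined format. The function name mask_last_octet is hypothetical.

```shell
#!/bin/bash
# mask_last_octet (hypothetical helper): reads log lines on stdin and
# replaces the last octet of a leading IPv4 address with 0.
# Lines that do not start with an IPv4 address pass through unchanged.
mask_last_octet() {
  sed 's/^\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)\.[0-9][0-9]* /\1.0 /'
}
```

AWStats accepts a command ending in a pipe character as its LogFile value, so a wrapper script built around this filter could feed it masked lines instead of the raw file. Masking before analysis means visitors sharing a /24 are counted as one host, so expect lower unique-visitor numbers.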

Securing the Interface

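Never leave your stats open to the public. Competitors can determine your traffic spikes and marketing strategy. Use a .htaccess file in your cgi-bin directory (Apache must permit it with AllowOverride AuthConfig for that directory), and create the password file with htpasswd -c /var/www/awstats/.htpasswd followed by a username of your choice: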

AuthName "Restricted Access"
AuthType Basic
AuthUserFile /var/www/awstats/.htpasswd
Require valid-user

The Performance Trade-off

Log analysis is a trade-off between insight and performance. The more granular the data, the harder your disk heads have to work. If you are seeing high iowait during log rotation, it is a sign your current hosting solution relies on slow SATA storage or overloaded host nodes.
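To put a number on iowait, you can read the aggregate cpu line in /proc/stat, where the fifth counter after the label is time spent waiting on I/O. The following is a rough sketch. The function name iowait_percent is hypothetical, and the figure is cumulative since boot, not a live reading.

```shell
#!/bin/bash
# iowait_percent (hypothetical helper): reads /proc/stat-formatted text
# on stdin and prints iowait as a percentage of all counted CPU time.
# Fields on the "cpu" line: user nice system idle iowait irq softirq ...
iowait_percent() {
  awk '
    $1 == "cpu" {
      total = 0
      for (i = 2; i <= NF; i++) total += $i
      if (total > 0) printf "%.1f\n", 100 * $6 / total
      exit
    }
  '
}
```

Run it as iowait_percent < /proc/stat. For a live view during a log update, the wa column of vmstat is the better tool.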

For mission-critical applications where latency to the NIX (Norwegian Internet Exchange) in Oslo is paramount, you cannot afford a file system bottleneck. We designed the CoolVDS platform with enterprise-grade RAID-10 storage to swallow these I/O operations without impacting your web server's response time.

Stop letting log parsing slow down your HTTP requests. Deploy a high-performance CoolVDS instance today and gain visibility without the latency.