Stop Guessing: Deep Server Log Analysis with AWStats on Linux
Let’s be honest: Google Analytics is for the marketing department. It tells you which keywords converted or how long a user hovered over a banner. But for system administrators, it lies by omission. It doesn't track the scrapers hammering your login page, the hotlinkers stealing your bandwidth for their MySpace profiles, or the search engine bots indexing your staging environment.
If you want the truth, you have to go to the source: the access.log. But reading raw text files is a waste of billable hours. This is where AWStats (Advanced Web Statistics) comes in. While the marketing team looks at pie charts, we look at HTTP status codes and bandwidth consumption.
Here is how to deploy AWStats on a standard LAMP stack, optimize it for heavy loads, and keep it compliant with Norwegian privacy standards.
The Prerequisites: Don't Melt Your Disk
Before we install anything, check your hardware. Parsing log files is an I/O-intensive operation. If you are running a high-traffic site on cheap shared hosting or a budget VPS with a single SATA drive, running AWStats can push your load average through the roof, causing iowait that slows down your actual web server.
Pro Tip: Never parse logs on the live production drive during peak hours. At CoolVDS, we provision our Xen-based VPS instances with RAID-10 SAS storage specifically to handle high I/O operations like log rotation and analysis without stalling the CPU.
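If you have no choice but to parse logs on a busy box, one option is to run the job at the lowest CPU and I/O priority. The following is a minimal sketch, not a tuned setup: `run_low_priority` is a hypothetical helper name, and the idle I/O class set by `ionice -c3` is only honored by I/O schedulers that support priorities (such as CFQ).

```shell
#!/bin/sh
# Hypothetical helper: run any command with the lowest CPU priority and,
# where available and permitted, the "idle" I/O scheduling class.
run_low_priority() {
    # Probe ionice first so the command still runs if ionice is missing
    # or not allowed in this environment.
    if ionice -c3 true >/dev/null 2>&1; then
        ionice -c3 nice -n 19 "$@"
    else
        nice -n 19 "$@"
    fi
}

# Usage (path and config name as used later in this article):
# run_low_priority /usr/bin/perl /var/www/awstats/awstats.pl -config=www.yourdomain.no -update
```

This does not reduce the total amount of I/O, it only lets the web server's requests go first.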
Step 1: Installation on CentOS 5 / RHEL
The easiest route is using the RPMForge repository, as the default CentOS repositories are often outdated.
rpm -Uhv http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
yum install awstats
For Debian Lenny (5.0) users, the package is in the main APT repository:
apt-get install awstats
Step 2: Configuring Apache for Accuracy
AWStats is only as good as the data you feed it. The standard Apache common log format is insufficient. You need the combined format to track User-Agents and Referrers.
Open your httpd.conf (usually in /etc/httpd/conf/) and ensure these lines are active:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/httpd/access_log combined
If you are serving Norwegian customers, latency matters. Make sure hostname lookups are disabled in Apache (HostnameLookups Off) so that no request waits on a reverse DNS query. Let AWStats handle DNS resolution later, during the reporting phase.
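Before pointing AWStats at a log, it can be worth confirming that Apache is really writing the combined format. The sketch below is a rough heuristic, and `log_format_of` is a hypothetical helper name: it counts double-quoted fields, since a combined line carries three (request, referer, user agent) and a common line only one. Escaped quotes inside a user agent string can fool it.

```shell
#!/bin/sh
# Hypothetical helper: guess the Apache log format of each line on stdin
# by splitting on double quotes.
#   combined: 3 quoted fields -> 7 pieces
#   common:   1 quoted field  -> 3 pieces
log_format_of() {
    awk -F'"' '{
        if (NF >= 7)      print "combined"
        else if (NF >= 3) print "common"
        else              print "unknown"
    }'
}

# Usage (log path as configured above):
# tail -n 1 /var/log/httpd/access_log | log_format_of
```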
Step 3: The AWStats Configuration
Copy the model config file to create a specific config for your domain:
cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.www.yourdomain.no.conf
Edit the file with your favorite editor (vi or nano). Pay attention to these directives:
- LogFile: Point this to your Apache log path.
- LogType: Set to W for web.
- LogFormat: Set to 1 for the 'combined' Apache format we set up earlier.
- SiteDomain: www.yourdomain.no
- DNSLookup: Set to 1 if you want to resolve IPs to hostnames (e.g., seeing *.telenor.net instead of raw IPs), but be warned: this slows down the update process significantly.
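A quick way to review the result is to print only the directives discussed above. This is a small sketch; `show_awstats_directives` is a hypothetical helper name, and the values in the comment are this article's examples, not defaults.

```shell
#!/bin/sh
# After editing, the active lines should look roughly like this
# (example values from this article):
#   LogFile="/var/log/httpd/access_log"
#   LogType=W
#   LogFormat=1
#   SiteDomain="www.yourdomain.no"
#   DNSLookup=1

# Hypothetical helper: print the five directives from a config file,
# skipping comments and every other setting.
show_awstats_directives() {
    # $1 = path to the AWStats config file
    grep -E '^(LogFile|LogType|LogFormat|SiteDomain|DNSLookup)[[:space:]]*=' "$1"
}

# Usage:
# show_awstats_directives /etc/awstats/awstats.www.yourdomain.no.conf
```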
Step 4: Automation vs. Performance
Do not run the update script via the browser (CGI). It’s a security risk and can time out on large log files. Instead, use a cron job to update the statistics every few hours.
Open your crontab:
crontab -e
Add the following line to update every 6 hours:
0 */6 * * * /usr/bin/perl /var/www/awstats/awstats.pl -config=www.yourdomain.no -update > /dev/null
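On large logs, an update can run longer than expected, and two overlapping runs would compete for the same disk. One way to guard against that is a small lock wrapper. This is a sketch under assumptions: `run_locked` and the `LOCKDIR` location are hypothetical, and a lock left behind by a killed run has to be removed by hand.

```shell
#!/bin/sh
# Assumed lock location; any directory path the cron user can create works.
LOCKDIR="${LOCKDIR:-/tmp/awstats-update.lock}"

# Hypothetical wrapper: run the given command only if no other run holds the lock.
run_locked() {
    # mkdir is atomic: it fails if the directory already exists.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "previous update still running, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$LOCKDIR"
    return $status
}

# Usage inside a script called from cron (paths as in this article):
# run_locked /usr/bin/perl /var/www/awstats/awstats.pl -config=www.yourdomain.no -update > /dev/null
```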
The Norwegian Context: Privacy and Compliance
Operating in Norway means respecting Personopplysningsloven (Personal Data Act). IP addresses can be considered personal data by Datatilsynet if they can be linked to a specific individual. Storing raw access logs indefinitely poses a compliance risk.
To mitigate this, implement log rotation that compresses and archives logs, then deletes them after a set retention period (e.g., 3 to 6 months). Note that NotPageList in the AWStats config is a directive that controls which file types count as page views, not an anonymization tool. If you need to anonymize the data for broader reporting, mask the IP addresses in the log before AWStats processes it.
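One simple masking approach is to zero the last octet of each client address in a copy of the log. The filter below is a minimal sketch, not a compliance guarantee: `mask_ipv4` is a hypothetical helper name, it only handles an IPv4 address at the start of a line (the `%h` field in the format above), and IPv6 addresses pass through unchanged.

```shell
#!/bin/sh
# Hypothetical filter: replace the last octet of a leading IPv4 address with 0.
# Uses POSIX basic regular expressions so it also works with older sed versions.
mask_ipv4() {
    sed 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\} /\1.0 /'
}

# Usage (file names are examples only):
# mask_ipv4 < /var/log/httpd/access_log.1 > /var/log/httpd/access_log.1.anon
```

Masking reduces the value of hostname resolution and per-visitor counts in the reports, so decide which reports you need before applying it.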
Why Infrastructure Matters
Analyzing 2GB of daily logs requires serious throughput. Many "cheap" VPS providers oversell their resources using OpenVZ, meaning your "guaranteed" RAM is actually burstable memory shared with 50 other tenants. When you run a Perl script like AWStats, the kernel might kill your process if the node is busy.
At CoolVDS, we refuse to play that game. We use Xen virtualization to ensure strict hardware isolation. When you buy 512MB of RAM, you get 512MB of RAM. Our storage arrays are built on enterprise SAS drives optimized for the random read/write patterns typical of log analysis and database transactions.
Whether you are hosting a Magento store targeting Oslo shoppers or a high-traffic forum, you need visibility without the performance penalty. Don't fly blind.
Ready to take control of your server data? Deploy a Xen-based instance on CoolVDS today and see what is really hitting your network.