Stop Guessing: Advanced Server Log Forensics with AWStats on Linux
It is 3:00 AM. Your pager is buzzing because the load average on your primary web node just hit 25.0. Is it a DDoS attack? A crawler gone rogue? Or did marketing launch a campaign without telling IT? If you are staring at a blinking cursor running tail -f /var/log/httpd/access_log, you have already lost the battle.
Raw logs are the source of truth, but they are dense and difficult to parse in real time. In the Norwegian hosting market, where latency to the Norwegian Internet Exchange (NIX) in Oslo is measured in single-digit milliseconds, you cannot afford to waste cycles grepping through gigabytes of text files. You need visualization, and you need it yesterday.
Enter AWStats. While many teams are moving toward heavier analytics stacks, AWStats remains the battle-tested, Perl-based standard for turning raw server logs into actionable intelligence. However, default installations are often resource hogs that can stall your disk I/O. Here is how to configure it correctly on a production VPS.
The Hidden Cost of Log Analysis: I/O Wait
Before we touch the config files, we need to address the elephant in the server room: disk I/O. Parsing a 2 GB log file means reading all 2 GB from disk. On traditional budget VPS hosting, which often relies on shared SATA spinning platters (7200 RPM if you are lucky), running an AWStats update can make your iowait spike.
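To see whether an AWStats run is actually hurting you, measure iowait instead of guessing. The Python sketch below uses only the standard library and assumes the standard Linux /proc/stat layout, where the fifth value on the aggregate cpu line is iowait. It takes two snapshots of the CPU counters and reports what share of CPU time was spent waiting on disk in between. The function names are hypothetical, not part of any AWStats tooling.

```python
import time

def read_cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")

def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two snapshots."""
    # Order: user nice system idle iowait irq softirq steal [guest guest_nice].
    # guest time is already counted in user/nice, so only the first 8 are summed.
    deltas = [a - b for a, b in zip(after, before)][:8]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total

def sample_iowait(interval=1.0):
    """Read /proc/stat twice, `interval` seconds apart, and return iowait %."""
    with open("/proc/stat") as f:
        before = read_cpu_times(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = read_cpu_times(f.read())
    return iowait_percent(before, after)
```

Calling sample_iowait() once at a quiet moment and once while awstats_updateall.pl is running gives you a direct before/after comparison of how much disk wait the update adds.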
Pro Tip: Never run log analysis during peak traffic hours unless you are on solid-state storage. If you are hosting on CoolVDS, our KVM instances are backed by enterprise-grade SSD RAID arrays. This means you can crunch months of log data in seconds without causing the "noisy neighbor" effect or slowing down your MySQL queries.
Step 1: Installation (CentOS 6 & Ubuntu 12.04)
Let's get our hands dirty. We assume you have root access. On RHEL/CentOS, AWStats is packaged in the EPEL repository, so make sure EPEL is enabled first:
yum install awstats
For our Debian/Ubuntu 12.04 LTS brethren:
apt-get install awstats
Step 2: Configuring the Log Format
The most common failure point is a mismatch between your web server's log format and what AWStats expects. Apache's "Combined" format is the industry standard.
Open your Apache configuration (usually /etc/httpd/conf/httpd.conf or /etc/apache2/apache2.conf) and verify the LogFormat:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
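Before pointing AWStats at a log, it is worth confirming that the lines really are in combined format. The following is a minimal Python sketch, assuming Apache's default escaping of quotes inside quoted fields. The regular expression is our own approximation of the LogFormat above, not the parser AWStats uses internally, and parse_combined() is a hypothetical helper that returns None for lines that do not match (for example, lines in the shorter common format).

```python
import re

# Approximates: %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" '
    r'"(?P<agent>(?:[^"\\]|\\.)*)"$'
)

def parse_combined(line):
    """Return a dict of fields for a combined-format line, or None if it does not match."""
    match = COMBINED_RE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # %b logs "-" when no response body was sent
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields
```

Running parse_combined() over the first few hundred lines of your access log and counting the None results is a cheap way to catch a format mismatch before AWStats silently discards those lines.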
Now, edit your AWStats configuration file. It is typically located at /etc/awstats/awstats.yourdomain.conf. If it doesn't exist, copy the model file.
cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.coolvds.test.conf
nano /etc/awstats/awstats.coolvds.test.conf
Find the LogFormat directive. Since we are using Apache Combined, set it to 1. Nginx (which is gaining serious traction this year for high-concurrency setups) writes the same layout with its predefined combined log_format, so LogFormat=1 works there too; you only need a custom format string if you have changed Nginx's log_format.
# 1 - Apache or Nginx Combined Log Format
LogFormat=1
# Point to your actual log file (Debian/Ubuntu default: /var/log/apache2/access.log)
LogFile="/var/log/httpd/access_log"
# Reverse DNS lookups slow parsing considerably. Set to 0 unless you need hostname resolution.
DNSLookup=0
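Once the file is edited, a quick sanity check can catch a typo before the first (slow) update run. This Python sketch assumes the simple Key=value syntax used above; read_awstats_conf() and check_settings() are hypothetical helpers. Include directives are deliberately skipped, so settings pulled in from other files will not be seen.

```python
def read_awstats_conf(text):
    """Collect Key=value settings, skipping comments, blank lines and Include lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Later occurrences overwrite earlier ones in this sketch
        settings[key.strip()] = value.strip().strip('"')
    return settings

def check_settings(settings):
    """Return human-readable warnings for the Step 2 settings."""
    warnings = []
    if settings.get("LogFormat") != "1":
        warnings.append("LogFormat is not 1 (Apache/Nginx combined)")
    if not settings.get("LogFile"):
        warnings.append("LogFile is not set")
    if settings.get("DNSLookup") != "0":
        warnings.append("DNSLookup is not 0; reverse lookups will slow parsing")
    return warnings
```

For the configuration shown above, check_settings(read_awstats_conf(open("/etc/awstats/awstats.coolvds.test.conf").read())) should return an empty list.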
Step 3: Handling Norwegian Privacy Laws (Datatilsynet)
Operating in Norway comes with strict responsibilities under the Personal Data Act (Personopplysningsloven). Storing full IP addresses can be a liability. You should anonymize IPs in your analytics to stay compliant and keep your users' trust.
Older AWStats versions do not mask IP addresses by default, but you can anonymize them with a plugin or a pre-processing script. A simpler immediate step is to restrict who can see the AWStats interface: do not leave your stats folder open to the public web.
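# Inside your Apache 2.2 vhost configuration
<Directory /usr/share/awstats/wwwroot>
# Deny everyone, then allow only your office IP
Order deny,allow
Deny from all
Allow from 80.202.x.x
# Additionally require a password from the .htpasswd file
AuthType Basic
AuthName "Server Stats"
AuthUserFile /etc/awstats/.htpasswd
Require valid-user
# Both conditions (IP and password) must be met
Satisfy All
</Directory>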
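The pre-processing route mentioned above can be as simple as masking the host field before AWStats ever sees it. This sketch assumes Python 3.3+ (for the standard ipaddress module) and that the first space-separated field of each line is the client address, as in the combined format. It zeroes the last octet of IPv4 addresses (/24) and keeps only the first 48 bits of IPv6 addresses; those prefix lengths are our assumption, not a Datatilsynet-endorsed standard. The script name anonymize_log.py is hypothetical.

```python
#!/usr/bin/env python3
"""anonymize_log.py (hypothetical name): mask client IPs in combined-format logs."""
import ipaddress
import sys

def anonymize_host(host):
    """Zero the host bits: keep /24 of IPv4 and /48 of IPv6; leave non-IPs untouched."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return host  # e.g. a resolved hostname; nothing to mask
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network("%s/%d" % (addr, prefix), strict=False)
    return str(network.network_address)

def anonymize_line(line):
    """Rewrite the first space-separated field (the %h client address) of a bytes line."""
    host, sep, rest = line.partition(b" ")
    try:
        masked = anonymize_host(host.decode("ascii")).encode("ascii")
    except UnicodeDecodeError:
        masked = host  # not ASCII, so not an IP address; pass through unchanged
    return masked + sep + rest

def main(paths):
    for path in paths:
        # Binary mode so odd bytes in URLs or User-Agents pass through untouched
        with open(path, "rb") as log:
            for line in log:
                sys.stdout.buffer.write(anonymize_line(line))
    sys.stdout.flush()

if __name__ == "__main__":
    main(sys.argv[1:])
```

AWStats accepts a command as LogFile when the value ends with a pipe character (the model config shows this for gzip-compressed logs), so something like LogFile="/usr/local/bin/anonymize_log.py /var/log/httpd/access_log |" (path hypothetical, script made executable) keeps full addresses out of the AWStats data files. The trade-off: visitors sharing a /24 are counted as one host, so unique-visitor figures will come out lower.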
Step 4: Automation and Performance Tuning
Manually updating stats is for amateurs. We need a cron job. But beware: running this every hour on a high-traffic site can degrade performance if your I/O throughput is low.
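# /etc/cron.d/awstats
# Hourly update at lowest CPU and disk priority (ionice -c3 only takes effect with the CFQ I/O scheduler)
0 * * * * root nice -n 19 ionice -c3 /usr/share/awstats/tools/awstats_updateall.pl now > /dev/null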
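Following the earlier advice not to crunch logs during peak hours, you can let the log itself tell you when the quiet hour is. This Python sketch (hypothetical helper names) assumes the combined-format timestamp [dd/Mon/yyyy:HH:MM:SS zone] and uses the time as logged by the server. It counts requests per hour of day and picks the quietest one.

```python
import collections
import re

# Captures the hour from a combined-format timestamp such as [10/Oct/2012:13:55:36 +0200]
HOUR_RE = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2}):\d{2}:\d{2} ")

def hits_per_hour(lines):
    """Count requests per hour of day (0-23), using the time as logged by the server."""
    counts = collections.Counter()
    for line in lines:
        match = HOUR_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts

def quietest_hour(counts):
    """Return the hour with the fewest requests; ties go to the earliest hour."""
    # Counter returns 0 for hours that never appear in the log
    return min(range(24), key=lambda hour: counts[hour])
```

With that hour in hand you could, for example, run the update once a day at that time instead of hourly. If you do, make sure it still runs before logrotate rotates the access log; otherwise AWStats never sees the lines written since its previous run.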
The "CoolVDS" Factor: Why Architecture Matters
When you run awstats_updateall.pl, the script reads every line of your log file since the last update. On a site with 100,000 hits a day, that is a lot of read operations. If you are on a legacy VPS provider overselling their RAM and disk slots, your website will lag while this script runs.
We built CoolVDS specifically to solve this bottleneck. By using pure SSD storage and strict KVM isolation, we ensure that disk I/O is dedicated to your instance. You can parse gigabytes of logs in the background while your front-end Nginx workers continue serving content with sub-millisecond latency.
Advanced: Analyzing Bot Traffic
One of the best features of AWStats is the "Robots/Spiders visitors" section. In 2012, we are seeing a massive increase in aggressive scraping bots. By identifying these User-Agents in AWStats, you can block them at the server level to save bandwidth.
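If you see a suspicious User-Agent consuming 40% of your bandwidth, block it immediately, either in your .htaccess (Apache with mod_rewrite, as below) or with the equivalent rule in your Nginx config:
# Return 403 Forbidden to any User-Agent starting with "BadBot" (NC = case-insensitive)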
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BadBot [NC]
RewriteRule .* - [F,L]
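AWStats' robots section tells you which agents are busy; to put a number on the "40% of your bandwidth" case straight from the raw log, a short Python sketch can sum the %b byte counts per User-Agent. It assumes combined-format lines and only looks at the status, bytes, referer, and User-Agent at the end of each line. bandwidth_by_agent() and heavy_agents() are hypothetical names, and the 0.40 threshold simply mirrors the example above.

```python
import collections
import re

# Tail of a combined-format line: "request" status bytes "referer" "user-agent"
TAIL_RE = re.compile(r'" (\d{3}) (\d+|-) "(?:[^"\\]|\\.)*" "((?:[^"\\]|\\.)*)"$')

def bandwidth_by_agent(lines):
    """Sum response bytes (%b) per User-Agent string."""
    totals = collections.Counter()
    for line in lines:
        match = TAIL_RE.search(line.rstrip("\r\n"))
        if match is None:
            continue  # skip lines that are not in combined format
        size = match.group(2)
        # "-" means no body was sent, which counts as zero bytes
        totals[match.group(3)] += 0 if size == "-" else int(size)
    return totals

def heavy_agents(totals, threshold=0.40):
    """Return (agent, share) pairs whose share of all bytes is at least `threshold`."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [(agent, float(size) / grand_total)
            for agent, size in totals.most_common()
            if float(size) / grand_total >= threshold]
```

Anything heavy_agents() returns is a candidate for the RewriteCond pattern above; keep in mind that ^BadBot only matches User-Agents that start with that string.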
Conclusion
Log analysis is not just about pretty charts; it is about forensic visibility into your infrastructure. Whether you are debugging a 500 error or tracking the success of a new deployment, AWStats remains an essential tool in the Linux admin's arsenal.
However, software is only as fast as the hardware it runs on. Don't let slow rotational disks blind you to what is happening on your server. Experience the difference of SSD-backed KVM virtualization.
Ready to analyze logs without the lag? Deploy a CoolVDS SSD instance today and get full root access in under 60 seconds.