Stop Grepping Blindly: Visualizing Server Traffic with AWStats on CentOS 5
It is 3:00 AM. Your load average is spiking, and top shows Apache eating CPU cycles like a starving animal. You suspect a botnet or a scrape script gone rogue, but staring at tail -f /var/log/httpd/access_log is like trying to read the Matrix code without the red pill. You need aggregate data, and you need it five minutes ago.
While Google Analytics is fine for marketing teams, it lies to sysadmins. It relies on JavaScript execution. It misses the bots, the hotlinkers, and the 404 errors that are actually grinding your disk to a halt. This is where AWStats (Advanced Web Statistics) comes in. It parses the raw server logs, giving you the truth, the whole truth, and nothing but the truth.
I recently audited a high-traffic e-commerce site based in Oslo. They were suffering from phantom slowdowns. Marketing said traffic was normal; the logs said otherwise. We deployed AWStats and found a scraper from a non-EU IP hitting their search function 50 times per second. We blocked the IP range in iptables, and the load dropped instantly. Here is how to set it up correctly, specifically for a CentOS environment.
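Before AWStats is installed, a rough first pass at "who is hammering me" can be done from the raw log. The helper below is a minimal sketch, not the method used in the audit: top_ips is a hypothetical name, and it assumes the Apache common or combined log format, where the client address is the first field.

```shell
# Hypothetical helper: print the N busiest client addresses in an access log.
# Assumes Apache common/combined format (client address is field 1).
top_ips() {
    log_file="$1"
    count="${2:-10}"
    awk '{ print $1 }' "$log_file" | sort | uniq -c | sort -rn | head -n "$count"
}

# Example:
#   top_ips /var/log/httpd/access_log 5
# Blocking an offending range afterwards (203.0.113.0/24 is a placeholder
# documentation range, not the range from the story above):
#   iptables -I INPUT -s 203.0.113.0/24 -j DROP
```

Each output line is a request count followed by the address, busiest first. This shows only volume per address; AWStats adds the breakdown by URL, status code and user agent.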
The Prerequisites
We are assuming you are running a standard LAMP stack on CentOS 5 or 6. AWStats is written in Perl, but don't let that scare you: it is battle-tested and efficient, provided your I/O subsystem isn't made of wood.
Pro Tip: Log analysis is I/O intensive. If you are parsing gigabytes of logs on a cheap shared host with over-provisioned SATA drives, your server will choke. This is why we build CoolVDS instances on RAID-10 SAS arrays with dedicated I/O throughput. We don't steal your IOPS.
Step 1: Installation
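First, enable the EPEL repository if you haven't already; AWStats is packaged there rather than in the base CentOS repositories. The URL below is for CentOS 5 on i386, so adjust the release and architecture to match your system.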
rpm -Uvh http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
yum install awstats
Step 2: Configuration for Accuracy
The default configuration is rarely enough. You need to map it to your specific domain logs. Copy the model file:
cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.com.conf
vi /etc/awstats/awstats.yourdomain.com.conf
Change these critical lines:
LogFile="/var/log/httpd/access_log"
(Or wherever your VHOST logs live.)
SiteDomain="yourdomain.com"
DNSLookup=1
(Warning: This slows down processing significantly. Only enable if you have local caching DNS or fast upstream resolvers. On CoolVDS, our local resolvers in the Oslo datacenter handle this latency efficiently.)
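If you manage more than one vhost, the same three edits can be scripted instead of made by hand in vi. The following is a minimal sketch under the assumption that each directive already appears in the copied model file as an uncommented KEY=value line; set_directive is a hypothetical helper, not part of AWStats.

```shell
# Hypothetical helper: rewrite an existing KEY=value line in an AWStats config.
# Values must not contain "|", "&" or backslashes, which sed would interpret.
set_directive() {
    conf_file="$1"
    key="$2"
    value="$3"
    tmp_file="${conf_file}.tmp.$$"
    # Write to a temp file, then copy back so the original file mode is kept.
    sed "s|^${key}=.*|${key}=${value}|" "$conf_file" > "$tmp_file" &&
        cat "$tmp_file" > "$conf_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}

# Example (run as root, after copying the model file):
#   conf=/etc/awstats/awstats.yourdomain.com.conf
#   set_directive "$conf" LogFile '"/var/log/httpd/access_log"'
#   set_directive "$conf" SiteDomain '"yourdomain.com"'
#   set_directive "$conf" DNSLookup 1
```

A directive that is missing from the file is left missing rather than appended, so it is worth checking the result with grep before the first update run.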
Step 3: Security and The Norwegian Context
By default, AWStats puts its CGI scripts in a public folder. Do not leave this open. You are exposing internal traffic patterns. In Norway, Datatilsynet (The Data Inspectorate) takes a dim view of leaking IP addresses, which are considered personal data under the Personal Data Act (Personopplysningsloven).
Lock it down in your Apache config:
<Directory "/usr/share/awstats/wwwroot">
    Options None
    AllowOverride None
    # Restrict by source address...
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1 10.0.0.0/8
    # ...and require a login as well. Both checks must pass.
    AuthType Basic
    AuthName "AWStats Access"
    AuthUserFile /etc/awstats/htpasswd
    Require valid-user
    Satisfy All
</Directory>
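The AuthUserFile referenced above has to exist before Apache will let anyone in. The sketch below creates it and reloads Apache; it assumes the stock CentOS httpd package (the apache group, apachectl and the httpd service script), and "admin" is a placeholder user name. The run wrapper is a hypothetical convenience so you can preview the commands first with DRY_RUN=1.

```shell
# Print commands instead of running them when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: create the password file and reload Apache.
# Note: htpasswd -c overwrites an existing file and prompts for the password.
setup_awstats_auth() {
    user="${1:-admin}"
    passwd_file="${2:-/etc/awstats/htpasswd}"
    run htpasswd -c "$passwd_file" "$user" &&
        run chown root:apache "$passwd_file" &&
        run chmod 640 "$passwd_file" &&
        run apachectl configtest &&
        run service httpd reload
}

# Example (run as root):
#   DRY_RUN=1 setup_awstats_auth admin    # preview only
#   setup_awstats_auth admin              # do it
```

The 640 mode with group apache keeps the hashes readable by the web server but not by other local users.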
Step 4: Automation
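You do not want to run the update script manually. However, running it every hour on a massive log file can spike your CPU. The balance is a cron job that runs once a day during low-traffic hours. The entry below fires at 03:00 and discards normal output, so cron only mails you when the script writes to stderr.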
0 3 * * * /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update > /dev/null
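If you do schedule updates more often, or your logs are large enough that one run might not finish before the next begins, a small wrapper can keep runs from piling up. This is a sketch, not something AWStats ships: run_locked is a hypothetical function that uses mkdir as an atomic lock and lowers CPU priority with nice.

```shell
# Hypothetical wrapper: run a command at low CPU priority, and skip the run
# if a previous one still holds the lock. mkdir either creates the directory
# or fails atomically, so it works as a simple lock.
run_locked() {
    lock_dir="$1"
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    nice -n 19 "$@"
    status=$?
    rmdir "$lock_dir"
    return "$status"
}

# Example, from a script called by cron (lock path is an assumption):
#   run_locked /var/run/awstats.lock \
#       /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update
```

One caveat: if the process is killed mid-run, the lock directory is left behind and must be removed by hand before the next update will start.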
Why Infrastructure Matters
Parsing logs is a sequential read operation. On oversold VPS platforms, including many OpenVZ container hosts, other tenants compete for the same CPU. On hypervisor-based guests this shows up as "steal time" (CPU time the hypervisor gave to other tenants while your guest was ready to run). Either way, it can cause your stats generation to hang, leaving you with gaps in your data.
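On a Linux guest that reports it, you can check this yourself: the eighth value on the aggregate cpu line of /proc/stat is cumulative steal time. The sketch below turns that into a percentage of all CPU time since boot; steal_percent is a hypothetical helper, and a reading of zero on a container platform may only mean the counter is not populated there.

```shell
# Hypothetical helper: steal time as a percentage of total CPU time since boot.
# /proc/stat "cpu" line: user nice system idle iowait irq softirq steal ...
steal_percent() {
    stat_file="${1:-/proc/stat}"
    awk '$1 == "cpu" {
        total = 0
        # Sum user through steal (fields 2-9); guest time is already in user.
        for (i = 2; i <= NF && i <= 9; i++) total += $i
        if (total > 0) printf "%.1f\n", 100 * $9 / total
    }' "$stat_file"
}

# Example:
#   steal_percent
```

Because this is an average since boot, a short burst of contention at 3:00 AM will barely move it; sampling the counters twice and comparing the difference gives a better picture of what is happening right now.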
At CoolVDS, we utilize KVM virtualization. This ensures that the RAM and CPU cycles you pay for are reserved for your log parsing, not your neighbor's WordPress plugin. Furthermore, our datacenter in Oslo ensures that if you are processing Norwegian user data, it stays within national borders, simplifying your compliance with local privacy laws.
Final Thoughts
Logs are the black box of your server. Without a tool like AWStats, you are flying blind. But remember, a tool is only as fast as the hardware it runs on. If you are tired of waiting 20 minutes for a log report to generate, it might be time to upgrade to a platform designed for heavy lifting.
Need consistent I/O for your data analysis? Deploy a KVM instance on CoolVDS today and stop fighting for resources.