Stop Grepping Blindly: Professional Log Analysis with AWStats
If I see one more systems administrator trying to debug a traffic spike by running tail -f /var/log/httpd/access_log while the server load average climbs past 20.0, I might just unplug their ethernet cable myself. It is 2012. We have better tools than raw text scrolling faster than the human eye can register.
While Google Analytics is fine for the marketing team, it lies to you. It misses bots, it misses scrapers, and it misses the 404 errors bleeding your bandwidth dry. For the battle-hardened DevOps engineer, the truth lives in the raw logs. But raw logs are heavy. Parsing a 5GB log file on a standard magnetic disk is a recipe for I/O wait hell. This is where AWStats comes in, provided you configure it correctly and host it on hardware that doesn't choke on seek times.
The Architecture of Truth: Why Server-Side Analytics Matter
Client-side trackers (JavaScript) fail when the client has NoScript enabled, or when the visitor is a bot. In a recent project for a media client in Oslo, we noticed a discrepancy of 40% between their analytics dashboard and the actual bandwidth usage billed. The culprit? An aggressive scraper from an obscure search engine indexing their site every 3 seconds.
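To make the scraper hunt concrete, here is a minimal Python sketch that tallies hits and bytes per (IP, user-agent) pair straight from the access log. It assumes the Apache "combined" format; the names COMBINED, top_clients and SAMPLE are ours, and the sample lines are invented for illustration. The regex is deliberately simple and will skip requests that contain escaped quotes.

```python
import re
from collections import Counter

# Apache "combined": host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def top_clients(lines, n=5):
    """Return the n busiest (host, agent) pairs as (host, agent, hits, bytes)."""
    hits = Counter()
    traffic = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # not in combined format, skip it
        key = (m.group("host"), m.group("agent"))
        hits[key] += 1
        size = m.group("bytes")
        traffic[key] += int(size) if size.isdigit() else 0  # "-" means no body sent
    return [(host, agent, count, traffic[(host, agent)])
            for (host, agent), count in hits.most_common(n)]

# Hypothetical sample lines, not taken from a real server
SAMPLE = [
    '192.0.2.10 - - [10/Oct/2012:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [10/Oct/2012:13:55:39 +0200] "GET /news HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Oct/2012:13:55:40 +0200] "GET /missing HTTP/1.1" 404 - "http://example.com/" "Mozilla/5.0"',
]

for row in top_clients(SAMPLE, n=2):
    print(row)
```

Because the function accepts any iterable of lines, you can pass it an open file object and it will stream the log instead of loading 5GB into memory.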
We caught this using AWStats. However, installing it is only half the battle. Tuning it to parse logs without bringing your web server to its knees is the real skill.
Step 1: Installation and Basic Configuration on CentOS 6
We assume you are running a standard EL6 environment (CentOS or RHEL 6). The package is available in the EPEL repository.
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-5.noarch.rpm
yum install awstats
Once installed, you need to create a configuration file for your specific domain. Do not edit the model file directly; copy it.
cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.coolvds-demo.no.conf
vi /etc/awstats/awstats.coolvds-demo.no.conf
The Crucial LogFormat
This is where 90% of deployments fail. If your Apache LogFormat does not match the AWStats configuration, you will get zero data. Ensure your httpd.conf uses the "combined" format:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
Then, inside your AWStats config, set:
LogFile="/var/log/httpd/access_log"
LogType=W
LogFormat=1 # 1 = Apache Combined
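Before the first AWStats run, it can save time to check that the log really is in combined format. The sketch below is one way to do that in Python; the function name format_mismatches and the pattern are our own, and the pattern only approximates what AWStats accepts with LogFormat=1, so treat a clean result as a good sign rather than a guarantee.

```python
import re

# Approximation of Apache "combined": nine fields, the last two quoted
COMBINED_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-) "[^"]*" "[^"]*"$'
)

def format_mismatches(lines, limit=3):
    """Return (total, bad, samples): line count, non-matching count, first few offenders."""
    total = 0
    bad = 0
    samples = []
    for line in lines:
        total += 1
        text = line.rstrip("\n")
        if COMBINED_LINE.match(text):
            continue
        bad += 1
        if len(samples) < limit:
            samples.append(text)  # keep a few examples for eyeballing
    return total, bad, samples
```

If most lines come back as mismatches, the usual suspect is a virtual host still logging in "common" format, which lacks the referer and user-agent fields.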
Step 2: Performance Tuning and I/O Bottlenecks
Parsing logs is an I/O intensive operation. On a traditional VPS sharing a RAID-5 SATA array with 50 other noisy neighbors, running the update process can spike your iowait, causing web requests to hang.
Pro Tip: NEVER enable DNS lookups inside AWStats if you value your uptime. Setting DNSLookup=1 forces the script to perform a reverse DNS lookup for every new IP address it meets in the log. If 100,000 distinct addresses hit your site, that is up to 100,000 DNS queries, each one stalling the parser until the resolver answers or times out. Your script will time out, and your DNS resolver may well ban you.
Set this in your config:
DNSLookup=0
If you absolutely need hostname resolution, do it offline or use a dedicated log server. This is also why hardware selection matters. At CoolVDS, we utilize high-performance SSD storage arrays (PCIe-based flash technology) for our instances. When parsing a 10GB log file, the difference between a standard 7.2k RPM drive and our SSD tier is the difference between a 40-minute job and a 2-minute job. Low latency storage is not a luxury; it is a requirement for log analysis.
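If you do resolve hostnames offline, the key is to look up each distinct address once, away from the web server. A minimal Python sketch of that idea follows; resolve_unique and its lookup parameter are hypothetical names of ours, and the default resolver is the standard library's socket.gethostbyaddr. It does not set a timeout or run lookups in parallel, so it is a starting point rather than a finished tool.

```python
import socket

def resolve_unique(ips, lookup=socket.gethostbyaddr):
    """Resolve each distinct IP once; return {ip: hostname or None}."""
    names = {}
    for ip in ips:
        if ip in names:
            continue  # already resolved (or already failed) during this run
        try:
            # gethostbyaddr returns (hostname, aliases, addresses)
            names[ip] = lookup(ip)[0]
        except OSError:
            names[ip] = None  # no PTR record, or the resolver gave up
    return names
```

Passing the resolver in as a parameter keeps the function easy to test without network access and lets you swap in a cached or rate-limited lookup later.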
Step 3: Norwegian Compliance (Personopplysningsloven)
Operating in Norway means adhering to the Personopplysningsloven (Personal Data Act). Datatilsynet (the Data Inspectorate) is strict about storing IP addresses, which are considered personal data. You should mask IPs if you do not have a specific technical need to retain them for security auditing.
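As far as we know, AWStats ships no ready-made plugin for this. The geoipfree plugin is sometimes suggested, but it only maps IP addresses to countries and does not mask anything. Basic anonymization therefore means a bit of Perl hacking, a custom sed wrapper that truncates each address before AWStats reads the log, or simply rotating and deleting raw logs frequently and keeping only the aggregated AWStats data.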
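As an illustration of the wrapper approach, here is a small Python sketch that zeroes the last octet of the client address at the start of each log line. The names LEADING_IPV4 and mask_ip are ours. It only handles IPv4, and whether dropping one octet is enough for your case is an assumption you should check against Datatilsynet's guidance.

```python
import re

# IPv4 address at the very start of the line, followed by whitespace
LEADING_IPV4 = re.compile(r'^(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}(?=\s)')

def mask_ip(line):
    """Replace the last octet of a leading IPv4 address with 0; leave other lines as-is."""
    return LEADING_IPV4.sub(r'\1.0', line, count=1)
```

Run every line through mask_ip while copying the log to a second file, then point LogFile at the masked copy. Note that masking merges all visitors in the same /24 into one host, so unique-visitor counts will drop.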
Step 4: Automating the Updates
You cannot run this manually every day. Set up a cron job to update the statistics every hour. This keeps the load distributed rather than one massive crunch at midnight.
# /etc/cron.d/awstats
0 * * * * root /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=coolvds-demo.no -update > /dev/null
Handling Log Rotation
When logrotate runs, it renames the current log file. You must ensure AWStats processes the log before it is compressed or deleted. Modify your /etc/logrotate.d/httpd to include a prerotate script:
/var/log/httpd/*log {
    missingok
    notifempty
    sharedscripts
    delaycompress
    prerotate
        /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=coolvds-demo.no -update
    endscript
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
    endscript
}
The Hardware Reality
You can optimize your Perl scripts all day, but if your underlying storage subsystem has high latency, your analysis will be slow. We often see developers blaming Apache for slowness when the reality is that their background logging processes are saturating the disk bandwidth.
If you are serious about data visibility, you need a hosting environment that prioritizes I/O throughput. CoolVDS instances are built on enterprise-grade hardware designed to handle concurrent read/write operations without stealing cycles from your application. Don't let your monitoring tools become the cause of your downtime.
Ready to crunch data without the wait? Deploy a high-performance SSD VPS on CoolVDS and see what your logs have been trying to tell you.