Stop Guessing: Mastering Server Log Analysis with AWStats on Linux
If you are relying solely on JavaScript-based trackers like Google Analytics to monitor your server's health, you are flying blind. I've seen it time and again: a marketing manager claims traffic is down, while top shows the load average spiking to 15.00 because bots are hammering the login page.
Client-side scripts don't execute when a crawler hits your site, they don't track bandwidth usage effectively, and they certainly don't tell you when your server is throwing 500 Internal Server Errors. For the battle-hardened sysadmin, the truth always lives in /var/log/httpd/access_log.
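Two one-liners make the point, assuming the combined log format we will configure below:
# Count 500 responses in the current access log (status is the 9th whitespace field)
awk '$9 == 500' /var/log/httpd/access_log | wc -l
# Top ten user agents by request volume; crawlers show up here immediately
awk -F'"' '{print $6}' /var/log/httpd/access_log | sort | uniq -c | sort -rn | head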
In this guide, we are going to deploy AWStats (Advanced Web Statistics). It’s the industry standard for parsing Apache and Nginx logs into digestible intelligence. But be warned: parsing gigabytes of text logs requires serious I/O performance, something many budget VPS providers in Europe fail to deliver.
Why AWStats Still Rules in 2011
While newer tools are entering the market, AWStats remains the superior choice for deep analysis at the application log level. Unlike Webalizer, which offers only rudimentary graphs, AWStats provides detailed breakdowns of HTTP error codes, robot/spider visits, and search keyphrases.
More importantly for those of us operating out of Norway, server-side logging gives you complete ownership of the data. With the EU Data Protection Directive and Norway's strict Personopplysningsloven enforced by Datatilsynet, keeping your traffic data on local servers rather than shipping it to third-party US servers is a smart compliance play.
The Prerequisites
Before we touch the config, ensure your environment is ready. You need Perl installed. If you are running CentOS 5 or 6, or Debian Squeeze, you are likely good to go.
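A quick sanity check if you want to be sure:
# Confirm Perl is available and note the version AWStats will run under
perl -v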
Step 1: Install AWStats
On a RHEL/CentOS system via EPEL:
yum install awstats
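If yum cannot find the package, EPEL is probably not enabled yet. A rough sketch; the epel-release filename and URL below are illustrative and change between releases, so grab the current one for your version:
# Check whether the EPEL repository is already enabled
yum repolist enabled | grep -i epel
# If not, install the epel-release package for your release (URL shown is illustrative)
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm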
On Debian/Ubuntu:
apt-get install awstats
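The two package families lay things out differently (the EPEL build puts awstats.pl under /usr/share/awstats/wwwroot/cgi-bin/, Debian under /usr/lib/cgi-bin/). A quick way to confirm where the script and config files landed on your box:
# RHEL/CentOS: list the files installed by the EPEL package
rpm -ql awstats | grep -E 'awstats\.pl|\.conf'
# Debian/Ubuntu
dpkg -L awstats | grep -E 'awstats\.pl|\.conf'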
Configuration: The Devil is in the Details
The default configuration will fail you if your log formats aren't aligned. The most common mistake I see is a mismatch between the Apache LogFormat and the AWStats LogFormat directive.
Ensure your Apache configuration (usually in /etc/httpd/conf/httpd.conf) uses the "combined" format:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
Next, edit your AWStats config file, typically located at /etc/awstats/awstats.yourdomain.conf. Pay attention to these three critical flags:
LogFile="/var/log/httpd/access_log"– Point this to the absolute path of your current log.LogType=W– Tells the parser this is a Web log.DNSLookup=1– This resolves IP addresses to hostnames. Warning: This significantly slows down processing. If your DNS resolver is slow, your update process will crawl.
Pro Tip: If you are managing high-traffic sites, disable DNSLookup (set to 0). The reverse DNS lookups add massive latency to the parsing process. If you must have hostnames, ensure you are running a local caching DNS server to reduce lookup times.
Automation: The Cron Job
Stats are useless if they aren't current. You need a cron job to trigger the Perl script that parses the logs. Open your crontab:
crontab -e
Add the following line to update statistics every hour:
0 * * * * /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain -update > /dev/null
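Before trusting cron, run the update once by hand and watch for errors; the same script can also emit a static HTML report if you prefer not to expose the CGI at all. A sketch using the EPEL path from the cron line above (Debian ships the script as /usr/lib/cgi-bin/awstats.pl):
# One-off manual update: parses new log lines and builds the data files
/usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain -update
# Optional: generate a static HTML report to archive or serve directly
/usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain -output -staticlinks > /var/www/html/awstats.yourdomain.html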
The I/O Bottleneck: Why Hardware Matters
Here is the reality check. When AWStats runs, it reads massive text files line-by-line. If you have a busy e-commerce site generating 2GB of logs per day, the disk I/O required to parse that text is substantial.
On a standard shared hosting plan or a cheap VPS oversold with OpenVZ, other tenants are stealing your disk cycles. Your log analysis might take 45 minutes to run, spiking your CPU wait time and slowing down your actual website.
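Don't guess at this either; watch the disk while the update runs. A quick check using iostat from the sysstat package (install it if it's missing):
# Watch %iowait and per-device utilisation in 5-second intervals while awstats.pl runs
iostat -x 5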
This is why at CoolVDS, we don't play games with storage. We utilize enterprise-grade SSD storage arrays. Solid State Drives deliver random read/write performance that is orders of magnitude faster than mechanical SAS drives. In our benchmarks, a CoolVDS SSD instance parses logs in well under a minute per gigabyte, whereas traditional 15k RPM drives choke on the task.
Comparison: Log Parsing Time (500MB Log File)
| Storage Type | Time to Parse | System Load |
|---|---|---|
| Standard SATA (7.2k RPM) | 4m 12s | High (IO Wait) |
| Enterprise SAS (15k RPM) | 1m 45s | Moderate |
| CoolVDS SSD RAID | 0m 22s | Low |
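You can reproduce this comparison on your own hardware: time a single update pass against a real log, dropping the page cache first so cached reads don't flatter the result (requires root):
# Cold-cache benchmark of one update pass
sync && echo 3 > /proc/sys/vm/drop_caches
time /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain -update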
Security: Protect Your Stats
AWStats is a Perl CGI script. Historically, CGI scripts have been vectors for vulnerabilities. Do not leave your stats folder open to the public.
Use an .htaccess file to restrict access to your IP address or require a password:
AuthType Basic
AuthName "Restricted Access"
AuthUserFile /etc/awstats/.htpasswd
Require valid-user
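Create the password file referenced above with htpasswd, from the httpd-tools (RHEL) or apache2-utils (Debian) package:
# "admin" is just an example username; you will be prompted for the password
htpasswd -c /etc/awstats/.htpasswd admin
If you would rather lock the directory to your own IP, the classic Apache 2.2 allow/deny directives work in the same .htaccess (the address below is a placeholder):
Order deny,allow
Deny from all
Allow from 192.0.2.10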
Final Thoughts
Log analysis is not optional for serious administrators. It reveals the bots scraping your content, the 404 errors frustrating your users, and the true bandwidth you are consuming. But remember, logs are heavy. Processing them requires a server architecture that values I/O throughput.
If you are tired of waiting for your server to catch up with your data, it's time to upgrade. CoolVDS offers the low latency and high-speed SSD storage you need to crunch numbers without crashing your site. Don't let slow hardware kill your visibility.