Stop Guessing: Analyze True Server Traffic with AWStats on CentOS 5

The Truth is in the /var/log

Let’s be honest for a second. That shiny JavaScript analytics dashboard the marketing team loves? It’s lying to you. Between users running NoScript, mobile browsers with limited JS support, and the sheer volume of bot traffic hitting your NIC, client-side tracking gives you maybe 70% of the picture. As a sysadmin, 70% accuracy isn't data; it's a guess.

I recently inherited a "high-traffic" e-commerce setup running on a budget shared host. The client claimed their site was slow, yet their analytics showed a drop in visitors. One look at tail -f /var/log/httpd/access_log revealed the truth: a massive scraper bot was hammering the catalogue pages, bypassing the JS tracker entirely but consuming 90% of the Apache workers. The server was melting, but the dashboard said it was a slow day.
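A quick way to spot that kind of scraper without any extra tooling is to count requests per client IP straight from the log. The following is a minimal sketch, assuming the Apache common or combined log format (client IP in the first field); the function name top_clients is made up for illustration.

```shell
# Count requests per client IP in an Apache access log and print the
# busiest clients first. Assumes the IP is the first whitespace-separated
# field, as in the common/combined log formats.
top_clients() {
    # $1 = path to the access log, $2 = number of rows to show (default 10)
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$1" \
        | sort -rn \
        | head -n "${2:-10}"
}

# Usage (log path taken from the example above):
# top_clients /var/log/httpd/access_log 5
```

Each output line is a request count followed by the client IP. One address far ahead of the rest is usually worth a closer look.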

This is why we need AWStats. It parses the raw server logs. It doesn't care if the client has JavaScript enabled. It captures every GET, POST, and HEAD request. Here is how to set it up properly on a CentOS 5 environment, and why your underlying disk I/O matters more than you think.

Prerequisites and Installation

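We are assuming you are running a clean CentOS 5.5 instance; the commands and paths below are the CentOS ones. The same steps apply on Ubuntu 10.04 LTS, but you install with apt-get instead of yum and the file locations differ. You need root access because we are touching global config files. If you are still on shared hosting, stop reading and upgrade to a VPS. You cannot optimize what you cannot configure.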

yum install awstats

If yum can't find the package in your configured repos, enable the EPEL repository and install it from there. Once installed, create a per-site config from the model file. Copy the template:

cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.com.conf

Configuration: The Vital Flags

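Open your new config file. Don't just leave the defaults; that's how you get useless data. The part of the filename between awstats. and .conf (here, yourdomain.com) is the name you pass to -config= later. At a minimum, verify that the LogFile path matches your Apache configuration.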

LogFile="/var/log/httpd/access_log"
LogType=W
LogFormat=1
DNSLookup=1

Pro Tip: Set DNSLookup=1 carefully. It resolves IP addresses to hostnames (e.g., seeing .telenor.net instead of raw IPs), but every lookup adds latency to the update process, and with a slow network or resolver the parser can stall for long stretches. Set DNSLookup=0 if update speed matters more than hostnames. This is where hosting in Norway matters. CoolVDS instances peering directly at NIX (Norwegian Internet Exchange) resolve local hostnames significantly faster than servers routed through Frankfurt.
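
Before the first update run, it is worth confirming that the LogFile value in your config points at a file that exists and is readable. The sketch below assumes LogFile is a plain path (AWStats also accepts a piped command there, which this check does not handle); the function name check_logfile is made up for illustration.

```shell
# Read the LogFile directive from an AWStats config and check that the
# file it names is readable. Handles plain paths only, not piped commands.
check_logfile() {
    conf="$1"
    # Take the last uncommented LogFile= line and strip surrounding quotes.
    log=$(sed -n 's/^[[:space:]]*LogFile=//p' "$conf" \
        | tail -n 1 \
        | sed 's/^"//; s/"[[:space:]]*$//')
    if [ -r "$log" ]; then
        echo "OK: $log"
    else
        echo "MISSING: $log"
        return 1
    fi
}

# Usage (config name taken from the copy step above):
# check_logfile /etc/awstats/awstats.yourdomain.com.conf
```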

The I/O Bottleneck

Here is the part most tutorials skip. AWStats is a Perl script that reads massive text files. It is an I/O (Input/Output) heavy operation. If you have a 2GB log file, the script has to read every single byte to generate your report.

On a standard shared host or a budget VPS using OpenVZ with oversold resources, your disk I/O is capped. When you run the update command:

perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update

...you might see your iowait spike in `top`. If your host is using slow 7.2k SATA drives in a crowded RAID array, this parse could take 20 minutes. During that time, your web server performance degrades because the disk heads are thrashing.
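
To put a number on this for your own server, time one update run and watch iowait while it works. This is a minimal sketch; the helper name timed_run is made up for illustration, and it only measures wall-clock seconds.

```shell
# Run any command and report how long it took, in whole seconds,
# together with its exit status.
timed_run() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s (exit $status)"
    return $status
}

# Usage (same update command as above):
# timed_run perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update
#
# In a second terminal, sample system stats every 5 seconds and watch
# the "wa" (iowait) column while the update runs:
# vmstat 5
```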

This is why we architect CoolVDS differently. We use enterprise-grade SAS 15k RPM drives in RAID-10. We don't oversell the I/O. When you run a log analysis on our Xen-based nodes, the dedicated disk throughput means that 2GB log file gets chewed through in seconds, not minutes. You get the data without the downtime.

Automation and Security

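Don't run this manually. Add it to root's crontab to update every hour. The redirect only discards standard output, so cron will still mail you anything the script writes to stderr: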

0 * * * * perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update > /dev/null
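
If a single update can take many minutes on slow disks, an hourly job may start while the previous run is still going. The sketch below shows one way to guard against that. It assumes a hypothetical wrapper script and lock directory path; neither is part of AWStats.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /usr/local/bin/awstats-update.sh
# (the script name and lock path are assumptions for illustration).
LOCKDIR="${LOCKDIR:-/var/run/awstats-update.lock}"

run_locked() {
    # mkdir is atomic: only one caller can create the directory.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 0
    fi
    "$@"
    status=$?
    # Release the lock so the next scheduled run can proceed.
    rmdir "$LOCKDIR"
    return $status
}

# Usage inside the wrapper (same update command as above):
# run_locked perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update
```

The cron entry would then call the wrapper instead of awstats.pl directly. If the script is killed mid-run the lock directory stays behind and has to be removed by hand, which is the price of keeping the guard this simple.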

A Note on Norwegian Compliance

If you are hosting data for Norwegian users, remember the Personal Data Act (Personopplysningsloven). IP addresses can be considered personal data. Datatilsynet is strict about where this data lives.

By hosting on a CoolVDS server located physically in Oslo, you simplify your compliance stance under the EU Directive 95/46/EC. You know exactly where the physical hard drives are spinning. Try getting that guarantee from a US-based cloud giant.

Conclusion

Logs are the only source of truth in system administration. They tell you when you are under attack, which images are bandwidth hogs, and exactly where your latency originates. But analyzing them requires hardware that can keep up with the read operations.

Stop settling for "estimated" traffic data. Deploy a high-performance VPS that can handle the heavy lifting.

Need raw power for log parsing? Spin up a CoolVDS Xen instance in Oslo today.
