Stop Grepping Logs: Visualizing Traffic with AWStats on CentOS
There is a specific kind of headache reserved for system administrators who try to debug a traffic spike using tail -f /var/log/httpd/access_log. While raw logs are the ultimate source of truth, they are terrible for spotting trends. If you are running a high-traffic e-commerce site or a media portal here in Norway, you cannot rely on scrolling text to tell you if your bandwidth bill is about to explode.
You need visualization. In 2011, your best option isn't expensive SaaS—it's AWStats. It is free, robust, and Perl-based. But be warned: log parsing is an I/O killer. I have seen budget VPS instances freeze completely because a sysadmin tried to parse a 2GB log file on a shared disk array. Here is how to do it right, keeping your resources managed and your data strictly within Norwegian borders.
The Reality of Log Parsing and I/O Wait
Before we touch the config, let's talk hardware. AWStats works by reading your server logs line-by-line and building a database of statistics. This is a read-heavy operation.
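To make the idea concrete, here is a rough sketch of the kind of tally AWStats builds, reduced to hits and bytes per hour. It is not how AWStats is implemented; the function name hits_per_hour is made up for this example, and it assumes the Apache "Combined" format with well-formed request lines, so the timestamp is field 4 and the byte count is field 10.

```shell
# hits_per_hour LOGFILE
# Prints "day/month/year:hour hits bytes", one line per hour, sorted lexically.
# Assumption: Apache "Combined" format with a three-token request line.
hits_per_hour() {
    awk '
        {
            hour = substr($4, 2, 14)   # "[10/Oct/2011:13:55:36" -> "10/Oct/2011:13"
            hits[hour]++
            bytes[hour] += $10         # a "-" byte count adds 0
        }
        END {
            for (h in hits) printf "%s %d %.0f\n", h, hits[h], bytes[h]
        }
    ' "$1" | sort
}

# Usage (path taken from the article):
# hits_per_hour /var/log/httpd/access_log
```

Even this toy version has to read every line of the file, which is exactly why the disk matters.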
If you are hosting on a legacy platform with oversold hard drives, running the update script will skyrocket your I/O Wait. Your web server (Apache or Nginx) will start queuing requests because the disk is too busy reading logs to serve static files. This is why we engineer CoolVDS with dedicated RAID 10 arrays and strictly limited density. We ensure that when you crunch numbers, your actual visitors don't stare at a loading screen.
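If you want a number rather than a feeling, one option is to snapshot the aggregate cpu line of /proc/stat before and after an update run and compare the two. The sketch below assumes a Linux kernel that exposes the usual user, nice, system, idle, iowait, irq, softirq and steal counters on that line; the function name iowait_between and the /tmp paths are only illustrative.

```shell
# iowait_between BEFORE AFTER
# Prints the percentage of CPU time spent in iowait between two snapshots,
# each file holding the first ("cpu ...") line of /proc/stat.
iowait_between() {
    awk '
        $1 == "cpu" {
            total = 0
            # Fields 2..9: user nice system idle iowait irq softirq steal
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            if (seen++ == 0) { t0 = total; w0 = $6 }
            else             { t1 = total; w1 = $6 }
        }
        END {
            if (seen < 2 || t1 == t0) exit 1
            printf "%.1f\n", 100 * (w1 - w0) / (t1 - t0)
        }
    ' "$1" "$2"
}

# Usage:
# head -n 1 /proc/stat > /tmp/stat.before
# ...run the AWStats update here...
# head -n 1 /proc/stat > /tmp/stat.after
# iowait_between /tmp/stat.before /tmp/stat.after
```

A figure that jumps sharply during the update is a hint that the disk, not the CPU, is the bottleneck.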
Step 1: Installing AWStats on CentOS 5/6
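We will assume you are running a standard LAMP stack (Linux, Apache, MySQL, PHP/Perl). First, ensure you have the EPEL repository enabled, as AWStats isn't in the base CentOS repos. The URL below is for CentOS 5 on i386; on CentOS 6 or x86_64, use the matching epel-release package for your release and architecture.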
rpm -Uvh http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
yum install awstats
Once installed, the configuration files live in /etc/awstats/. You need to create a copy of the model file for your specific domain.
cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.com.conf
vi /etc/awstats/awstats.yourdomain.com.conf
Step 2: Configuration for Accuracy
The default config is okay, but "okay" doesn't solve problems. You need to change a few specific directives to ensure you aren't just logging garbage data.
- LogFile: Point this to your actual Apache log, usually /var/log/httpd/access_log.
- LogFormat: Set this to 1 for the "Combined" Apache format. If you use Nginx, you must match the log_format directive in nginx.conf to what AWStats expects.
- DNSLookup: Set this to 1. This resolves IP addresses to hostnames. Warning: This slows down processing significantly. On a CoolVDS instance with low latency to local DNS resolvers, this is negligible, but on cheap hosting, this can add hours to your processing time.
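If you would rather script these three changes than edit by hand, something like the following could work. It is a minimal sketch: set_directive is a helper invented for this example, and it assumes each directive already sits on its own Key=value line starting in the first column, and that the value contains no | or & characters.

```shell
# set_directive CONF KEY VALUE
# Rewrites an existing "KEY=..." line in CONF, keeping the file's permissions.
set_directive() {
    conf="$1"; key="$2"; value="$3"
    tmp="$(mktemp)" || return 1
    # Write the edited copy to a temp file, then copy it back over the original.
    sed "s|^${key}=.*|${key}=${value}|" "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage (values taken from the list above):
# conf=/etc/awstats/awstats.yourdomain.com.conf
# set_directive "$conf" LogFile '"/var/log/httpd/access_log"'
# set_directive "$conf" LogFormat 1
# set_directive "$conf" DNSLookup 1
```

Check the result with grep before the first update run; a directive that was commented out or missing will be left untouched by this helper.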
Pro Tip: If you are running a cluster behind a load balancer, your logs might show the load balancer's IP instead of the visitor's. Ensure you install mod_rpaf for Apache so the X-Forwarded-For header is respected. Otherwise, AWStats will think 100% of your traffic comes from a single address (the load balancer, or localhost if the proxy runs on the same machine).
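A quick way to spot this symptom before you trust any report is to count requests per client address. This is a sketch under the assumption that the client address is the first field of each log line, as in the Combined format; top_clients is a name made up for this example.

```shell
# top_clients LOGFILE [N]
# Prints the N busiest client addresses (default 10) with their request counts.
top_clients() {
    awk '{ print $1 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Usage:
# top_clients /var/log/httpd/access_log 5
```

If one private or loopback address owns nearly every request, the proxy is most likely masking your real visitors.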
Step 3: Secure the Interface
By default, AWStats is accessible via a CGI script. Do not leave this open to the world. Competitors can see your traffic spikes, your referrers, and your keywords. Lock it down in your Apache configuration (a <Directory> block is not valid inside an .htaccess file; there you would use only the directives it contains):
<Directory /var/www/awstats/>
Order deny,allow
Deny from all
# Your office IP (Apache does not accept a trailing comment on a directive line)
Allow from 123.45.67.89
AuthType Basic
AuthName "Restricted Stats"
AuthUserFile /etc/awstats/htpasswd
Require valid-user
</Directory>
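The AuthUserFile above has to exist before Apache can authenticate anyone. A minimal sketch for creating it with the standard htpasswd tool follows; the wrapper name create_stats_user and the user name admin are placeholders, and the default path is simply the one used in the block above.

```shell
# create_stats_user USER [FILE]
# Adds USER to the password file, creating the file on first use.
# htpasswd prompts for the password interactively.
create_stats_user() {
    user="$1"
    file="${2:-/etc/awstats/htpasswd}"
    if [ -f "$file" ]; then
        htpasswd "$file" "$user"      # add or update a user in an existing file
    else
        htpasswd -c "$file" "$user"   # -c creates the file
    fi
}

# Usage:
# create_stats_user admin
```

Make sure the file is readable by the user Apache runs as, and keep it outside any directory that is served over HTTP.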
Local Compliance: The Norwegian Context
Hosting in Norway isn't just about speed; it is about the Personal Data Act (Personopplysningsloven). IP addresses can be considered personal data under Datatilsynet guidelines.
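When you host with a US-based provider, you are sending log data across the Atlantic. By keeping your VPS in Oslo (like our CoolVDS infrastructure), you simplify compliance. However, you should still avoid keeping raw data longer than you need it. Set PurgeLogFile=1 to have AWStats empty the log file after processing, but only if you archive logs elsewhere or do not need them. The related ArchiveLogRecords directive controls whether the purged records are saved to a separate archive file. To limit how long historical data is kept, prune old history files from the AWStats data directory (DirData).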
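If you let logrotate keep rotated copies instead, a retention sweep along these lines can cap how long raw IP addresses stay on disk. This is only a sketch: purge_old_logs is an invented name, the 30-day figure is a placeholder rather than legal advice, and it assumes rotated files are named access_log.1, access_log.2 and so on, and that your find supports -maxdepth and -delete (GNU findutils does).

```shell
# purge_old_logs DIR DAYS
# Deletes rotated access logs in DIR last modified more than DAYS days ago.
# The live access_log is never matched by the pattern.
purge_old_logs() {
    dir="$1"; days="$2"
    find "$dir" -maxdepth 1 -type f -name 'access_log.*' -mtime +"$days" -print -delete
}

# Usage (the retention period is an assumption; pick your own):
# purge_old_logs /var/log/httpd 30
```

Run it only after AWStats has processed the files, since deleted logs cannot be replayed into the statistics.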
Step 4: Automating the Updates
Stats are useless if they are old. Set up a cron job to update the database every hour. Open your crontab with crontab -e:
0 * * * * /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=yourdomain.com -update > /dev/null
This runs the Perl script at the top of every hour. Monitor your server load the first few times this runs. If you see your load average spike well above your CPU core count (above 2.0 on a two-core instance, for example), you might need to allocate more RAM or move to a higher tier plan.
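One defensive option is to skip an update when the box is already busy, so the parser never piles onto an existing spike. The sketch below reads the 1-minute load average from /proc/loadavg; load_below is a hypothetical helper, and the 2.0 threshold is just the figure from the paragraph above, not a recommendation for every machine.

```shell
# load_below THRESHOLD [LOADAVG_FILE]
# Succeeds (exit 0) when the 1-minute load average is below THRESHOLD.
load_below() {
    threshold="$1"
    loadavg_file="${2:-/proc/loadavg}"
    awk -v max="$threshold" '
        NR == 1 { ok = ($1 < max) }   # first field is the 1-minute average
        END     { exit !ok }          # an empty file counts as "not below"
    ' "$loadavg_file"
}

# Usage inside a wrapper script called from cron:
# load_below 2.0 && /usr/bin/perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl \
#     -config=yourdomain.com -update > /dev/null
```

A skipped run is harmless: the next update simply picks up the log lines that were not processed yet.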
Why Infrastructure Matters
Log analysis is basically a stress test for your storage subsystem. If you are on a shared host, the "neighbor effect" means someone else's log processing can slow down your database queries. This is unacceptable for professional environments.
At CoolVDS, we use KVM virtualization to ensure strict resource isolation. We combine this with high-performance RAID storage to handle the heavy read/write operations required by tools like AWStats. Whether you are fighting off a DDoS attack or analyzing the success of a marketing campaign, you need hardware that doesn't blink.
Ready to take control of your data? Don't settle for sluggish I/O. Deploy a high-performance VPS Norway instance on CoolVDS today and see what your logs have been trying to tell you.