
Stop Grepping: Visualizing Server Traffic with AWStats on CentOS & Debian


You Can't Manage What You Can't Measure

I still see system administrators staring at tail -f /var/log/apache2/access.log like they can read the Matrix code. While I respect the command-line prowess, diagnosing a traffic spike or tracking down bandwidth theft by scrolling through raw text is a waste of time. Worse, relying solely on client-side tools like Google Analytics leaves you blind to the bots, scrapers, and hotlinkers that are actually consuming your bandwidth and CPU cycles.

Client-side JavaScript tracking can miss a large share of the picture; in my experience, around 30%. It doesn't trigger for non-JS clients, it's often blocked by privacy plugins (yes, even in 2011, AdBlock Plus is a thing), and it tells you nothing about server errors. You need server-side analysis.

Enter AWStats (Advanced Web Statistics). It’s not new, it’s not flashy, but it parses log files with brutal efficiency. In this guide, we are going to set up AWStats on a standard LAMP stack, configure it for the strict privacy standards of the Norwegian Datatilsynet, and ensure it doesn't kill your server performance during updates.

The Hidden Cost of Log Analysis: I/O Wait

Before we touch the config files, a warning from the trenches. Last month, I was called into a project where a scheduled AWStats update script brought a Magento store to its knees every four hours. The reason? The access logs were massive (GBs in size), and the server was running on cheap, oversold shared hosting with slow SATA drives.

When AWStats parses a 2GB text file, it hammers your disk I/O. If your hosting provider is stuffing 50 tenants onto a single hard drive, your site will hang. This is why at CoolVDS, we prioritize disk throughput on our RAID-10 arrays. You need high IOPS (Input/Output Operations Per Second) to crunch logs without starving the web server.
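Before you schedule anything, it is worth knowing how much text AWStats will have to chew through. A minimal sketch, assuming a hypothetical helper name (log_size_mb) and the Debian log path used elsewhere in this guide:

```shell
# Hypothetical helper: print the size of a log file in whole megabytes.
log_size_mb() {
    # wc -c counts bytes; the redirect fails (and we return 1) if the file is missing
    bytes=$(wc -c < "$1") || return 1
    echo $(( bytes / 1024 / 1024 ))
}

# Example call (the path is an assumption; CentOS normally logs to /var/log/httpd/access_log):
# log_size_mb /var/log/apache2/access.log
```

If the number is in the thousands, rotate your logs more often before you blame AWStats.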

Step 1: Installation

Let's assume you are running a standard CentOS 5.5 or Debian 6.0 (Squeeze) environment.

For CentOS/RHEL (requires EPEL repository):

yum install awstats

For Debian/Ubuntu:

apt-get install awstats

Step 2: Configuration for Accuracy and Privacy

The default configuration is rarely production-ready. We need to tweak it to handle your specific domain and, crucially, to respect user privacy in accordance with Norwegian regulations (Personopplysningsloven). While we don't have the same strict rules as Germany yet, it is best practice to anonymize IP addresses if you are storing data long-term.
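One way to anonymize addresses is to blank the last octet before AWStats ever sees the log. This is a rough sketch, not an official AWStats feature: the helper name anonymize_ips is made up, it only handles IPv4, and it assumes the client address is the first field on each line.

```shell
# Hypothetical helper: zero the last octet of the leading IPv4 address
# on every log line (reads stdin, writes stdout).
anonymize_ips() {
    # Basic regex syntax, so it also works with older sed versions
    sed 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\} /\1.0 /'
}

# Example: anonymize_ips < access.log > access.anon.log
```

Keep in mind that truncated addresses make the unique visitor counts coarser, since everyone in the same /24 collapses into one host.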

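Edit your config file, usually found at /etc/awstats/awstats.yourdomain.no.conf. The part between "awstats." and ".conf" is the name you pass to -config later, so keep the two in sync:

# The path to your log file. Crucial!
# Debian default shown; on CentOS it is /var/log/httpd/access_log
LogFile="/var/log/apache2/access.log"

# 1 = Apache combined log format. This must match what Apache actually writes.
LogFormat=1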

# The domain you are analyzing
SiteDomain="www.yourdomain.no"

# DNS Lookup: Turn this OFF for performance. 
# Resolving every IP to a hostname will kill your update time.
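DNSLookup=0

Pro Tip: If you are serving a Norwegian audience, enable geo-location so you can see where your traffic comes from. A directive such as LoadPlugin="geoip GEOIP_STANDARD /path/to/GeoIP.dat" (requires the Geo::IP Perl module and a GeoIP database file) resolves visitors to countries. Telling Oslo from Bergen needs the city-level geoip_city_maxmind plugin and a city database.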
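A mismatch between LogFormat and the real log is the most common reason for an empty report. Here is a quick sanity check, sketched under the assumption that your log should be in Apache "combined" format; the helper name looks_combined is hypothetical and the regex is deliberately rough (it will reject lines with escaped quotes in the request).

```shell
# Hypothetical helper: succeed if the first line of a log file looks like
# Apache combined format: host ident user [time] "request" status bytes "referer" "agent"
looks_combined() {
    head -n 1 "$1" |
        grep -Eq '^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} ([0-9]+|-) "[^"]*" "[^"]*"$'
}

# Example:
# looks_combined /var/log/apache2/access.log && echo "LogFormat=1 should work"
```

If the check fails, look at the CustomLog line in your Apache config. A "common" log has no referer or user agent fields, and AWStats will discard every record when told to expect them.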

Step 3: Secure Your Stats

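AWStats can either generate static HTML pages or serve reports through a CGI script. Do not leave either open to the public. Competitors can use your traffic data to reverse-engineer your marketing strategy. Lock it down with HTTP Basic authentication in your Apache configuration. Note that a <Directory> block belongs in the server or virtual host config, not in a .htaccess file.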

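# awstats.pl sits directly in /usr/lib/cgi-bin on Debian,
# so protect the script itself rather than a subdirectory.
<Directory /usr/lib/cgi-bin>
  <Files "awstats.pl">
    AuthName "Server Stats - Admin Only"
    AuthType Basic
    AuthUserFile /etc/apache2/.htpasswd
    Require valid-user
  </Files>
</Directory>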
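The AuthUserFile has to exist before Apache will let anyone in. The sketch below shows the usual htpasswd call as a comment, plus a small check that a user really has an entry. The username "admin" and the helper name has_htpasswd_user are placeholders of mine.

```shell
# Create the password file once (prompts for a password; "admin" is a placeholder):
#   htpasswd -c /etc/apache2/.htpasswd admin

# Hypothetical helper: succeed if the given user has an entry in the password file.
has_htpasswd_user() {
    # htpasswd files store one "user:hash" entry per line
    grep -q "^$2:" "$1"
}

# Example:
# has_htpasswd_user /etc/apache2/.htpasswd admin || echo "no such user"
```

Only use -c the first time. It overwrites an existing file, so drop the flag when you add a second user.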

Step 4: Automating the Crunch

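Stats are useless if they aren't up to date. However, you don't want to run the update process during peak traffic hours. Configure cron to run the update script during the night or early morning.

Edit your crontab with crontab -e. The path below is the Debian location; on CentOS the script usually lives elsewhere, so find it with rpm -ql awstats | grep awstats.pl first: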

0 4 * * * /usr/lib/cgi-bin/awstats.pl -config=yourdomain.no -update > /dev/null

This runs the update at 4:00 AM. If you are on a CoolVDS instance, our dedicated CPU resources mean this process will likely finish in seconds. On lesser hardware, you might want to prepend nice -n 19 to the command to lower its CPU priority, ensuring it doesn't interrupt web requests.
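If you want the low priority plus protection against two updates overlapping, a small wrapper does the job. This is a sketch under my own assumptions: the function name run_low_priority, the lock location, and the exit code 75 are all arbitrary choices, not anything AWStats prescribes.

```shell
# Hypothetical wrapper: run a command at the lowest CPU priority and
# refuse to start if a previous run still holds the lock.
run_low_priority() {
    lock_dir="${AWSTATS_LOCK_DIR:-/tmp/awstats-update.lock}"   # assumed location
    # mkdir is atomic: it fails if the directory already exists
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "previous update still running, skipping" >&2
        return 75
    fi
    nice -n 19 "$@"
    status=$?
    rmdir "$lock_dir"
    return "$status"
}

# Example:
# run_low_priority /usr/lib/cgi-bin/awstats.pl -config=yourdomain.no -update
```

Cron cannot call a shell function directly, so in practice you would save this as a script and point the crontab entry at it. If a run is killed hard, the lock directory stays behind and has to be removed by hand. On Linux you can also put ionice in front of the command to lower its disk priority, which is usually the real bottleneck here.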

Analysis: What to Look For

Once your data is populating, look immediately at the "Robots/Spiders visitors" section. You will often find that 40% of your bandwidth is being consumed by aggressive scrapers or search engines indexing pages that shouldn't exist.
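To double-check what the report tells you, you can total the bytes per User-Agent straight from the log. A rough sketch, assuming combined log format; the helper name bytes_by_agent is mine, and user agents containing escaped quotes will be split incorrectly.

```shell
# Hypothetical helper: sum response bytes per User-Agent from a
# combined-format log on stdin, biggest consumers first.
bytes_by_agent() {
    awk -F'"' '{
        split($3, f, " ")                 # $3 holds " status bytes "
        if (f[2] != "-") total[$6] += f[2]
    } END {
        for (ua in total) printf "%.0f %s\n", total[ua], ua
    }' | sort -rn
}

# Example:
# bytes_by_agent < /var/log/apache2/access.log | head -n 10
```

Splitting on the double quote makes the user agent the sixth field, which is simpler than trying to count spaces inside the request and referer strings.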

If you see a high number of "404 Not Found" errors from a single IP address, block that address in your firewall (iptables) immediately. You can't get this level of forensic detail from Google Analytics.
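The same check is easy to run against the raw log when you need an answer right now. A minimal sketch, assuming combined log format with well-formed request lines (so the status code is the ninth space-separated field); the helper name count_404_by_ip and the address in the comment are illustrative only.

```shell
# Hypothetical helper: count 404 responses per client IP from a
# combined-format log on stdin, worst offenders first.
count_404_by_ip() {
    awk '$9 == 404 { hits[$1]++ }
         END { for (ip in hits) print hits[ip], ip }' | sort -rn
}

# Example:
# count_404_by_ip < /var/log/apache2/access.log | head -n 5
#
# Then drop an offender (203.0.113.7 is a documentation address):
# iptables -I INPUT -s 203.0.113.7 -j DROP
```

Check who the address belongs to before you drop it. Blocking a search engine crawler because of a few stale links is an expensive mistake.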

The Infrastructure Reality

Log analysis is heavy lifting. It requires parsing millions of lines of text, correlating dates, and writing the aggregated results back to AWStats' data files. It exposes the weakness of your storage subsystem.

Many "budget" VPS providers in Norway use network-attached storage (NAS) which adds latency. When AWStats tries to read 500MB of logs, the network chokes. At CoolVDS, we use local storage with hardware RAID controllers. This ensures that when you need to crunch the numbers, the data moves as fast as the physics of the drive allow. Don't let your monitoring tools become the cause of your downtime.

Ready to get serious about server visibility? Deploy a high-performance Xen or KVM instance on CoolVDS today and see what your traffic actually looks like.
