Stop Flying Blind: Deep Server Forensics with AWStats on CentOS 6
If you are relying solely on JavaScript-based trackers like Google Analytics to understand your server's health, you are effectively flying blind. I've seen it time and time again: a client complains about server load, but their "Analytics" dashboard shows a flat line. Why? Because JavaScript tags don't load when a botnet hits your login page, and they certainly don't track bandwidth leeched by hotlinking images.
In the data centers of Oslo and across the Nordic region, we know that true visibility comes from the metal up. It comes from the raw access logs. Today, we are going to deploy AWStats (Advanced Web Statistics) on a CentOS 6 environment. We will configure it to handle Nginx logs, respect Norwegian privacy standards (Datatilsynet), and discuss why your choice of storage I/O makes or breaks this process.
The Gap Between Perception and Reality
Last month, I debugged a Magento deployment that was crawling. The marketing team said traffic was normal. top showed the CPU pinned at 90%. A quick tail -f /var/log/nginx/access.log revealed the truth: a scraper from an unknown subnet was hammering the catalog search every 200ms.
Google Analytics never fired. The server melted anyway.
AWStats parses your server logs directly. It sees everything: 404 errors, 301 redirects, bandwidth usage, and bots. It is the single source of truth for a systems architect.
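The forensic step from that story, spotting one address that fires several requests per second, can be reproduced straight from the raw log. This is a minimal sketch. It assumes the combined log format configured later in this guide. The function name top_talkers_per_minute and the threshold are hypothetical choices for illustration.

```shell
# Hypothetical helper: report client addresses that exceed THRESHOLD requests
# within any single minute of a combined-format access log.
# Usage: top_talkers_per_minute LOGFILE THRESHOLD
top_talkers_per_minute() {
    logfile=$1
    threshold=$2
    # $1 is the client address; $4 looks like "[10/Oct/2012:13:55:36".
    # Dropping the leading "[" and the trailing ":SS" leaves a per-minute key.
    awk -v limit="$threshold" '
        {
            minute = substr($4, 2, length($4) - 4)
            count[$1 " " minute]++
        }
        END {
            for (key in count)
                if (count[key] > limit)
                    print count[key], key
        }
    ' "$logfile" | sort -rn
}
```

For example, top_talkers_per_minute /var/log/nginx/access.log 120 lists every address that sent more than 120 requests in some minute. A client polling every 200ms would show up at around 300.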
Step 1: Installation on CentOS 6 / RHEL
First, ensure you have the EPEL (Extra Packages for Enterprise Linux) repository enabled. AWStats is not in the CentOS 6 base repositories, which are conservative about what they ship.
# Install EPEL if you haven't already
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
# Install AWStats
yum install awstats
For our Debian/Ubuntu 12.04 LTS friends, a simple apt-get install awstats will suffice.
Step 2: Configuring for Nginx and Apache
AWStats defaults to Apache's combined log format. If you are running Nginx (which you should be for static content performance), you need to ensure your nginx.conf matches what AWStats expects.
Nginx Configuration
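Open /etc/nginx/nginx.conf and verify your log_format. The format below is identical to Nginx's built-in combined format, so access_log /var/log/nginx/access.log combined; works just as well. A separate name is only needed because Nginx refuses to redefine combined itself: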
http {
    log_format combined_custom '$remote_addr - $remote_user [$time_local] '
                               '"$request" $status $body_bytes_sent '
                               '"$http_referer" "$http_user_agent"';
    access_log /var/log/nginx/access.log combined_custom;
}
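Before pointing AWStats at the file, it is worth confirming that the lines Nginx writes really have the combined layout. This is a rough sketch using grep -E. The regular expression is an approximation of the format above written for this guide, not the parser AWStats itself uses. The function name count_noncombined_lines is hypothetical.

```shell
# Hypothetical helper: print how many lines in LOGFILE do NOT look like
#   addr - user [time] "request" status bytes "referer" "agent"
count_noncombined_lines() {
    # grep exits 1 when the count is 0; treat that as success, but keep
    # real errors (exit 2, e.g. a missing file) visible to the caller.
    grep -Evc '^[^ ]+ - [^ ]+ \[[^]]+] "[^"]*" [0-9]{3} [0-9]+ "[^"]*" "[^"]*"$' "$1" || [ $? -eq 1 ]
}
```

Run it as count_noncombined_lines /var/log/nginx/access.log. Anything other than 0 points to lines that AWStats will likely skip or count as corrupted records instead of statistics.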
AWStats Configuration
Copy the model config file to a new file for your domain:
cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.com.conf
vi /etc/awstats/awstats.yourdomain.com.conf
Modify these key directives:
# Path to your log file
LogFile="/var/log/nginx/access.log"
# W = web server log; 1 = Apache/NCSA combined format (matches the Nginx format above)
LogType=W
LogFormat=1
# The domain you are analyzing
SiteDomain="yourdomain.com"
HostAliases="www.yourdomain.com localhost 127.0.0.1"
# DNS Lookup - WARNING: Turning this on slows down parsing significantly
DNSLookup=1
Pro Tip: If your log files are massive (over 2GB), set DNSLookup=0. Doing reverse DNS lookups for every IP address will strangle your parsing process unless you are on a high-performance network. If you need geo-resolution, use the GeoIP plugin instead of DNS.
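If you provision more than one domain, editing each file in vi gets tedious. This is a minimal sketch of scripting the same changes. It assumes GNU sed (for in-place editing with -i) and that each directive appears at most once, uncommented, in the copied file. set_awstats_directive is a hypothetical helper name, not part of AWStats.

```shell
# Hypothetical helper: set KEY to VALUE in an AWStats config file.
# Replaces an existing uncommented "KEY=..." line, or appends one if absent.
# Assumes GNU sed, and that VALUE contains no "|", "&" or "\" characters,
# which would be misread by the sed replacement below.
set_awstats_directive() {
    conf=$1 key=$2 value=$3
    if grep -q "^${key}=" "$conf"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$conf"
    else
        printf '%s=%s\n' "$key" "$value" >> "$conf"
    fi
}
```

For example, set_awstats_directive /etc/awstats/awstats.yourdomain.com.conf DNSLookup 0 applies the Pro Tip above without opening an editor. Quote string values yourself, as in set_awstats_directive /etc/awstats/awstats.yourdomain.com.conf SiteDomain '"yourdomain.com"'.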
Step 3: Privacy and Compliance (The Norwegian Context)
Operating a VPS in Norway means adhering to the Personal Data Act (Personopplysningsloven). While we wait for EU-wide regulations to tighten, the Datatilsynet (Data Protection Authority) is already strict about storing personally identifiable information (PII) like IP addresses without consent.
You can anonymize IP addresses in AWStats to stay compliant while retaining analytical value. Add this to your config:
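# Plugin: GeoIP (requires the Geo::IP Perl module and a GeoIP country database;
# see the commented geoip example in awstats.model.conf for its optional arguments)
LoadPlugin="geoip"
# Privacy: mask the last octet of each IPv4 address before AWStats sees it
# This satisfies most basic anonymization requirements
# 2012-era AWStats has no built-in option for this (short of a custom patch), so pipe the log through sed
# This LogFile line replaces the one set in Step 2; IPv6 addresses pass through unmasked
LogFile="/usr/bin/perl /usr/share/awstats/tools/logresolvemerge.pl /var/log/nginx/access.log | sed 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\} /\1.0 /' |"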
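This approach ensures that your reports still show the subnet (ISP/Region) but not the specific user device. The raw Nginx logs on disk still hold full addresses, however, so keep their retention period short.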
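To check the masking on a few sample lines before wiring it into AWStats, the sed expression can be wrapped in a small function. This is a sketch with an expression anchored at the start of the line. That way it masks the address even on authenticated requests, where $remote_user is filled in. mask_last_octet is a hypothetical name.

```shell
# Hypothetical helper: replace the last octet of a leading IPv4 address with 0.
# Reads log lines on stdin. Only the first field is touched; IPv6 addresses
# and all other fields pass through unchanged.
mask_last_octet() {
    sed 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\} /\1.0 /'
}
```

For example, head -n 5 /var/log/nginx/access.log | mask_last_octet shows the masked lines without modifying the log itself.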
The I/O Bottleneck: Why Hardware Matters
Here is the reality of log analysis: it is disk-intensive. Parsing a 10GB log file involves millions of read operations. On a traditional shared hosting plan with mechanical HDDs (spinning rust), the log scan competes with your database for the same disk heads. Every switch between the two costs a seek, so the job can take hours, "iowait" spikes, and your actual website database queries slow down.
This is where infrastructure choices become critical. At CoolVDS, we have moved entirely to SSD arrays for our host nodes. SSDs have no seek penalty, so the difference in random read performance is not just 2x; it's often 100x compared to standard SATA drives.
| Storage Type | Sequential Throughput | AWStats Parse Time (5GB Log) |
|---|---|---|
| 7.2k RPM SATA HDD | ~120 MB/s | ~45 Minutes |
| CoolVDS Enterprise SSD | ~500 MB/s+ | ~4 Minutes |
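The table is easier to interpret once the parse times are converted into an effective processing rate. The short awk calculation below does only that arithmetic on the figures above, taking 5GB as 5 × 1024 MB. It is not a new measurement. effective_rates is a hypothetical name.

```shell
# Effective rate = log size / parse time, shown next to each drive's rated throughput.
# Inputs are the figures from the table; 5GB is taken as 5 * 1024 MB.
effective_rates() {
    awk 'BEGIN {
        size_mb = 5 * 1024
        printf "HDD: %.1f MB/s effective vs ~120 MB/s rated\n", size_mb / (45 * 60)
        printf "SSD: %.1f MB/s effective vs ~500 MB/s rated\n", size_mb / (4 * 60)
        printf "Parse-time ratio: %.2fx, throughput ratio: %.2fx\n", 45 / 4, 500 / 120
    }'
}
effective_rates
```

Neither setup comes close to its rated sequential throughput. The SSD's parse-time advantage (about 11x) is also larger than its throughput advantage (about 4x). That pattern is consistent with seek contention and parsing overhead, not raw sequential bandwidth, dominating the HDD case.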
If you are running complex cron jobs, managed hosting that offers guaranteed IOPS is not a luxury; it is a requirement for stability.
Automating the Update
Finally, we need to update the statistics automatically. Create a cron job inside /etc/cron.d/awstats:
# Update every hour at minute 0
# If awstats.pl is not where awstats_updateall.pl expects it, add -awstatsprog=/path/to/awstats.pl
0 * * * * root /usr/share/awstats/tools/awstats_updateall.pl now >/dev/null 2>&1
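One caveat with an hourly schedule is that on slow storage a single run can take most of an hour (see the HDD figure above). Cron will happily start the next run on top of it. This is a minimal sketch of guarding the job with flock(1) from util-linux. The run_exclusive name and the lock file path are hypothetical.

```shell
# Hypothetical wrapper: run a command only if no other run holds LOCKFILE.
# flock -n gives up immediately (exit status 1) instead of waiting for the lock.
# Usage: run_exclusive LOCKFILE COMMAND [ARGS...]
run_exclusive() {
    lockfile=$1
    shift
    flock -n "$lockfile" "$@"
}
```

In /etc/cron.d/awstats the same guard fits directly into the entry, for example: 0 * * * * root flock -n /var/run/awstats_update.lock /usr/share/awstats/tools/awstats_updateall.pl now >/dev/null 2>&1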
Secure the reporting interface using htpasswd. You do not want your competitors seeing your traffic sources.
Security Considerations: DDoS Protection
Analyzing logs helps you identify attack vectors. If you see thousands of requests from a single IP targeting xmlrpc.php, you can block them at the firewall level using iptables or csf. However, manual blocking doesn't scale against a large botnet. This is why we recommend placing your server behind DDoS protection or ensuring your provider has upstream mitigation. CoolVDS integrates filtering at the NIX (Norwegian Internet Exchange) level to scrub bad traffic before it hits your eth0.
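Pulling those offenders out of the raw log takes one awk pass. This is a sketch that assumes the combined format from Step 2. The function name suggest_blocks and the threshold are hypothetical. The function only prints iptables commands, so you can review them before running anything as root.

```shell
# Hypothetical helper: print an iptables DROP rule for every client address
# that requested a URI containing TARGET more than THRESHOLD times in LOGFILE.
# Nothing is executed; review the output before applying it.
# Usage: suggest_blocks LOGFILE TARGET THRESHOLD
suggest_blocks() {
    logfile=$1 target=$2 threshold=$3
    awk -v target="$target" -v limit="$threshold" '
        # In the combined format, $7 is the request URI ("POST /xmlrpc.php HTTP/1.1")
        index($7, target) > 0 { hits[$1]++ }
        END {
            for (ip in hits)
                if (hits[ip] > limit)
                    printf "iptables -I INPUT -s %s -j DROP\n", ip
        }
    ' "$logfile" | sort
}
```

For example, suggest_blocks /var/log/nginx/access.log /xmlrpc.php 1000 prints one rule per address that hit xmlrpc.php more than 1000 times.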
Conclusion
AWStats brings the transparency that slick JavaScript dashboards lack. It tells you the hard truths about your bandwidth, your errors, and your visitors. But remember: logs are heavy. Don't let your monitoring tools become the reason your server slows down.
Need a platform that can crunch 10GB logs without breaking a sweat? Deploy a test instance on CoolVDS today. With our low-latency network and pure SSD storage, your analysis finishes before you've even poured your second cup of coffee.