Stop Flying Blind: Deep Server Forensics with AWStats on CentOS 6

If you are relying solely on JavaScript-based trackers like Google Analytics to understand your server's health, you are effectively flying blind. I've seen it time and time again: a client complains about server load, but their "Analytics" dashboard shows a flat line. Why? Because JavaScript tags don't load when a botnet hits your login page, and they certainly don't track bandwidth leeched by hotlinking images.

In the data centers of Oslo and across the Nordic region, we know that true visibility comes from the metal up. It comes from the raw access logs. Today, we are going to deploy AWStats (Advanced Web Statistics) on a CentOS 6 environment. We will configure it to handle Nginx logs, respect Norwegian privacy standards (Datatilsynet), and discuss why your choice of storage I/O makes or breaks this process.

The Gap Between Perception and Reality

Last month, I debugged a Magento deployment that was crawling. The marketing team said traffic was normal, yet top showed the CPU pinned at 90%. A quick tail -f /var/log/nginx/access.log revealed the truth: a scraper from an unknown subnet was hammering the catalog search every 200ms.

Google Analytics never fired. The server melted anyway.

AWStats parses your server logs directly. It sees everything: 404 errors, 301 redirects, bandwidth usage, and bots. It is the single source of truth for a systems architect.
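
Before AWStats is even installed, the raw log can answer the "who is hammering me?" question. Here is a minimal shell sketch of that kind of triage. The function name top_clients is my own invention (it is not part of AWStats), and it assumes the client address is the first field of each line, as in the combined log format.

```shell
# top_clients LOGFILE [N]
# Print the N busiest client IPs (default 10) as "count ip", busiest first.
# Assumes the client address is the first whitespace-separated field.
top_clients() {
    awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' "$1" \
        | sort -k1,1nr -k2,2 \
        | head -n "${2:-10}"
}
```

Run it as top_clients /var/log/nginx/access.log 5 and a scraper like the one above tends to float straight to the top of the list.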

Step 1: Installation on CentOS 6 / RHEL

First, ensure you have the EPEL repository enabled. Standard repositories are often too conservative with versions.

# Install EPEL if you haven't already
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

# Install AWStats
yum install awstats

For our Debian/Ubuntu 12.04 LTS friends, a simple apt-get install awstats will suffice.

Step 2: Configuring for Nginx and Apache

AWStats defaults to Apache's combined log format. If you are running Nginx (which you should be for static content performance), you need to ensure your nginx.conf matches what AWStats expects.

Nginx Configuration

Open /etc/nginx/nginx.conf and verify your log_format:

http {
    log_format combined_custom '$remote_addr - $remote_user [$time_local] '
                               '"$request" $status $body_bytes_sent '
                               '"$http_referer" "$http_user_agent"';

    access_log /var/log/nginx/access.log combined_custom;
}
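
If you want a quick sanity check that what Nginx actually writes has the shape AWStats will be told to expect, a rough pattern match on one log line is enough. The sketch below is an assumption-laden helper, not an AWStats tool: is_combined_line is a hypothetical name, and the regular expression only checks the overall layout (address, user fields, bracketed time, quoted request, status, bytes, quoted referer and user agent).

```shell
# is_combined_line LINE
# Exit 0 if LINE looks like a combined-format entry:
#   ip ident user [time] "request" status bytes "referer" "user-agent"
# This checks the layout only, not whether the values are sensible.
is_combined_line() {
    printf '%s\n' "$1" | grep -Eq \
        '^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} ([0-9]+|-) "[^"]*" "[^"]*"$'
}
```

A typical use would be is_combined_line "$(tail -n 1 /var/log/nginx/access.log)" && echo "format looks right" after reloading Nginx.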

AWStats Configuration

Copy the model config file to a new file for your domain:

cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.yourdomain.com.conf
vi /etc/awstats/awstats.yourdomain.com.conf

Modify these key directives:

# Path to your log file
LogFile="/var/log/nginx/access.log"

# Ensure this matches your server software
LogType=W
LogFormat=1

# The domain you are analyzing
SiteDomain="yourdomain.com"
HostAliases="www.yourdomain.com localhost 127.0.0.1"

# DNS Lookup - WARNING: Turning this on slows down parsing significantly
DNSLookup=1

Pro Tip: If your log files are massive (over 2GB), set DNSLookup=0. Doing reverse DNS lookups for every IP address will strangle your parsing process unless you are on a high-performance network. If you need geo-resolution, use the GeoIP plugin instead of DNS.
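
To get a feel for how much resolver work DNSLookup=1 could mean on your own traffic, count the distinct client addresses in the log. This is only a rough sketch: count_unique_clients is a hypothetical helper, and treating the result as an upper bound on lookups assumes each distinct address is resolved at most once per run.

```shell
# count_unique_clients LOGFILE
# Print the number of distinct client addresses (first field) in the log.
count_unique_clients() {
    awk '!seen[$1]++ { n++ } END { print n + 0 }' "$1"
}
```

If count_unique_clients /var/log/nginx/access.log comes back in the hundreds of thousands, that is a strong hint to leave DNS resolution off.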

Step 3: Privacy and Compliance (The Norwegian Context)

Operating a VPS in Norway means adhering to the Personal Data Act (Personopplysningsloven). While we wait for EU-wide regulations to tighten, the Datatilsynet (Data Protection Authority) is already strict about storing personally identifiable information (PII) like IP addresses without consent.

You can anonymize IP addresses in AWStats to stay compliant while retaining analytical value. Add this to your config:

# Plugin: GeoIP
LoadPlugin="geoip"

# Privacy: Mask the last byte of the IP address
# This satisfies most basic anonymization requirements
# Note: This requires a custom patch or sed processing in 2012-era AWStats versions
# Alternative: Use LogFile with a pipe
LogFile="/usr/bin/perl /usr/share/awstats/tools/logresolvemerge.pl /var/log/nginx/access.log | sed 's/\.\([0-9]*\) - -/\.0 - -/' |"

This approach ensures that you can see the subnet (ISP/Region) but not the specific user device.
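
It is worth testing the masking step on a sample line before wiring it into LogFile. Note that the sed expression in the config only matches when the IP is followed by "- -", so requests with an authenticated user would slip through unmasked. The sketch below is a variant of my own, not the config's exact expression: mask_last_octet is a hypothetical name, it anchors on the start of the line instead, and it handles IPv4 only.

```shell
# mask_last_octet
# Read log lines on stdin and replace the last octet of a leading IPv4
# address with 0. Lines that do not start with an IPv4 address (for
# example IPv6 clients) pass through unchanged.
mask_last_octet() {
    sed 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\} /\1.0 /'
}
```

Piping a few lines through it, for instance head -n 5 /var/log/nginx/access.log | mask_last_octet, shows at a glance whether the addresses come out as x.x.x.0.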

The I/O Bottleneck: Why Hardware Matters

Here is the reality of log analysis: it is disk-intensive. Parsing a 10GB log file involves millions of read operations. On a traditional shared hosting plan with mechanical HDDs (spinning rust), this process can take hours, causing "iowait" to spike and slowing down your website's database queries.

This is where infrastructure choices become critical. At CoolVDS, we have moved entirely to SSD arrays for our host nodes. The difference in random read performance is not just 2x; it's often 100x compared to standard SATA drives.

Storage Type           | Throughput | AWStats Parse Time (5GB Log)
7.2k RPM SATA HDD      | ~120 MB/s  | ~45 Minutes
CoolVDS Enterprise SSD | ~500 MB/s+ | ~4 Minutes

If you are running complex cron jobs, managed hosting that offers guaranteed IOPS is not a luxury; it is a requirement for stability.

Automating the Update

Finally, we need to update the statistics automatically. Create a cron job inside /etc/cron.d/awstats:

# Update every hour at minute 0
0 * * * * root /usr/share/awstats/tools/awstats_updateall.pl now >/dev/null 2>&1

Secure the reporting interface using htpasswd. You do not want your competitors seeing your traffic sources.

Security Considerations: DDoS Protection

Analyzing logs helps you identify attack vectors. If you see thousands of requests from a single IP targeting xmlrpc.php, you can block them at the firewall level using iptables or csf. However, manual blocking doesn't scale against a large botnet. This is why we recommend placing your server behind DDoS protection or ensuring your provider has upstream mitigation. CoolVDS integrates filtering at the NIX (Norwegian Internet Exchange) level to scrub bad traffic before it hits your eth0.
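
For the single-IP case, the log can propose the firewall rules for you. The following is a hedged sketch rather than a hardened tool: xmlrpc_block_rules is a hypothetical name, the default threshold of 1000 is an arbitrary assumption, and it relies on the combined format, where the request path is the seventh whitespace-separated field. It only prints iptables commands so that you can review them before running anything.

```shell
# xmlrpc_block_rules LOGFILE [THRESHOLD]
# Print (never execute) one iptables DROP rule per client IP that requested
# /xmlrpc.php more than THRESHOLD times (default 1000, an arbitrary choice).
xmlrpc_block_rules() {
    awk -v limit="${2:-1000}" '
        # In combined format the request path is the 7th whitespace field.
        $7 ~ /^\/xmlrpc\.php/ { hits[$1]++ }
        END {
            for (ip in hits)
                if (hits[ip] > limit)
                    print "iptables -I INPUT -s " ip " -j DROP"
        }
    ' "$1" | sort
}
```

Review the output of xmlrpc_block_rules /var/log/nginx/access.log 1000 by hand first; a busy proxy or your own monitoring host can easily cross a naive threshold.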

Conclusion

AWStats brings the transparency that slick JavaScript dashboards lack. It tells you the hard truths about your bandwidth, your errors, and your visitors. But remember: logs are heavy. Don't let your monitoring tools become the reason your server slows down.

Need a platform that can crunch 10GB logs without breaking a sweat? Deploy a test instance on CoolVDS today. With our low latency network and pure SSD storage, your analysis finishes before you've even poured your second cup of coffee.