Automating Website Monitoring

Hey there! Ever found yourself constantly refreshing a website to check if it’s up or if it’s changed? It gets old fast, doesn’t it? Luckily, with Python, you can automate all that tedious monitoring — whether you want to track uptime, watch for content changes, or get alerts when something isn’t right. Let’s dive into how you can build your own website monitoring tools with just a few lines of code.

Why Automate Website Monitoring?

Doing manual checks is slow, error-prone, and doesn’t scale. What if you need to monitor dozens or hundreds of sites? Automation lets you:

  • Check website availability regularly.
  • Detect changes in content or structure.
  • Receive instant notifications when issues arise.
  • Gather performance metrics over time.

It’s a must-have skill for developers, sysadmins, content managers, and curious tech enthusiasts alike.

Getting Started: Basic Uptime Checker

Let’s begin with a simple uptime monitor. We’ll use the requests library to send HTTP requests and check the status code.

First, make sure you have requests installed. You can install it via pip:

pip install requests

Here’s a basic script to check if a website is up:

import requests

def check_uptime(url):
    try:
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            print(f"{url} is up!")
        else:
            print(f"{url} is down. Status code: {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"{url} is unreachable: {e}")

check_uptime("https://example.com")

This script sends a GET request to the given URL. A 200 status code means the site is up; any other status code is reported, and network errors (timeouts, DNS failures, and so on) are caught and printed with the reason.

But what if you want to run this periodically? Let’s schedule it.

Scheduling Regular Checks

You can use Python’s time module along with a loop to run checks at intervals. Here’s how:

import time
import requests

def monitor_uptime(url, interval=60):
    while True:
        try:
            response = requests.get(url, timeout=10)
            if response.status_code == 200:
                print(f"{url} is up at {time.ctime()}")
            else:
                print(f"{url} is down at {time.ctime()}. Status: {response.status_code}")
        except requests.exceptions.RequestException as e:
            print(f"{url} is unreachable at {time.ctime()}: {e}")
        time.sleep(interval)

monitor_uptime("https://example.com", interval=300)  # Check every 5 minutes

This will run indefinitely, checking the site every 5 minutes (or whatever interval you set).
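
If you’d rather not hand-roll the loop and sleep calls, the third-party schedule library reads more naturally. Here’s a minimal sketch that assumes you’ve installed it (pip install schedule) and reuses the check_uptime function from the first example:

import time
import schedule  # third-party: pip install schedule

# Run the earlier check_uptime() every 5 minutes
schedule.every(5).minutes.do(check_uptime, "https://example.com")

while True:
    schedule.run_pending()
    time.sleep(1)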

For example, after 50 checks your results might break down like this:

Status        Count   Percentage
Up            45      90%
Down          3       6%
Unreachable   2       4%

But just printing to the console isn’t very useful for long-term monitoring. Let’s make it more practical.

Logging Results

Instead of printing, let’s log results to a file. This way, you have a history of uptime and issues.

import time
import requests
import logging

logging.basicConfig(filename='website_monitor.log', level=logging.INFO,
                    format='%(asctime)s - %(message)s')

def monitor_with_logging(url, interval=60):
    while True:
        try:
            response = requests.get(url, timeout=10)
            if response.status_code == 200:
                logging.info(f"{url} is up.")
            else:
                logging.warning(f"{url} is down. Status: {response.status_code}")
        except requests.exceptions.RequestException as e:
            logging.error(f"{url} is unreachable: {e}")
        time.sleep(interval)

monitor_with_logging("https://example.com")

Now, all events are saved in website_monitor.log, with timestamps and severity levels.

  • Info: Site is up.
  • Warning: Site returned an unexpected status code.
  • Error: Site is unreachable due to network issues.
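
One thing to watch: because the monitor runs forever, website_monitor.log will grow without bound. A minimal sketch using the standard library’s RotatingFileHandler keeps the file size in check (the size limit and backup count below are arbitrary examples):

import logging
from logging.handlers import RotatingFileHandler

# Rotate after roughly 1 MB, keeping the five most recent backups
handler = RotatingFileHandler('website_monitor.log', maxBytes=1_000_000, backupCount=5)
handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s'))

logger = logging.getLogger('website_monitor')
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("https://example.com is up.")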

Detecting Content Changes

Sometimes, you care not just about uptime, but about whether specific content has changed. For example, you might want to monitor a product page for price drops or a blog for new posts.

You can do this by comparing the current page content with a previously saved version.

Here’s a simple content change detector:

import requests
import hashlib
import time

def get_content_hash(url):
    # Hash the raw page body so it can be compared between checks
    response = requests.get(url, timeout=10)
    return hashlib.md5(response.content).hexdigest()

def monitor_content(url, interval=300):
    last_hash = get_content_hash(url)
    print(f"Initial hash for {url}: {last_hash}")

    while True:
        time.sleep(interval)
        current_hash = get_content_hash(url)
        if current_hash != last_hash:
            print(f"Content changed at {time.ctime()}! Old hash: {last_hash}, New hash: {current_hash}")
            last_hash = current_hash
        else:
            print(f"No change at {time.ctime()}.")

monitor_content("https://example.com")

This script hashes the entire content of the page. If the hash changes, the content has changed.

Note: This is a basic method. Many pages include dynamic elements (timestamps, rotating ads, session tokens) that change on every request, so a whole-page hash can report changes constantly. For those sites, parse the HTML and hash only the specific elements you care about.
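
For instance, here’s a minimal sketch using BeautifulSoup (pip install beautifulsoup4) that hashes only one element chosen by a CSS selector; the product URL and the .price selector are purely illustrative:

import hashlib
import requests
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def get_element_hash(url, css_selector):
    # Hash the text of a single element instead of the whole page
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")
    element = soup.select_one(css_selector)
    if element is None:
        return None  # element not found on the page
    return hashlib.md5(element.get_text(strip=True).encode("utf-8")).hexdigest()

print(get_element_hash("https://example.com/product", ".price"))

Swap the selector for whatever identifies the content you actually want to watch, then feed the result into the same compare-and-alert loop as before.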

Sending Alerts

What good is monitoring if you don’t know when something goes wrong? Let’s add alerting.

You can send emails, SMS, or push notifications. Here, we’ll send email with Python’s built-in smtplib module.

First, write a helper that sends the alert (keep credentials such as the SMTP password in environment variables rather than hardcoding them):

import smtplib
from email.mime.text import MIMEText

def send_alert(subject, body, to_email, from_email, smtp_server, smtp_port, password):
    msg = MIMEText(body)
    msg['Subject'] = subject
    msg['From'] = from_email
    msg['To'] = to_email

    with smtplib.SMTP_SSL(smtp_server, smtp_port) as server:
        server.login(from_email, password)
        server.sendmail(from_email, [to_email], msg.as_string())

Integrate this into your monitor:

import os

def monitor_with_alert(url, interval=60):
    # Read the SMTP password from an environment variable instead of hardcoding it.
    # The variable name SMTP_PASSWORD is just an example; export it before running.
    password = os.environ["SMTP_PASSWORD"]
    while True:
        try:
            response = requests.get(url, timeout=10)
            if response.status_code != 200:
                send_alert(
                    subject=f"ALERT: {url} is down",
                    body=f"{url} returned status code {response.status_code}",
                    to_email="you@example.com",
                    from_email="monitor@example.com",
                    smtp_server="smtp.gmail.com",
                    smtp_port=465,
                    password=password
                )
        except requests.exceptions.RequestException as e:
            send_alert(
                subject=f"ALERT: {url} is unreachable",
                body=f"{url} is unreachable: {e}",
                to_email="you@example.com",
                from_email="monitor@example.com",
                smtp_server="smtp.gmail.com",
                smtp_port=465,
                password=password
            )
        time.sleep(interval)

Now you’ll get an email whenever the site is down or unreachable.

Alert Type       Trigger Condition                     Action Taken
Status Alert     Non-200 status code                   Email notification sent
Network Alert    Request exception (timeout, etc.)     Email notification sent
Content Change   Hash mismatch                         Log entry made

Using APIs for Advanced Monitoring

Some services offer API-based monitoring with more features, like geographic checks or performance metrics. For example, you can use the UptimeRobot API or Pingdom API.

Here’s a quick example using UptimeRobot’s API to add a monitor:

import requests

api_key = "your_uptimerobot_api_key"
url = "https://api.uptimerobot.com/v2/newMonitor"

data = {
    "api_key": api_key,
    "format": "json",
    "type": 1,  # HTTP monitor
    "url": "https://example.com",
    "friendly_name": "Example Monitor"
}

response = requests.post(url, data=data)
print(response.json())

This programmatically adds a monitor to UptimeRobot.

  • Benefits: Historical data, multiple check locations, detailed analytics.
  • Drawbacks: Heavier usage typically requires a paid plan.
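
You can also pull back the monitors you’ve already set up. Here’s a rough sketch against UptimeRobot’s getMonitors endpoint; the fields printed below follow its v2 API documentation, but check the JSON you actually receive before relying on them:

import requests

response = requests.post(
    "https://api.uptimerobot.com/v2/getMonitors",
    data={"api_key": "your_uptimerobot_api_key", "format": "json"}
)

for monitor in response.json().get("monitors", []):
    print(monitor["friendly_name"], monitor["url"], monitor["status"])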

Building a Multi-URL Monitor

You’ll often need to monitor more than one site. Let’s scale our script.

import time
import requests

websites = [
    "https://google.com",
    "https://github.com",
    "https://stackoverflow.com"
]

def monitor_multiple(urls, interval=60):
    while True:
        for url in urls:
            try:
                response = requests.get(url, timeout=10)
                status = "up" if response.status_code == 200 else "down"
                print(f"{url} is {status} at {time.ctime()}")
            except requests.exceptions.RequestException:
                print(f"{url} is unreachable at {time.ctime()}")
        time.sleep(interval)

monitor_multiple(websites)

You can extend this to log each site separately or send different alerts.
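
One useful extension, sketched below, is to remember each site’s last known state and only report transitions (up to down, down to up), so a long outage doesn’t flood you with identical messages; the print call is where you would plug in send_alert from earlier:

import time
import requests

def monitor_with_state(urls, interval=60):
    # Remember the last observed status per URL so we only react to changes
    last_status = {url: None for url in urls}
    while True:
        for url in urls:
            try:
                response = requests.get(url, timeout=10)
                status = "up" if response.status_code == 200 else "down"
            except requests.exceptions.RequestException:
                status = "unreachable"
            if status != last_status[url]:
                print(f"{url} changed from {last_status[url]} to {status} at {time.ctime()}")
                last_status[url] = status
        time.sleep(interval)

monitor_with_state(["https://google.com", "https://github.com"])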

Adding Performance Checks

Beyond just uptime, you might care about performance — how fast your site loads.

You can measure response time:

import time
import requests

def check_response_time(url):
    start = time.time()
    try:
        response = requests.get(url, timeout=10)
        end = time.time()
        response_time = end - start
        print(f"{url} responded in {response_time:.2f}s with status {response.status_code}")
        return response_time
    except requests.exceptions.RequestException as e:
        print(f"{url} failed: {e}")
        return None

check_response_time("https://example.com")

Track this over time to spot performance degradation.

Pro tip: Set thresholds and alert if response time exceeds a limit.
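
Here’s a minimal sketch of that idea, reusing check_response_time from above; the two-second threshold is just an example, and the print could be swapped for a call to send_alert from the alerting section:

import time

def monitor_response_time(url, threshold=2.0, interval=300):
    # Warn whenever the site responds slower than the threshold (in seconds)
    while True:
        response_time = check_response_time(url)
        if response_time is not None and response_time > threshold:
            print(f"WARNING: {url} took {response_time:.2f}s (threshold {threshold:.1f}s)")
        time.sleep(interval)

monitor_response_time("https://example.com", threshold=2.0)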

Storing Data in a Database

For long-term monitoring, logging to a file isn’t enough. You’ll want to store results in a database for querying and analysis.

Here’s an example using SQLite:

import sqlite3
import time
import requests

conn = sqlite3.connect('monitoring.db')
c = conn.cursor()

c.execute('''CREATE TABLE IF NOT EXISTS checks
             (url text, timestamp text, status_code integer, response_time real)''')

def log_to_db(url, status_code, response_time):
    timestamp = time.ctime()
    c.execute("INSERT INTO checks VALUES (?, ?, ?, ?)",
              (url, timestamp, status_code, response_time))
    conn.commit()

def monitor_with_db(url, interval=60):
    while True:
        start = time.time()
        try:
            response = requests.get(url, timeout=10)
            end = time.time()
            response_time = end - start
            log_to_db(url, response.status_code, response_time)
        except requests.exceptions.RequestException:
            log_to_db(url, None, None)  # Log failure
        time.sleep(interval)

monitor_with_db("https://example.com")

Now you can run SQL queries to analyze uptime, performance trends, and more.
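
For example, the two queries below, run against the checks table created above, give you average, maximum, and minimum response times plus a rough uptime percentage (any check that returned HTTP 200 counts as "up"):

import sqlite3

conn = sqlite3.connect('monitoring.db')
c = conn.cursor()

# Response-time statistics, ignoring failed checks where response_time is NULL
c.execute("""SELECT AVG(response_time), MAX(response_time), MIN(response_time)
             FROM checks
             WHERE url = ? AND response_time IS NOT NULL""", ("https://example.com",))
print("Response time (avg, max, min):", c.fetchone())

# Rough uptime percentage: share of checks that returned HTTP 200
c.execute("""SELECT 100.0 * SUM(CASE WHEN status_code = 200 THEN 1 ELSE 0 END) / COUNT(*)
             FROM checks
             WHERE url = ?""", ("https://example.com",))
print("Uptime %:", c.fetchone()[0])

conn.close()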

Aggregated over a day or so, those queries might produce a summary like this:

Metric               Average Value   Max Value   Min Value
Response Time (s)    1.2             4.5         0.8
Uptime Percentage    99.5%           100%        95%
Daily Checks         1440            1440        1410

Deploying Your Monitor

Running the monitor on your local machine isn’t ideal — it stops when you shut down your computer. Instead, deploy it to a cloud server or use a platform like Heroku, PythonAnywhere, or a Raspberry Pi at home.

Here’s how to run it as a background process on a Linux server:

nohup python monitor_script.py &

Or use systemd to run it as a service:

# /etc/systemd/system/website_monitor.service
[Unit]
Description=Website Monitor
After=network.target

[Service]
User=ubuntu
ExecStart=/usr/bin/python3 /path/to/monitor_script.py
Restart=always

[Install]
WantedBy=multi-user.target

Then enable and start it:

sudo systemctl enable website_monitor
sudo systemctl start website_monitor

Now your monitor runs continuously and restarts if it crashes.

Wrapping Up

You’ve now built a robust website monitoring system with Python! From basic uptime checks to content change detection, alerting, and database logging, you have the tools to keep tabs on any site automatically.

Remember:

  • Always handle exceptions gracefully.
  • Use environment variables for sensitive data like API keys and passwords.
  • Consider using established monitoring services for critical applications.

Experiment with these scripts, adapt them to your needs, and never manually refresh a website again!

Happy monitoring!