How I Built an Automated SEO Audit Tool Using an AI Scraper

Build an automated SEO audit tool that analyses scraped data from Bright Data’s AI Scraper, identifies SEO weaknesses, and generates AI-driven reports.

By Egop Gogo-Job

Frequently Asked Questions

Common questions about this topic

What is the Automated SEO Audit Tool described?
The Automated SEO Audit Tool is a Streamlit web app that ingests scraped web page HTML, extracts SEO elements (meta titles, descriptions, headers, links, images, and main text), compiles an audit report with key metrics, and sends that report to a Hugging Face Mistral LLM for enhanced AI recommendations, with downloadable audit reports.
Which scraper was used to collect web page HTML for the audits?
Bright Data’s No-Code AI Scraper (AI Scraper — discover by domain URL) was used to fetch page_html output without writing code and to download the scraped data as JSON, CSV, NDJSON, or JSONL.
How is the scraped HTML passed into the SEO Audit Tool UI?
The Streamlit UI requires uploading a scraped JSON file that contains articles with a 'page_html' field; the app accepts either a list of article objects or a single dictionary containing 'page_html'.
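A minimal sketch of what that upload step could look like, assuming Streamlit's st.file_uploader and the standard json module (the variable names here are illustrative, not the guide's exact code):

```python
import json
import streamlit as st

# Accept the scraped JSON export and normalise it into a list of
# article dicts, each expected to carry a 'page_html' field.
uploaded_file = st.file_uploader("Upload scraped JSON", type=["json"])

if uploaded_file is not None:
    data = json.load(uploaded_file)

    # The export may be a list of articles or a single object.
    if isinstance(data, dict):
        articles = [data]
    elif isinstance(data, list):
        articles = data
    else:
        st.error("Unsupported JSON structure.")
        st.stop()
```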
How does the tool resolve relative URLs found in scraped HTML?
The tool extracts a base URL from the HTML by checking for a <base> tag or the first absolute link; it also accepts a user-input base URL in the UI to resolve relative links using urllib.parse.urljoin.
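A rough sketch of that base-URL logic, using BeautifulSoup and urllib.parse; the helper name extract_base_url is an assumption for illustration:

```python
from urllib.parse import urlparse
from bs4 import BeautifulSoup

def extract_base_url(html: str, fallback: str = "") -> str:
    """Guess the page's base URL from a <base> tag or the first absolute link."""
    soup = BeautifulSoup(html, "html.parser")

    # 1. Prefer an explicit <base href="..."> tag if present.
    base_tag = soup.find("base", href=True)
    if base_tag:
        return base_tag["href"]

    # 2. Otherwise use the scheme and host of the first absolute link.
    for a in soup.find_all("a", href=True):
        parsed = urlparse(a["href"])
        if parsed.scheme and parsed.netloc:
            return f"{parsed.scheme}://{parsed.netloc}"

    # 3. Fall back to the base URL typed into the UI, if any.
    return fallback
```

Relative links are then resolved with urllib.parse.urljoin(base_url, href).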
What SEO elements does the parser extract from page HTML?
The parser extracts the <title> content as meta_title, the <meta name="description"> content as meta_description, counts of h1–h6 headers, image src and alt attributes (resolved to absolute URLs), links with resolved URLs and status checks, and the cleaned main text after removing script and style tags.
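The parsing step might look roughly like the sketch below, assuming BeautifulSoup; the function name parse_seo_elements and the exact dictionary keys are illustrative:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def parse_seo_elements(html: str, base_url: str) -> dict:
    """Pull the main on-page SEO signals out of a page's HTML."""
    soup = BeautifulSoup(html, "html.parser")

    title_tag = soup.find("title")
    meta_desc = soup.find("meta", attrs={"name": "description"})

    # Count headers h1 through h6.
    headers = {f"h{i}": len(soup.find_all(f"h{i}")) for i in range(1, 7)}

    # Images with src resolved to absolute URLs, plus alt text.
    images = [
        {"src": urljoin(base_url, img.get("src", "")), "alt": img.get("alt", "")}
        for img in soup.find_all("img")
    ]

    # Cleaned main text: drop <script> and <style> before extracting text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    main_text = soup.get_text(separator=" ", strip=True)

    return {
        "meta_title": title_tag.get_text(strip=True) if title_tag else "",
        "meta_description": meta_desc.get("content", "") if meta_desc else "",
        "headers": headers,
        "images": images,
        "main_text": main_text,
    }
```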
How does the tool check link status and classify links?
The tool iterates over <a> tags with hrefs, resolves each to an absolute URL, checks HTTP status using requests.head (and requests.get if head returns 4xx/5xx), and classifies links as internal or external based on the base URL.
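A sketch of that link check, assuming requests and BeautifulSoup; check_links and its return shape are assumptions, and timeouts are added to keep slow hosts from stalling the audit:

```python
import requests
from urllib.parse import urljoin, urlparse
from bs4 import BeautifulSoup

def check_links(html: str, base_url: str) -> list:
    """Resolve, classify, and status-check every <a href> on the page."""
    soup = BeautifulSoup(html, "html.parser")
    base_host = urlparse(base_url).netloc
    results = []

    for a in soup.find_all("a", href=True):
        url = urljoin(base_url, a["href"])
        link_type = "internal" if urlparse(url).netloc == base_host else "external"

        try:
            resp = requests.head(url, allow_redirects=True, timeout=5)
            # Some servers reject HEAD; retry with GET on 4xx/5xx.
            if resp.status_code >= 400:
                resp = requests.get(url, allow_redirects=True, timeout=5)
            status = resp.status_code
        except requests.RequestException:
            status = None  # counted later as an error link

        results.append({"url": url, "type": link_type, "status": status})

    return results
```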
What metrics are included in the generated audit report?
The audit report includes the page URL, meta_title, meta_description, header counts, image_count, link_count, internal_link_count, external_link_count, broken_link_count, error_link_count, and main_text_length.
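Building on the two sketches above, compiling those metrics could look like this (the field names mirror the list in the answer; the helper itself is illustrative):

```python
def build_report(url: str, seo: dict, links: list) -> dict:
    """Compile the audit metrics from the parsed elements and link checks."""
    broken = [l for l in links if l["status"] and l["status"] >= 400]
    errors = [l for l in links if l["status"] is None]

    return {
        "url": url,
        "meta_title": seo["meta_title"],
        "meta_description": seo["meta_description"],
        "header_counts": seo["headers"],
        "image_count": len(seo["images"]),
        "link_count": len(links),
        "internal_link_count": sum(1 for l in links if l["type"] == "internal"),
        "external_link_count": sum(1 for l in links if l["type"] == "external"),
        "broken_link_count": len(broken),
        "error_link_count": len(errors),
        "main_text_length": len(seo["main_text"]),
    }
```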
Which large language model and API are used for enhanced analysis and recommendations?
The tool uses a Mistral model via the Hugging Face Inference API (example model URL in the guide: mistralai/Mistral-7B-Instruct-v0.2) by sending a prompt that asks the model to act as an SEO expert and analyze the audit report.
How does the app handle responses and errors from the Hugging Face InferenceClient?
The app attempts client.text_generation and, if the response is a string, returns it stripped; if the response is a list with a generated_text field, it returns that text; otherwise it logs an unexpected response and returns a 'Failed to parse Mistral response.' message; exceptions return 'Error connecting to Mistral.'
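A sketch covering both the model call and this response handling, assuming huggingface_hub's InferenceClient; the prompt wording and max_new_tokens value are illustrative rather than the guide's exact code:

```python
import os
from huggingface_hub import InferenceClient

def get_mistral_recommendations(report_text: str) -> str:
    """Send the audit report to a Mistral model and return its SEO advice."""
    client = InferenceClient(
        model="mistralai/Mistral-7B-Instruct-v0.2",
        token=os.environ.get("HUGGINGFACE_TOKEN"),
    )
    prompt = (
        "You are an SEO expert. Analyze this audit report and give "
        f"concrete recommendations:\n\n{report_text}"
    )

    try:
        response = client.text_generation(prompt, max_new_tokens=512)
        # text_generation typically returns a plain string...
        if isinstance(response, str):
            return response.strip()
        # ...but some response shapes carry a 'generated_text' field.
        if isinstance(response, list) and response and "generated_text" in response[0]:
            return response[0]["generated_text"].strip()
        return "Failed to parse Mistral response."
    except Exception:
        return "Error connecting to Mistral."
```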
What dependencies must be installed to run the SEO Audit Tool locally?
Required dependencies listed are streamlit, requests, beautifulsoup4, huggingface_hub, textstat (optional for readability), and mistral_inference; the example pip install command in the guide is: pip install streamlit requests beautifulsoup4 huggingface_hub textstat mistral_inference.
How is the Hugging Face API token handled for local development and deployment?
The Hugging Face API token should be set as an environment variable (e.g., export HUGGINGFACE_TOKEN on macOS/Linux, or set HUGGINGFACE_TOKEN in the Windows Command Prompt) and retrieved in the script with os.environ.get('HUGGINGFACE_TOKEN'), with a check that the token is present before use.
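In the Streamlit script, that check might look like this (a minimal sketch, assuming the environment variable name above):

```python
import os
import streamlit as st

# Read the token from the environment rather than hard-coding it.
HUGGINGFACE_TOKEN = os.environ.get("HUGGINGFACE_TOKEN")

if not HUGGINGFACE_TOKEN:
    st.error("HUGGINGFACE_TOKEN is not set. Export it before running the app.")
    st.stop()
```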
How are audit reports made available for users to download from the app?
The app creates a downloadable .txt report by base64-encoding the report text and returning an HTML anchor tag with a data:file/txt;base64 URL so users can click a 'Download Report' link in the Streamlit UI.
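A short sketch of that download link, using the standard base64 module; the helper name and filename are illustrative:

```python
import base64

def make_download_link(report_text: str, filename: str = "seo_audit_report.txt") -> str:
    """Return an HTML anchor that downloads the report as a .txt file."""
    b64 = base64.b64encode(report_text.encode()).decode()
    return (
        f'<a href="data:file/txt;base64,{b64}" download="{filename}">'
        "Download Report</a>"
    )

# Rendered in the Streamlit UI with:
# st.markdown(make_download_link(report_text), unsafe_allow_html=True)
```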
What are the steps to deploy the Streamlit app to Streamlit Community Cloud?
Deployment steps are: host the project in a public GitHub repository, include a requirements.txt and the Streamlit script (e.g., app.py), sign into Streamlit Community Cloud with GitHub, create a new app, specify the repository, branch (main), and main file path, then click Deploy to build and receive a public URL.
What validation does the Streamlit UI perform on the uploaded JSON file?
The UI checks that the uploaded JSON is a list or dictionary, ensures there are article objects containing a 'page_html' field, and displays errors or warnings and stops execution if those conditions are not met or if the base URL is not provided.
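Continuing the upload sketch shown earlier (which produced the articles list), the validation could look like this; the messages and variable names are illustrative:

```python
import streamlit as st

# Keep only article objects that actually contain scraped HTML.
valid_articles = [a for a in articles if isinstance(a, dict) and a.get("page_html")]

if not valid_articles:
    st.error("No article with a 'page_html' field was found in the upload.")
    st.stop()

base_url_input = st.text_input("Base URL (used to resolve relative links)")
if not base_url_input:
    st.warning("Please provide a base URL before running the audit.")
    st.stop()
```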
