With hundreds of job boards worldwide listing thousands of roles, manually browsing even one job board can be time-consuming, not to mention checking multiple sites. Many job board APIs are either outdated, rate-limited, or prohibitively expensive for individual job seekers. What if you had a central hub to track job listings and monitor hiring trends across your preferred job sites?
Some job seekers try to solve this by scraping job board data themselves, only to run into various limitations. Others rely on static datasets and lose valuable time, a critical factor in job applications.
In this article, you’ll retrieve over 15 million fresh job listings using Bright Data’s Web Scraper API and use that data to build a real-time job market tracker powered by AI.
Why build a real-time job market tracker?
A typical job posting on LinkedIn can attract hundreds or even thousands of applications within hours. That alone shows the need for a real-time job market tracker that captures new listings and hiring trends as they happen.
However, retrieving this data with your own scraping scripts can be inefficient, especially at large volumes. Issues like IP bans, dynamically loaded content, and anti-scraping mechanisms such as CAPTCHAs make the process difficult and unreliable.
Scraping tools like Bright Data’s AI Web Scraper let you overcome these limitations through dedicated job data APIs that provide structured, real-time job data from platforms like LinkedIn, Glassdoor, and Indeed.
Bright Data provides two methods to do this:
- Scraper API: Integrate the scraper into your codebase to automate data collection.
- No-code Scraper: Collect data directly through Bright Data’s platform without writing any code.
Either of these methods gives you access to fresh, reliable and structured job data. For this article, we’ll be using the Scraper API.
Let’s start building an AI-powered career tool.
System architecture overview
The system architecture, illustrated in the flowchart below, emphasizes ethical data collection and AI integration to power your real-time job market tracker, all without the need for a server.
Note: The entire pipeline can run locally on your computer.
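At a high level, the flow looks like this (a simplified text sketch of the pipeline described in this article):

Bright Data Scraper API (LinkedIn + Glassdoor)
    → merged_jobs.json (local file)
    → pandas cleaning and skill extraction
    → Streamlit dashboard (charts, filters, job listings)
    → Ollama Phi3 model (AI-generated market insights)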
Setting up Bright Data for LinkedIn or Glassdoor job data scraping
This section focuses on LinkedIn, but the same steps apply to Glassdoor.
1. Create a Bright Data account if you haven’t already (a free trial is available).
2. Go to the Web Scrapers page and open the “Web Scrapers Library” to browse the available scrapers.
3. Search for your target domain, such as LinkedIn or Glassdoor, and select it. Bright Data supports over 120 popular domains.
4. From the list of LinkedIn scrapers, select LinkedIn job listings information — discover by keyword. This scraper allows you to retrieve data without logging in to LinkedIn.
5. Choose the Scraper API option.
6. Click API Request Builder, then Add Input to specify your preferred locations and keywords. You can also configure fields such as time_range, job_type, and experience_level.
7. Copy the generated cURL request.
8. Sign up or log in to Postman, then paste the cURL request as a POST request. The response will include a snapshot_id.
9. In Bright Data, go to Management APIs and paste the snapshot_id. Copy the generated cURL code.
10. Paste it into Postman, then send it as a GET request. The response will include data such as company_name, job_title, and more, confirming a successful setup.
11. To download your data, click Logs, then Download, and choose your preferred delivery method.
12. Copy your authorization token and dataset_id as you’ll need these later in the tutorial.
13. Repeat the same process from step 2 to retrieve your Glassdoor credentials.
Note: You do not need to use Postman in your job tracker script. We only need it to retrieve the snapshot_id and to confirm a successful response. As you will see in the next steps, the job tracker script automates the entire process.
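If you prefer not to use Postman, the same two requests can be sent with cURL from your terminal. This is a minimal sketch using the trigger and snapshot endpoints that the Python script later in this article calls; replace the placeholders with your own token, dataset_id, and snapshot_id:

# Trigger a discovery snapshot for your chosen dataset
curl -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=YOUR_DATASET_ID&include_errors=true&type=discover_new&discover_by=keyword" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[{"location": "New York", "keyword": "data analyst", "country": "US"}]'

# Once the snapshot is ready, download it as JSON using the returned snapshot_id
curl "https://api.brightdata.com/datasets/v3/snapshot/YOUR_SNAPSHOT_ID?format=json" \
  -H "Authorization: Bearer YOUR_API_TOKEN"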
What’s inside the dataset?
The LinkedIn dataset includes key fields useful for job tracking, grouped as follows:
Job Listings and Applications
- job_title: Identify trending job roles and compare demand across industries.
- job_posting_id, title_id: Track unique listings and role identifiers for deduplication and historical analysis.
- job_location: Analyze job availability in different regions.
- country_code: Segment data by geographic region or nationality requirements.
- job_summary, job_description_form: Extract keywords and skills for demand analysis.
- job_posted_time, job_posted_date: Understand posting recency and hiring velocity.
- job_num_applicants, application_availability: Measure role competitiveness and accessibility.
- apply_link: Provide direct access to job applications.
- url: Include the source link for each job posting.
Company Insights
- company_name, company_id: Research hiring patterns and company-specific trends.
- company_url: Explore the employer’s brand and company information.
Job Structure and Career Pathing
- job_seniority_level: Evaluate the availability of entry-level and leadership roles.
- job_function: Group roles by functional domain, such as marketing or engineering.
- job_employment_type: Compare full-time, part-time, contract, and freelance roles.
- job_industries: Track demand within specific industry sectors.
Compensation Insights
- job_base_pay_range: Understand salary ranges across roles and locations.
The Glassdoor dataset includes similar fields with additional insights such as salary, company reviews, and company ratings.
For this job scraping tutorial, we will use the following fields: job title, company, company URL, location, salary, skills, and description.
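To make the target shape concrete, here is an illustrative record in the format the script in Step 2.2 writes to merged_jobs.json (the values are invented for the example; the keys match the merge logic):

{
  "title": "Data Analyst",
  "company": "Example Corp",
  "location": "New York, NY",
  "salary": "$70,000 - $90,000",
  "skills": "SQL, Python, and Tableau experience required...",
  "company_url": "https://careers.example.com",
  "source": "Glassdoor"
}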
Building a real-time job market tracker dashboard using AI, live job data, and Streamlit in Python
Step 1: Set up the environment
1.1 Create the project directory
mkdir job-market-tracker && cd job-market-tracker
1.2 Set up a virtual environment
python -m venv venv
Activate the environment:
On Windows:
venv\Scripts\activate
On macOS or Linux:
source venv/bin/activate
1.3 Install dependencies
pip install streamlit pandas altair requests langchain-ollama langchain-core
1.4 Define the project structure
job-market-tracker/
├── app.py
Step 2: Retrieve the LinkedIn and Glassdoor datasets
2.1 Set up Bright Data configuration
import streamlit as st
import requests
import time
import json
from collections import Counter
import pandas as pd
import altair as alt
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
BRIGHTDATA_HEADERS = {
"Authorization": "Bearer YOUR_API_TOKEN",
"Content-Type": "application/json"
}
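Replace YOUR_API_TOKEN with the authorization token you copied earlier. If you prefer not to hardcode it, you can read it from an environment variable instead; a small optional sketch (the variable name BRIGHTDATA_API_TOKEN is just a convention chosen here):

import os

BRIGHTDATA_HEADERS = {
    "Authorization": f"Bearer {os.environ.get('BRIGHTDATA_API_TOKEN', '')}",  # export BRIGHTDATA_API_TOKEN in your shell first
    "Content-Type": "application/json"
}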
2.2 Retrieve snapshots and datasets from each job board
def fetch_and_save_jobs_to_json():
platforms = {
"LinkedIn": {
"dataset_id": "gd_lpfll7v5hcqtkxl6l",
"payload": [
{"location": "EMEA", "keyword": "software engineer", "country": "FR", "time_range": "Past month", "job_type": "Full-time", "experience_level": "Entry level"},
{"location": "Israel", "keyword": "software developer", "time_range": "Past month", "job_type": "Full-time"}
]
},
"Glassdoor": {
"dataset_id": "gd_lpfbbndm1xnopbrcr0",
"payload": [
{"location": "New York", "keyword": "data analyst", "country": "US"},
{"location": "Paris", "keyword": "product manager", "country": "FR"}
]
}
}
merged_jobs = []
for platform, details in platforms.items():
st.info(f"Triggering {platform} snapshot...")
trigger_url = f"https://api.brightdata.com/datasets/v3/trigger?dataset_id={details['dataset_id']}&include_errors=true&type=discover_new&discover_by=keyword"
trigger_resp = requests.post(trigger_url, headers=BRIGHTDATA_HEADERS, json=details["payload"])
snapshot_id = trigger_resp.json().get("snapshot_id")
if not snapshot_id:
st.error(f"Failed to trigger snapshot for {platform}. Response: {trigger_resp.text}")
continue
st.info(f"Waiting 300 seconds for {platform} snapshot to complete (Snapshot ID: {snapshot_id})")
time.sleep(720)
snapshot_url = f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
snapshot_resp = requests.get(snapshot_url, headers={"Authorization": BRIGHTDATA_HEADERS["Authorization"]})
if not snapshot_resp.ok:
st.error(f"Failed to retrieve data for {platform}.")
continue
try:
json_data = snapshot_resp.json()
if isinstance(json_data, list):
jobs_data = json_data
elif isinstance(json_data, dict):
jobs_data = json_data.get("data") or json_data.get("results") or json_data.get("jobs") or []
else:
jobs_data = []
except Exception as e:
st.warning(f"Error parsing {platform} response: {e}")
continue
for job in jobs_data:
if not isinstance(job, dict):
continue
merged_jobs.append({
"title": job.get("title") or job.get("job_title") or None,
"company": job.get("company_name") or job.get("company") or None,
"location": job.get("job_location") or None,
"salary": job.get("salary") or job.get("estimated_salary") or job.get("pay_median_employer") or job.get("pay_median_glassdoor") or None,
"skills": job.get("skills") or job.get("description") or job.get("job_description") or job.get("job_summary") or None,
"company_url": job.get("company_url") or job.get("job_application_link") or None,
"source": platform
})
with open("merged_jobs.json", "w", encoding="utf-8") as f:
json.dump(merged_jobs, f, ensure_ascii=False, indent=2)
st.success("✅ Job data saved to merged_jobs.json")
2.3 Clean and structure the data
Clean the data, extract the relevant fields from each listing, classify entries, and normalize them where necessary.
def normalize_jobs(data):
normalized = []
for job in data:
normalized.append({
"title": job.get("title"),
"company": job.get("company"),
"location": job.get("location"),
"salary": job.get("salary"),
"skills": job.get("skills") or "",
"company_url": job.get("company_url"),
"is_remote": "remote" in (job.get("location") or "").lower()
})
return normalized
def extract_skills(skill_texts):
skills_keywords = [
'Python', 'Java', 'JavaScript', 'TypeScript', 'React', 'Angular', 'Vue',
'AWS', 'Azure', 'GCP', 'Docker', 'Kubernetes', 'SQL', 'NoSQL',
'PostgreSQL', 'MySQL', 'MongoDB', 'FastAPI', 'Flask', 'Django',
'Node.js', 'Express', 'C#', 'C++', 'Rust', 'Go', 'Ruby',
'Spark', 'Hadoop', 'Pandas', 'NumPy', 'TensorFlow', 'PyTorch',
'Git', 'Jenkins', 'CI/CD', 'REST', 'GraphQL'
]
found_skills = set()
for text in skill_texts:
if not isinstance(text, str):
continue
text_lower = text.lower()
for skill in skills_keywords:
if skill.lower() in text_lower:
found_skills.add(skill)
return list(found_skills)
@st.cache_data(show_spinner=False)
def load_and_process_data():
with open("merged_jobs.json", "r", encoding="utf-8") as f:
job_data = json.load(f)
jobs = normalize_jobs(job_data)
df = pd.DataFrame(jobs)
df["extracted_skills"] = df["skills"].apply(lambda x: extract_skills([x]))
return df
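As a quick sanity check, extract_skills simply scans a piece of text for the keywords listed above, so you can try it in a Python shell (with the function defined) on any job description string:

print(extract_skills(["We need Python, SQL and Docker experience; Kubernetes is a plus"]))
# Expected output (order may vary, since a set is used): ['Python', 'SQL', 'Docker', 'Kubernetes']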
Note: Fetching fresh data from the job boards takes several minutes. The script above waits 12 minutes per platform for each snapshot to complete before downloading and processing it, so expect a delay the first time you fetch data.
Step 3: Analyze job data with AI using the Ollama Phi3 model
The Ollama Phi3 model helps you analyze job market trends and generate concise insights. It can summarize jobs by:
- Most in-demand skills
- Remote versus in-office roles
- Salary ranges per role
- Skill demand analysis
- Top five job roles
We’re using Ollama because it is free and easy to set up, runs locally without an internet connection, and delivers fast, customizable responses.
Install Ollama
Ollama provides a command-line tool to run large language models (LLMs) locally. You can download it from the Ollama website or install it from the command line, depending on your operating system:
Windows (PowerShell):
curl.exe -LO https://ollama.com/download/OllamaSetup.exe
.\OllamaSetup.exe
Linux (Curl):
curl -fsSL https://ollama.ai/install.sh | sh
macOS (Homebrew):
brew install ollama
Download the Phi3 model
ollama pull phi3
Run the model
ollama run phi3
💡 Note: Ensure the Ollama model is running locally before executing your code. Otherwise, the AI model will not be accessible.
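Before wiring the model into the dashboard, you can confirm it is reachable from Python with a quick throwaway call (this is just a check, not part of app.py):

from langchain_ollama import ChatOllama

llm = ChatOllama(model="phi3")
print(llm.invoke("Reply with one word: ready").content)  # should print a short response from the local model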
Step 4: Define the Streamlit application
Defining your job tracker app involves the following steps:
4.1 Configure the Streamlit page layout and load data
Configure the page layout in your Python script (app.py) and load the data from the merged JSON file.
def main():
st.set_page_config(page_title="Job Market Tracker", layout="wide")
st.title("💼 Job Market Tracker Dashboard")
# Button to fetch fresh data
if st.sidebar.button("Fetch Latest Job Data"):
fetch_and_save_jobs_to_json()
# Load and process data
try:
df = load_and_process_data()
except FileNotFoundError:
st.warning("No job data found. Please click 'Fetch Latest Job Data' to retrieve data.")
return
all_skills = [skill for sublist in df["extracted_skills"] for skill in sublist]
most_common_skills = Counter(all_skills).most_common(10)
remote_count = df["is_remote"].sum()
in_office_count = len(df) - remote_count
salaries = df["salary"].dropna().tolist()
salary_range = f"{min(salaries)} - {max(salaries)}" if salaries else "Not available"
top_jobs = df.dropna(subset=["title", "company", "location"]).head(5).to_dict(orient="records")
4.2 Create the recommended jobs section
# --- Recommended Jobs Section ---
preferred_location = "New York, NY"
st.header(f"⭐ Recommended Jobs in {preferred_location}")
ny_jobs = df[df["location"] == preferred_location]
if not ny_jobs.empty:
def make_clickable(row):
if pd.notna(row['company_url']):
return f'<a href="{row["company_url"]}" target="_blank">{row["title"]}</a>'
else:
return row["title"]
ny_jobs = ny_jobs.copy()
ny_jobs['job_title_link'] = ny_jobs.apply(make_clickable, axis=1)
st.markdown(
ny_jobs[["job_title_link", "company", "salary"]]
.rename(columns={"job_title_link": "Job Title"})
.to_html(escape=False, index=False),
unsafe_allow_html=True
)
else:
st.info("No jobs found for your preferred location this week.")
4.3 Build the Job Tracker chart
# --- Line Tracker Chart ---
st.subheader(f"📈 Job Count Tracker: {preferred_location} vs Other Locations")
top_locations = df["location"].value_counts().head(10)
location_counts = top_locations.reset_index()
location_counts.columns = ["location", "count"]
location_counts["highlight"] = location_counts["location"].apply(
lambda x: "Preferred" if x == preferred_location else "Other"
)
chart = alt.Chart(location_counts).mark_line(point=True).encode(
x=alt.X("location:N", sort=None, title="Location"),
y=alt.Y("count:Q", title="Number of Jobs"),
color=alt.Color("highlight:N", legend=alt.Legend(title="Location Type")),
tooltip=["location", "count"]
).properties(width=700, height=350)
st.altair_chart(chart, use_container_width=True)
# Sidebar filters
st.sidebar.header("Filter Options")
unique_locations = df['location'].dropna().unique().tolist()
selected_locations = st.sidebar.multiselect("Select Locations:", options=unique_locations, default=unique_locations[:2])
if not df['salary'].isnull().all():
min_salary = int(df['salary'].min())
max_salary = int(df['salary'].max())
salary_range_slider = st.sidebar.slider("Salary Range:", min_value=min_salary, max_value=max_salary, value=(min_salary, max_salary))
else:
salary_range_slider = (0, 0)
unique_skills = list(set(all_skills))
selected_skills = st.sidebar.multiselect("Filter by Skills:", options=unique_skills, default=unique_skills[:3])
4.4 Apply data filters
# Apply filters
filtered_df = df.copy()
if selected_locations:
filtered_df = filtered_df[filtered_df["location"].isin(selected_locations)]
if salary_range_slider != (0, 0):
filtered_df = filtered_df[(filtered_df["salary"] >= salary_range_slider[0]) & (filtered_df["salary"] <= salary_range_slider[1])]
if selected_skills:
filtered_df = filtered_df[filtered_df["extracted_skills"].apply(lambda skills: any(s in skills for s in selected_skills))]
4.5 Add tabs for Overview, Listings, and AI Insights
# Tabs for overview, listings, AI insights
tab1, tab2, tab3 = st.tabs(["📊 Overview", "🔍 Job Listings", "🤖 AI Insights"])
with tab1:
col1, col2, col3 = st.columns(3)
col1.metric("Total Jobs", len(filtered_df))
if 'salary' in filtered_df.columns and not filtered_df['salary'].isnull().all():
avg_salary = f"${filtered_df['salary'].mean():,.0f}"
else:
avg_salary = "N/A"
col2.metric("Average Salary", avg_salary)
if 'is_remote' in filtered_df.columns and not filtered_df.empty:
remote_sum = filtered_df['is_remote'].sum()
remote_pct = filtered_df['is_remote'].mean() * 100
col3.metric("Remote Jobs", f"{remote_sum} ({remote_pct:.1f}%)")
else:
col3.metric("Remote Jobs", "N/A")
st.subheader("Top Locations")
if 'location' in filtered_df.columns and not filtered_df.empty:
st.bar_chart(filtered_df['location'].value_counts().head(10))
else:
st.write("No location data available.")
st.subheader("Skill Demand")
if 'extracted_skills' in filtered_df.columns and not filtered_df.empty:
skill_counts = Counter([skill for sublist in filtered_df["extracted_skills"] for skill in sublist])
top_skills = skill_counts.most_common(15)
if top_skills:
skill_names, skill_vals = zip(*top_skills)
st.bar_chart(dict(zip(skill_names, skill_vals)))
else:
st.write("No skill data available.")
else:
st.write("No skill data available.")
with tab2:
st.subheader("Detailed Job Listings")
def make_clickable(row):
if pd.notna(row['company_url']):
return f'<a href="{row["company_url"]}" target="_blank">{row["title"]}</a>'
else:
return row["title"]
st.write("filtered_df shape:", filtered_df.shape)
st.write("filtered_df columns:", filtered_df.columns.tolist())
required_cols = ["company_url", "title", "company", "location", "salary"]
if not filtered_df.empty and all(col in filtered_df.columns for col in required_cols):
filtered_df = filtered_df.copy()
filtered_df['job_title_link'] = filtered_df.apply(make_clickable, axis=1)
st.markdown(
filtered_df[["job_title_link", "company", "location", "salary"]]
.rename(columns={"job_title_link": "Job Title"})
.to_html(escape=False, index=False),
unsafe_allow_html=True
)
else:
st.info("No job listings to display after filtering.")
4.6 Initialize Ollama for AI insights
Configure your Job Tracker app to connect with the Ollama model.
with tab3:
st.subheader("AI-Powered Market Analysis")
with st.spinner("Generating insights with Phi-3..."):
prompt = ChatPromptTemplate.from_template("""
Analyze the job market trends from this data:
**Top 5 Jobs**: {top_jobs}
**Most In-Demand Skills**: {skills}
**Remote vs Office**: {remote} remote, {office} office
**Salary Range**: {salaries}
Generate a concise summary with:
1. Key hiring trends
2. Skill demand analysis
3. Remote work prevalence
4. Salary insights
Format in markdown with bullet points.
""")
llm = ChatOllama(model="phi3")
chain = prompt | llm | StrOutputParser()
response = chain.invoke({
"top_jobs": [job['title'] for job in top_jobs],
"skills": [skill for skill, _ in most_common_skills],
"remote": remote_count,
"office": in_office_count,
"salaries": salary_range
})
st.markdown(response)
if __name__ == "__main__":
main()
Step 5: Run the app
To launch the Streamlit app, run the following command:
streamlit run app.py
Your Job Market Tracker dashboard is now live, visualizing data from the job boards. Clear the sidebar filters to see the full list of jobs.
You can click any job post to apply directly. If a job appears in the app, its listing was still open when the data was fetched, which makes it easy to keep track of active job listings.
Step 6: Automate data updates with a task scheduler
You can automate data retrieval by scheduling your scraping script to run at defined intervals. Windows (Task Scheduler) and macOS (cron or launchd) both include built-in schedulers that can handle this. For example, you could run the script every few hours to keep your app updated with the latest job data.
6.1 Create a script file for automation
Save the following script in your project directory: a batch file (job_scraper.bat) on Windows, or a shell script (job_scraper.sh) on macOS:
├── app.py
├── merged_jobs.json
└── job_scraper.bat
For Windows:
@echo off
cd /d "C:\Users\NEW USER\Desktop\Work\BrightData Job Market Tracker\"
python -m streamlit run app.py
echo Job scraper executed at %date% %time% >> job_scraper.log
For Mac:
#!/bin/bash
cd "/Users/NEW_USER/Desktop/Work/BrightData Job Market Tracker"
python3 -m streamlit run app.py
echo "Job scraper executed at $(date)" >> job_scraper.log
6.2 Set up the task
1. Open the Task Scheduler (on Windows) and click Create Task.
2. Provide a name and description for the task.
3. Under Actions, click New, then Browse, and select job_scraper.bat.
4. Under Triggers, click New and choose your preferred interval and start time.
5. Under Conditions, ensure power-related restrictions are unchecked.
6. Go to Settings, adjust if necessary, and click OK.
You’ve now scheduled your script to run automatically, keeping your Job Tracker app updated with live data.
Conclusion
Congratulations. You have successfully built a real-time job tracker that saves you the hassle of manually searching multiple job sites to extract relevant information.
By leveraging Bright Data’s AI-powered scraper, you can collect live job data from popular job listing platforms and use LLMs to summarize hiring trends. You can also build on this foundation by adding features like resume matching, interview question generation, or weekly job market summaries. Feel free to extend the tool based on your needs, for the possibilities are endless.