Photo by cottonbro studio: https://www.pexels.com/photo/bionic-hand-and-human-hand-finger-pointing-6153354/
In five years, AI has gone from a niche research topic to the center of every political debate, startup pitch, board meeting, and dinner-table argument. But if you zoom out, what interests me isn’t just the technology, but the language we’ve used to describe it.
Raise your hand if you’ve ever debated whether AI-generated art is real art, whether ‘AI slop’ is ruining the internet, whether ‘AGI’ is around the corner, or whether ‘AI is taking our jobs’.
These arguments — framed in these specific terms — didn’t appear out of nowhere. They were seeded, amplified, and normalized in us through thousands of headlines. I wanted to understand that process. Not through vibes, but through data. So I collected 10k news articles about AI from 2020–2025 to find out.
The artificial intelligence discourse is at an all-time high.
My goal was to chart how we talk about AI, and what that reveals about our shifting hopes, anxieties, and assumptions. I wanted to quantify when certain narratives took over, when others faded, and how the entire conversation reoriented itself around inflection points like ChatGPT, Stable Diffusion, Agentic AI, and so on. It’s an attempt to quantify something we all feel but rarely measure: that the way we talk about a thing (here, AI) is one of the strongest signals for how society absorbs change brought about by that thing.
As always, to prevent this being a boring read, I’ll present my findings up front, then cover Methodology. If you’d like to read at your own pace, here’s the Table of Contents. Enjoy!
Table of Contents
1. Setting The Table — General Sentiment Over Time
2. The 10 Types of AI News
- AI Adoption…and AI Slop
- Ethics, Fairness & Accountability
- Regulation & Governance
- The Generative AI Boom
- AGI & Superintelligence
- Safety, Risk & Alignment Infrastructure
- Jobs & Layoff Anxiety
- Explainers & Educational Content
- Machine Learning
- Society & Emerging Tech Risks
3. The Story of AI, As Told Through Data (2020–2025)
4. Takeaways
5. Methodology
6. Things Look Bleak, Even as the Technology Shines.
Setting The Table — General Sentiment Over Time
Before looking at the kinds of discourse that dominate the AI conversation, it helps to zoom out and ask a simpler question to get everyone on the same page.
How did the overall tone of AI news change over the last five years?
Calculating, then averaging sentiment across all 10,000 stories (check the Methodology section for how I did this and what I used) shows a clear arc.
General sentiment of AI media coverage over time, with major events labeled, is telling.
We’ll cover these events and eras soon, but for now, we can see that from 2020 through mid-2022 coverage was steadily positive — sentiment hovered around 0.15 to 0.22 — reflecting an era where AI was mostly framed through research breakthroughs, corporate innovation, and long-tail applications that felt abstract or future-oriented.
That changes sharply after late 2022.
As ChatGPT enters the public sphere and AI becomes a mass-market technology, the tone of coverage drops. By 2023, sentiment falls to the 0.04–0.10 range and stays there.
The variance spikes too, meaning the stories become more polarizing: alongside excitement, you start seeing concerns about misinformation, job automation, copyright, safety, regulation, and social disruption.
By 2024–2025, during the “agentic” era of AI where autonomous tools and agent frameworks dominate the narrative, sentiment dips even further — bottoming out near zero in several quarters.
Once AI went from being the brand-new, shiny technology to common infrastructure, the emotional tone changed from excitement to a kind of fatigue — and the clusters you’ll see next will explain why.
The 10 Types of AI News
The explosion of job/layoff concerns (Cluster 6) right after the ChatGPT launch is particularly notable.
1. AI Adoption…and AI Slop (Cluster 0)
What Is the Next Level of AI Technology? (Bernard Marr, Q3 ‘21)
Beyond AlphaFold: AI excels at creating new proteins (Phys.org, Q3 ‘22)
Facebook Is Being Overrun With Stolen, AI-Generated Images That People Think Are Real (404Media, Q4 ‘23)
I’ll pay for YouTube to get an ‘AI slop’ toggle (PCWorld, Q3 ‘25)
This category covers broad AI adoption and its impacts — good, bad, and everything in between. It splits neatly into 4 subclusters: AI Misuse & Public Harms, Corporate AI Adoption, Macro Strategy & Trends, and AI in Medicine & Research.
At first glance, Cluster 0 looks like a standard “AI progress” bucket: enterprise rollouts, scientific breakthroughs, the next big wave of innovation. But the dataset’s most important pattern sits inside it. The AI Misuse & Public Harms subcluster — 330 of 396 stories — is essentially the story of “AI slop.”
What is “AI slop”? These stories focus on:
- low-quality AI-generated blog spam and SEO garbage flooding the web
- degraded search results, declining trust in online information
- misinformation and fabricated news
- fake authors, fake experts, fake product reviews
- platforms struggling to moderate auto-generated content
This is the largest coherent micro-theme in the entire Cluster 0 set — larger than corporate adoption, larger than macro strategy, larger than medical AI!
And it explodes in popularity starting in 2023, corresponding to the period when generative AI became cheap, fast, and ubiquitous via ChatGPT and Claude, and it continues all the way to the present day.
Subcluster distribution of C0 over time.
This is a clear linguistic shift! As GenAI tools became mainstream, the media conversation about progress became inseparable from concerns about online decay. Many 2023–2024 headlines have an implicit “the internet is getting worse” sentiment. This suggests that the first widely felt consequence (yes, consequence, not first impressions, mind you) of AI for everyday internet users was not magic, but…mediocrity.
From then through 2025, the rise of “AI slop” as a term becomes a cultural anchor point in this cluster — a mass realization that generative AI was making the internet feel more synthetic, less trustworthy, and less human.
2. AI Ethics, Fairness & Accountability (Cluster 1)
Protecting the Human: Ethics in AI (Forbes, Q2 ‘21)
The Ethics Of AI: Navigating Bias, Manipulation And Beyond (Forbes, Q2 ‘23)
Creating AI That Helps, Not Harms (UC San Diego Today, Q1 ‘24)
Discussion of ethical considerations surrounding AI, including transparency, bias, fairness, responsible use, and frameworks for trustworthy AI.
This cluster functions as the moral vocabulary of the AI discourse, and its tone remains stable across all quarters. Nearly all headlines use similar language (“ethics,” “responsible,” “trust,” “bias,” “transparency”), and the reporting style is prescriptive — how organizations should behave, how to avoid harm, how to design for fairness. Most pieces come from universities and think tanks.
3. AI Regulation & Governance (Cluster 2)
Is It Time to Regulate AI? (Wall Street Journal, Q2 ‘22)
An Overview of Global AI Regulation and What’s Next (ProgressivePolicy, Q1 ‘23)
Mapping the Future: State-Level AI Regulation in the US (JD Supra, Q1 ‘24)
AI Regulation Is Coming. Fortune 500 Companies Are Bracing for Impact (Wall Street Journal, Q3 ‘24)
National and international efforts to regulate AI through legal frameworks, policy debate, government oversight, and global governance initiatives.
This cluster is all about the institutionalization of AI as a policy question. Unlike Cluster 1 (ethics), this cluster explicitly involves power structures — governments, legislative action, compliance regimes, jurisdictional competition.
4. The Generative AI Boom (Cluster 3)
A Coming-Out Party for Generative A.I., Silicon Valley’s New Craze (NYTimes, Q4 ‘22)
AI-powered marketing and sales reach new heights with generative AI (McKinsey, Q2 ‘23)
New tool makes generative AI models more likely to create breakthrough materials (MIT, Q3 ‘25)
This cluster marks the mainstream arrival of AI into everyday work. Coverage that explicitly focuses on generative AI, use cases, productivity impact, and its adoption across industries.
It is also extremely commercial — full of productivity claims, workforce transformation, and “how to adopt” style coverage.
5. AGI & Superintelligence (Cluster 4)
How close are we to AI that surpasses human intelligence? (Brookings, Q3 ‘23)
SoftBank CEO Son says artificial general intelligence will come within 10 years (Reuters, Q4 ‘23)
How close is AI to human-level intelligence? (Nature, Q4 ‘24)
What the Next Frontier of AI — AGI — Means for Government (ExecutiveBiz, Q1 ‘25)
Stories focused on artificial general intelligence, timelines, definitions, existential risk, and public debate over AGI’s meaning. An increasingly popular topic to speculate on — with the highest growth rate in the dataset: ~4,400% since Q1 2020.
This cluster is rhetorically extreme — alternating between unchecked hype (“AGI is inevitable”) and skepticism (“AGI is not meaningful”). I think we can call this mainly a philosophical cluster, and it often reads more like opinion essays than reporting. That’s expected — it’s reflecting the cultural imagination around AI, not its capabilities.
6. Safety, Risk & Alignment Infrastructure (Cluster 5)
Making AI Safety a Priority (McCormick, Q2 ‘22)
Why the U.S. Launched an International Network of AI Safety Institutes (Time, Q4 ‘24)
As AI models start exhibiting bad behavior, it’s time to think harder about AI safety (FastCompany, Q2 ‘25)
This cluster is all about institutional safety, not cultural fear. Think national safety institutes and standards bodies.
Stories here are fairly boring, to be perfectly honest, and go hand in hand with the AGI discourse. If Cluster 4 is about “What if AI becomes superintelligent?”, Cluster 5 is about “What are governments and labs actually doing about AI risks right now?”.
7. AI, Jobs…and Layoffs (Cluster 6)
Is AI Coming for Your Job? (Reuters, Q3 ‘23)
Elon Musk says AI will take all our jobs (CNN, Q2 ‘24)
AI is already taking jobs away from entry-level workers (Axios, Q3 ‘25)
In recent layoffs, AI’s role may be bigger than companies are letting on (CNBC, Q3 ‘25)
These are all articles examining how AI reshapes the labor market — job losses, creation, automation fears, and economic disruption.
This jobs-and-automation cluster is the single most emotionally volatile segment of AI media coverage, with a net negative sentiment (–0.12) and nearly one-third of headlines expressing explicit fear or loss. It tracks, almost perfectly, alongside economic downturns, tech layoffs, and moments where AI suddenly feels more capable than public expectations anticipated.
Here’s a numbers breakdown:
- Mean sentiment: –0.1188 (net negative)
- Variance: 0.2193 (high emotional spread)
- Neutral Stories: 555
- Positive Stories: 154
- Negative Stories: 344
A variance of ~0.22 means headlines swing sharply between panic (“AI is taking your job”) and optimism (“AI creates new roles”). 34% of stories explicitly express fear, loss, displacement, and economic anxiety. These include:
- headlines blaming AI for layoffs
- predictions of “end of work”
- commentaries about job replacement or worker precarity
Yes, positive stories exist (15% — these are “AI augments work” / “new opportunities” / “future of jobs” stories), but they are the minority.
Overall, C6 is the dataset’s economic anxiety dashboard. It is timing-sensitive, sentiment-sensitive, and grows sharply in the post-ChatGPT era (it has grown a whopping ~2300% since Q1 2020). As models become more capable, these stories become less hypothetical and more grounded in real workforce shifts.
8. Explainers & Educational Content (Cluster 7)
What is artificial intelligence and how is it used? (Europa, Q3 ‘20)
What is ChatGPT, DALL-E, and generative AI? (McKinsey, Q2 ‘24)
This cluster effectively tracks AI entering public consciousness. High-level explainers and educational articles introducing what AI is, how it works, and its societal role.
This cluster is exactly what you’d expect. Before 2023, these are basic primers. After 2023, they become cultural explainers about identity, creativity, and misinformation. This is notable for being one of the few clusters that have actually shrunk since Q1 2020 (the beginning of the dataset), down 41%.
9. Machine Learning Research & Technical Applications (Cluster 8)
Science Made Simple: What Is Machine Learning? (SciTech, Q4 ‘21)
Investigating Disparities in Machine Learning Algorithms (Feinberg School of Medicine, Q4 ‘22)
Machine learning models for polymeric long-acting injectables (Nature, Q1 ‘23)
Machine learning-based marker for coronary artery disease (The Lancet, Q4 ‘22)
This is the “STEM cluster”, explicitly for ML — full of academic and scientific framing. What distinguishes it from Cluster 7 is that it assumes technical literacy.
This cluster tapers off as generative AI dominates the mainstream conversation over raw ML — dropping from ~15% of coverage pre-ChatGPT to under 9% by 2024, where it has flatlined since (overall ~5% growth since Q1 2020).
ML research coverage gets drowned out by GenAI hype, then stabilizes as a technical/programming niche. Whenever there’s a major capability jump (multimodal, reasoning), the “serious ML research” coverage ticks up slightly — people want to know how it works — then settles back down.
Also, a huge share of these stories are medical papers and findings, which gives us an interesting pattern: healthcare is an area where applied ML stays relevant.
10. Society & Emerging Tech Risks (Cluster 9)
Preparing for the Age of Deepfakes and Disinformation (Stanford, Q4 ‘20)
How to Enhance Public Safety With Emerging Technologies (Security Today, Q2 ‘20)
Computers rule our lives. Where will they take us next? (Science News, Q1 ‘22)
Tech layoffs ravage the teams that fight online misinformation (CNBC, Q2 ‘23)
Coverage spanning deepfakes, misinformation, cyber threats, algorithmic harms, tech scandals, and hybrid socio-technical risks.
This cluster is hybrid risk — neither existential (Cluster 4) nor institutional (Cluster 5). Its focus is the messy real world, and it is one of the most practical and grounded risk clusters. It drops from 19–22% in 2020 to under 3% by 2025 — as most major AI risk concerns become more of an annoyance than a risk, and move to Cluster 0. That explains why this is the other cluster that has seen negative growth (-74%) since Q1 2020.
We’ve covered what the clusters are, so let’s map them out over five years, quarter by quarter, and see what stories emerge.
The Story of AI, As Told Through Data (2020–2025)
The stacked-area timeline shows that despite how chaotic AI discourse might feel up close, coverage tends to predictably shift around major model releases or public controversies, creating clear, distinguishable eras.
Here’s how the main clusters behave across the five-year window.
The Pre-ChatGPT Era (Q1 2020–Q3 2022): The Calm Before the Storm
- In early 2020, AI coverage was dominated by three things: general industry trend pieces (C0, ~21%), educational explainers (C7, ~19%), and emerging risks coverage (C9, ~19%) — mostly deepfakes, algorithmic bias, and misinformation concerns. This was still the era of “AI is coming someday” journalism.
- GPT-3’s release in Q2 2020 caused quite the ripple (though not yet a full wave). Industry coverage bumped up to 27%, but the GenAI cluster barely moved (C3, 1.5%). The technology was there, impressive, but inaccessible — API-only, expensive, and developer-focused. The press covered it as a research milestone, not a consumer product.
- GitHub Copilot (Q3 2021) marked the first real “AI in your workflow” moment. But look at the data: no dramatic cluster shifts. Ethics coverage actually peaked higher that year (~15.7% in Q4 2021) than GenAI tools coverage. The narrative was still “should we do this?” not “everyone’s doing this.”
- DALL-E and Midjourney’s viral moment (Q2 2022) and Stable Diffusion going open source (Q3 2022) started warming up the GenAI cluster — it crept from 2.6% to 3.5% — but the real story here is the Explainers cluster holding steady at 17–19%. The public still needed to be told what AI was.
The ChatGPT Inflection Point (Q4 2022–Q2 2023): Everything Changes
November 30, 2022. ChatGPT launches. The data is unambiguous:
- GenAI Tools coverage jumps from 3.5% to 11.5% in a single quarter
- Jobs/Automation coverage — nearly invisible at 0.3% in Q3 — begins its climb
- General “AI Progress” coverage drops from 22% to 17.6% as the stories become specific
By Q1 2023, the transformation is complete. The Jobs cluster absolutely explodes from 0.9% to 14.1% — a 15x increase. “Will AI take my job?” becomes the dominant anxiety almost overnight. The GPT-4 launch and the infamous “pause letter” (signed by Musk, Wozniak, and others) kept the pot boiling.
The US AI Oversight Senate hearing in Q2 2023 (Sam Altman’s testimony) pushed Regulation coverage to its peak so far: 11%. For the first time, lawmakers were visibly scrambling to understand what they were dealing with.
The Open Source Pivot & Safety Awakening (Q3 2023–Q4 2023)
Llama 2’s open-source release (Q3 2023) is interesting in the data: GenAI Tools coverage hit its absolute peak at 13.6%. The narrative shifted from “OpenAI’s magic” to “anyone can run this.” But the real story was downstream.
Rishi Sunak’s Bletchley Park AI Safety Summit (Q4 2023) created the first major international coordination moment. The Safety & Risk Management cluster, which had been steady at 7–8%, goes up to 8.7% this quarter. More importantly, the AGI Speculation cluster — which had been bouncing around 3–5% — starts gaining traction as world leaders publicly discuss existential risk.
Meanwhile, the Explainers cluster begins its long decline — from 19% in early 2020 to 11% by Q4 2023. The public no longer needed to be told what AI was. They were living with it.
The Maturation Phase (2024): Regulation, Reality, and Routine
2024 is the year AI coverage normalized. Several threads converge here:
Q1 2024 saw Video/Multimodal AI (Sora’s preview, Gemini 1.5’s long context) push the boundaries again, but the cluster response was muted — GenAI Tools actually dropped slightly to 10.3%. I would say the novelty was wearing off.
The EU AI Act’s finalization (Q2 2024) kept Regulation steady at ~10%, establishing a new baseline for governance coverage. The Safety cluster also stabilized around 10% — no longer spiking around events, just… there.
The Jobs/Automation cluster becomes the new constant anxiety here, oscillating between 14–19% throughout 2024. This wasn’t event-driven anymore, but structural. Every quarter brought new layoff announcements, new automation studies, and those “AI won’t take your job but someone using AI will” think pieces.
NotebookLM’s podcast feature (Q3 2024) and Reasoning Models showing up for the first time (Q4 2024) — Claude 3.5 Sonnet, o1-preview — marked genuine capability jumps. This is also where the AGI Speculation cluster crept up to 7% as the “are we close?” discourse intensified.
The New Equilibrium (2025): Agents, Anxiety, and the Chinese Wave
By 2025, the data shows a new steady state emerging:
The Chinese LLM wave (Q1 2025) — DeepSeek, Qwen, and others achieving frontier performance at dramatically lower costs, and being completely open weight models — was huge, but it didn’t create a new cluster spike. It just reinforced existing ones: Safety concerns (countries worried about AI sovereignty), Industry coverage (the cost structure changing), and a slight uptick in Regulation discourse.
Educational/Explainer coverage has now dropped to ~6% by late 2025.
Finally, The Agentic AI Boom (Q3 2025) is the latest narrative. Jobs/Automation coverage has hit its all-time high at 19.6%. The conversation has shifted from “AI as tool” to “AI as worker” — and the labor market anxiety has crystallized, probably for good.
Takeaways
Beyond what’s covered, some patterns jump out at me here:
- Job anxiety is a constant. Since Q4 2022 (the ChatGPT launch), C6, the Jobs/Automation cluster, has never dropped below 14%. It is the baseline anxiety of the entire five-year period — always present, occasionally spiking, but never resolving.
- The Explainer Collapse. This coverage (C7) dropped from ~19% in 2020 to ~6% by Q4 2025. AI stopped being something to explain and became something to work with, or around. The most dramatic drop happened between Q4 2023 (11.1%) and Q1 2024 (4.5%) — right when video and multimodal AI went mainstream. Once AI could make videos and hold conversations, that “what is AI?” genre died. Everyone already knew.
- ML Research became the technical counterweight. C8 (ML Research) dropped post-ChatGPT (from 15% to 10%), but it never disappeared entirely. It stabilized around 8–9%, ticking up slightly during major capability jumps (multimodal AI, reasoning models). As the hype matured, a small but persistent technical-research audience remained — people who wanted to understand how it worked, not just what it could do. Not unlike many other technological leaps!
- AGI discourse is cyclical, not linear. C4 (AGI Speculation) doesn’t really grow; it only spikes around controversial moments. 5.6% in Q4 2020, drops to 1% in Q1 2021, spikes to 5.6% again in Q2 2023, peaks at 8.9% in Q4 2025. It’s the conversation we have when something feels like a capability threshold — or when Sam Altman decides to make bank hyping up “the singularity” 😅 — then we get bored and move on. Until the next one.
- The Enshittification of the Internet. C9 — the “Emerging Risks” cluster (deepfakes, misinformation, algorithmic harms) dropped from 19–22% in 2020 to under 3% by 2025. But this doesn’t mean it simply normalized.
The subcluster analysis of Cluster 0 reveals where that conversation went. Within C0 — the “AI Progress & Industry” cluster — coverage of AI misuse, scams, and content quality degradation exploded from ~5% in early 2020 to 40% by late 2024. The inflection point is Q1 2023, immediately post-ChatGPT, when it tripled from 6.6% to 20.7% in a single quarter.
And then it gets worse: Llama 2 pushes it to 25.6%, it crosses 40% the quarter NotebookLM ships AI-generated podcasts, and the Chinese LLM wave — making frontier-quality generation nearly free — peaks it at 42%. Each democratization event correlates with another leg up in ‘AI slop’ coverage. The easier content is to generate, the more content we write about the mess it’s creating.
I think I can explain this. “Deepfakes” was an abstract threat in 2020. “AI slop” is something you now encounter every time you Google a recipe or read a product review in 2025. With multimodal AI models (and apps like Sora) offering text-to-image and text-to-video generation, the framing has shifted from “scary future technology” to “this is boring, this is exhausting, and this is actively degrading the internet right now.”
Same anxieties, different narrative container.
That’s all I have, thanks for reading! If you want to know about my methodology, read on.
Methodology
To map how AI discourse evolved, I needed a dataset that was broad enough to capture mainstream coverage and structured enough to analyze quantitatively. Here’s how I built it.
1. Collecting the Headlines (2020–2025)
Any SERP API worth its salt will let you scrape Google News. I simply used Bright Data’s SERP API, across nine core query themes:
- “artificial intelligence”
- “machine learning”
- “AGI”
- “AI ethics”
- “AI regulation”
- “AI safety”
- “AI concerns”
- “generative AI”
- “AI technology”
I restricted results to English-only and US-only using the gl and hl params, and set date ranges from Q1 2020 to the present day using the tbs (to be searched) parameter. The key was setting the tbm (to be matched) parameter to nws to get news results exclusively, not random blog posts.
// core fetch from Google News (with caching and custom date range)
// for each query + date range + page, do:
// node-fetch is used here because it supports the proxy `agent` option;
// CONFIG and the cache helpers (loadCache, saveCache, getCacheKey, isCacheValid) live elsewhere in the script
import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';

async function fetchSearchResults(searchQuery, startDate, endDate, page = 0) {
// check cache first
if (CONFIG.cacheEnabled) {
const cache = loadCache();
const cacheKey = getCacheKey(searchQuery, startDate, endDate, page);
const cached = cache[cacheKey];
if (isCacheValid(cached)) {
const resultCount = cached.data?.news?.length || 0;
return cached.data;
}
}
// fetch from API
try {
const proxyUrl = `http://brd-customer-${CONFIG.customerId}-zone-${CONFIG.zone}:${CONFIG.password}@${CONFIG.proxyHost}:${CONFIG.proxyPort}`;
const agent = new HttpsProxyAgent(proxyUrl, { rejectUnauthorized: false });
const start = page * CONFIG.resultsPerPage;
// Few things to note:
// 1. tbm=nws tells Google we want news results only (nws), not shopping (tbm=shop), videos (tbm=vid) or images (tbm=isch)
// 2. tbs=cdr:1 is the Google News date range parameter. CDR = 1 means we are going to provide a custom date range
// 3. cd_min is the minimum date
// 4. cd_max is the maximum date
// 5. gl=us is the Google News country parameter (here, United States)
// 6. hl=en is the Google News language parameter (here, English)
// 7. brd_json=1 tells the Bright Data SERP API we want the results as JSON
const dateRangeParam = `tbs=cdr:1,cd_min:${startDate},cd_max:${endDate}`;
const searchUrl = `https://www.google.com/search?q=${encodeURIComponent(searchQuery)}&start=${start}&brd_json=1&tbm=nws&${dateRangeParam}&gl=us&hl=en`;
const response = await fetch(searchUrl, {
method: 'GET',
agent,
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
'Accept': 'application/json, text/html, */*'
}
});
const text = await response.text();
if (!response.ok) {
throw new Error(`HTTP error! Status: ${response.status} - ${response.statusText}`);
}
let data;
try {
data = JSON.parse(text);
} catch (parseError) {
if (text.trim().startsWith('<!DOCTYPE') || text.trim().startsWith('<html')) {
throw new Error('Received HTML instead of JSON - proxy may not be working correctly');
} else {
throw new Error('Response is not valid JSON');
}
}
// save to cache
const resultCount = data?.news?.length || 0;
if (CONFIG.cacheEnabled) {
const cache = loadCache();
const cacheKey = getCacheKey(searchQuery, startDate, endDate, page);
cache[cacheKey] = {
query: searchQuery,
startDate,
endDate,
page,
data,
timestamp: Date.now()
};
saveCache(cache);
console.log(` Page ${page + 1} fetched and cached (${resultCount} results)`);
} else {
console.log(` Page ${page + 1} fetched (${resultCount} results)`);
}
return data;
} catch (err) {
console.error(`Search request failed:`, err.message);
throw err;
}
}
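To give a sense of how this function gets used, here’s a hypothetical driver loop over query themes, quarters, and pages. The QUERIES and QUARTERS constants, the maxPages setting, and the field names on the raw SERP news items are assumptions on my part (and the top-level await assumes an ES module), not the exact script:
// hypothetical driver: every query theme x every quarter x a few result pages each
const QUERIES = ['artificial intelligence', 'machine learning', 'AGI', 'AI ethics',
  'AI regulation', 'AI safety', 'AI concerns', 'generative AI', 'AI technology'];
const QUARTERS = [
  { label: 'Q1 2020', start: '1/1/2020', end: '3/31/2020' }, // cd_min/cd_max take M/D/YYYY
  // ...one entry per quarter, through Q4 2025
];

const rawStories = [];
for (const query of QUERIES) {
  for (const quarter of QUARTERS) {
    for (let page = 0; page < CONFIG.maxPages; page++) {
      const data = await fetchSearchResults(query, quarter.start, quarter.end, page);
      const items = data?.news ?? [];
      if (items.length === 0) break; // no more pages for this query/quarter
      for (const item of items) {
        rawStories.push({
          headline: item.title,       // field names on the raw items are assumed, not verified
          date: item.source_date,
          source: item.source,
          url: item.link,
          query,
          quarter: quarter.label,
          description: item.description,
        });
      }
    }
  }
}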
SERP APIs need credentials to use (this one needs a customerId, zone, and password), which you can get by signing up with Bright Data.
After deduplication via URL matching plus near-duplicate titles (fuzzy matching with Levenshtein distance; a rough sketch follows the schema list below), this produced ~9,800 unique headlines, each with this schema:
- headline text
- date
- source
- URL
- query term used
- quarter
- description snippet
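Here’s a rough sketch of that dedup pass. It assumes a small Levenshtein helper (I’m using the fastest-levenshtein package here), and the 0.85 similarity threshold is also an assumption:
import { distance as levenshtein } from 'fastest-levenshtein';

// naive O(n^2) pass: exact URL match first, then near-duplicate titles
function dedupeStories(stories, titleThreshold = 0.85) {
  const seenUrls = new Set();
  const kept = [];
  for (const story of stories) {
    const url = story.url?.toLowerCase();
    if (url && seenUrls.has(url)) continue;
    const title = story.headline.toLowerCase().trim();
    // normalized Levenshtein similarity against every headline kept so far
    const isNearDupe = kept.some((other) => {
      const otherTitle = other.headline.toLowerCase().trim();
      const dist = levenshtein(title, otherTitle);
      return 1 - dist / Math.max(title.length, otherTitle.length) >= titleThreshold;
    });
    if (isNearDupe) continue;
    if (url) seenUrls.add(url);
    kept.push(story);
  }
  return kept;
}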
The dataset reflects how media frames AI. It is not the absolute ground truth of public opinion, but is still a fairly reliable proxy for the cultural weather. After all, news headlines are often the lens through which people understand new, rapidly moving tech.
2. Embedding the Text (Capturing Meaning)
To analyze themes rather than surface-level keywords, I used the local bge-m3 model (via Transformers.js and the Xenova/bge-m3 ONNX weights) through the HuggingFace feature-extraction pipeline.
GitHub - huggingface/transformers.js: State-of-the-art Machine Learning for the web.
For each result obtained from SERP, I generated embeddings using mean pooling. I’m aware of the CLS vs. mean pooling debate (and used CLS — the bge-m3 family default — in my last project), but I think mean pooling aligns better with my goals here: I need the overall semantic gist of a headline rather than its lead tokens.
{
"headline": "SuperX Announces Establishment of U.S. Subsidiary to Accelerate Global AI Strategy and Deepen Silicon Valley Collaboration",
"date": "Sep 24, 2025",
"source": "prnewswire.com",
"url": "https://www.prnewswire.com/news-releases/superx-announces-establishment-of-us-subsidiary-to-accelerate-global-ai-strategy-and-deepen-silicon-valley-collaboration-302565696.html",
"query": ""AI" "technology"",
"quarter": "Q3 2025",
"description": "PRNewswire/ - Super X AI Technology Limited (NASDAQ: SUPX) ("the Company" or "SuperX") today announced that on September 18, 2025, it established a…",
"embedding": [
-0.032129839062690735,
-0.023417705669999123,
-0.008769853971898556,
// 1021 others
]
}
This projected every headline into a shared vector space where semantic similarity becomes geometric distance, and added this 1024-dimension array to each story object.
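As a rough sketch, here’s what that step looks like with the Transformers.js feature-extraction pipeline. The package name below is the current one (older setups use @xenova/transformers), and embedding the headline on its own is my assumption; the helper name is mine:
import { pipeline } from '@huggingface/transformers';

// load the feature-extraction pipeline once; bge-m3 produces 1024-dim vectors
const extractor = await pipeline('feature-extraction', 'Xenova/bge-m3');

async function embedStory(story) {
  // mean pooling + L2 normalization over the token embeddings
  const output = await extractor(story.headline, { pooling: 'mean', normalize: true });
  // output is a [1, 1024] tensor; store it as a plain number array on the story object
  story.embedding = Array.from(output.data);
  return story;
}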
3. Clustering the Headlines (Finding the Narratives)
With all embeddings computed, I ran k-means clustering for multiple values of k (6, 8, 10, 12), using the ml-kmeans library with k-means++ initialization.
After qualitative checks — looking at centroids + top 20 examples per cluster — k = 10 produced the cleanest, most interpretable groups of headlines that were internally similar in tone, structure, and meaning.
{
"headline": "SuperX Announces Establishment of U.S. Subsidiary to Accelerate Global AI Strategy and Deepen Silicon Valley Collaboration",
// everything from the previous step, plus:
"cluster": 0
}
That’s ten major narratives in AI discourse. Just as before, we’ll assign this cluster ID designation (ranging from 0 to 9) to each story object.
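A minimal sketch of that step with ml-kmeans (the seed is arbitrary, and the helper name is mine):
import { kmeans } from 'ml-kmeans'; // ml-kmeans v6+ exposes a named export

function clusterStories(stories, k = 10) {
  const vectors = stories.map((s) => s.embedding);
  // k-means++ initialization, as described above; seed fixed for reproducibility
  const result = kmeans(vectors, k, { initialization: 'kmeans++', seed: 42 });
  // attach the cluster ID (0..k-1) to each story object
  stories.forEach((story, i) => {
    story.cluster = result.clusters[i];
  });
  return result; // result.centroids is useful for picking representative headlines
}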
After that, to convert these raw clusters into human-readable archetypes, each cluster’s evidence bundle — those centroid examples + top 20 representative examples — was interpreted by an LLM (gpt-oss:20b).
4. Subclustering Key Narratives (Zooming In Where Needed)
Four of the ten clusters were too broad:
- C0 — Industry & AI progress
- C3 — Generative AI headlines
- C5 — AI safety & alignment headlines
- C7 — AI explainers
What kind of industry coverage stories were in Cluster 0? What type of GenAI products and use cases in Cluster 3? For the educational/explainer content in Cluster 7, which topics were being covered?
So for each one, I performed a second round of clustering on just the items inside that cluster (k = 3 or 4, depending on the size of the main cluster).
That made sure I didn’t miss coherent internal phases within those narratives — and this decision paid off big time. As you saw, the dominant narrative of “AI slop” emerges inside Cluster 0 from 2023–2025.
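For completeness, the second pass is just k-means again, restricted to one parent cluster. This reuses the kmeans import from the sketch above, and the helper is mine:
function subclusterStories(stories, parentCluster, k) {
  const subset = stories.filter((s) => s.cluster === parentCluster);
  const vectors = subset.map((s) => s.embedding);
  const result = kmeans(vectors, k, { initialization: 'kmeans++', seed: 42 });
  // subcluster IDs are local to the parent cluster (0..k-1)
  subset.forEach((story, i) => {
    story.subcluster = result.clusters[i];
  });
  return subset;
}

// e.g. subclusterStories(stories, 0, 4) splits Cluster 0 into the four subclusters discussed earlier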
5. Timeline Frequency & Overall Sentiment Analysis
I purposefully built my schema so each headline had a quarter label (e.g., Q3 2024), so I could group counts by cluster × quarter and normalize each quarter to 100%, giving each cluster’s percentage share of that quarter’s coverage.
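As a small sketch, the normalization is just counting and dividing (the helper name and output shape are mine):
function quarterlyClusterShares(stories) {
  const counts = {}; // quarter -> cluster -> count
  for (const s of stories) {
    counts[s.quarter] ??= {};
    counts[s.quarter][s.cluster] = (counts[s.quarter][s.cluster] ?? 0) + 1;
  }
  const shares = {};
  for (const [quarter, byCluster] of Object.entries(counts)) {
    const total = Object.values(byCluster).reduce((a, b) => a + b, 0);
    // normalize each quarter to 100%
    shares[quarter] = Object.fromEntries(
      Object.entries(byCluster).map(([cluster, n]) => [cluster, (100 * n) / total])
    );
  }
  return shares; // e.g. shares['Q3 2024']['6'] => % of that quarter's stories in Cluster 6
}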
To quantify the overall emotional tone of AI discourse, as a whole, I ran the cardiffnlp/twitter-roberta-base-sentiment-latest model (using the Xenova version for ONNX weights compatible with Transformers.js) over headline + description, then broke it down per quarter.
This is a RoBERTa-based model, fine-tuned for sentiment analysis on the TweetEval benchmark.
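Roughly, the scoring step looks like this. The signed-score mapping (positive counts as +score, negative as -score, neutral as zero) is my reading of how the quarterly averages land in the 0.0–0.2 range, not a confirmed formula, and the helper name is mine:
import { pipeline } from '@huggingface/transformers';

const classify = await pipeline('sentiment-analysis', 'Xenova/twitter-roberta-base-sentiment-latest');

async function scoreStory(story) {
  const text = `${story.headline}. ${story.description ?? ''}`;
  const [top] = await classify(text); // e.g. { label: 'negative', score: 0.87 }
  const label = top.label.toLowerCase();
  // signed score: positive => +score, negative => -score, neutral => 0
  story.sentiment = label === 'positive' ? top.score : label === 'negative' ? -top.score : 0;
  story.sentimentLabel = label;
  return story;
}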
Finally, I charted everything, aligning discourse shifts with major AI milestones: ChatGPT’s launch in Q4 2022, the US AI Oversight Senate hearings in Q2 2023 where Sam Altman testified, and the release of watershed tech like Stable Diffusion, multimodal AI, and reasoning models.
That’s everything!
Things Look Bleak, Even as the Technology Shines.
This shape of AI news, collected and presented like this, tells us that no matter how you cut it, the breathless wonder of 2022–2023 just hasn’t matured into optimism. Instead, we now have a sort of…resignation, and exhaustion.
We’ve gone from “What even is this? This is magic!” to “it’s taking my job” to “the internet is drowning in slop it created” in five years flat.
It’s a bleak time, even as the technology grows more impressive each year. Reasoning models, multimodal AI, agents. But the mainstream conversation, and the language used around AI, has now settled into a steady hum of annoyance and anxiety, punctuated by occasional spikes of panic. I can’t say I’m thrilled.
Thank you for reading! 🙌 This was the third in a series of data-driven deep dives I did this month — forensic teardowns of things that are interesting, or things that shouldn’t work but do. If you want to see what else I find buried in data, follow along.