TL;DR — Search engines don’t just retrieve information; they decide what counts as “knowledge”. Depending on the engine, the same query can prioritize institutional safety, consensus, monetization, or unfiltered chaos. This investigation maps those differences.
Just how differently do the most popular search engines work?
To find out, I ran a search-engine forensics investigation using identical queries across Google, Bing, DuckDuckGo, and Yandex. Using Bright Data’s SERP infrastructure, I collected 1,630 SERP results spanning commercial, technical, YMYL (Your Money or Your Life), and open-ended “wildcard” queries, and analyzed the titles and snippets users actually see. I treated each result as a data point and mapped the emergent “information signatures” of each platform: not scoring truth or intent, only structure, emphasis, and source selection.
What I found went far beyond minor ranking variations. Each engine has a distinct “personality”. Even when user intent was identical, the answers were structurally different. Reality effectively fragments at the search-engine layer, and the version of the world you encounter depends on which engine you trust.
1. Domain Diversity vs Domain Authority
How Many Voices Does Each Engine Allow?
The first signal is simple: how many distinct domains appear per query. On average, Yandex surfaced the most unique domains per query (~11.7), followed by DuckDuckGo (~9.4), Google (~8.8), and Bing (~6.2).

Average unique domains per query, across all four search engines.
This isn’t a value judgment in and of itself, but it is revealing. Higher diversity suggests a broader sampling of sources; lower diversity implies tighter editorial consolidation.
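As a concrete illustration, here is a minimal sketch of how this diversity metric can be computed, assuming results shaped like the JSON records shown later in this piece (objects with `source` and `search_string` fields):

```javascript
// For one engine: group results by query, collect the distinct source
// domains per query, then average those counts.
function avgUniqueDomainsPerQuery(results) {
  const byQuery = new Map();
  for (const r of results) {
    if (!byQuery.has(r.search_string)) byQuery.set(r.search_string, new Set());
    byQuery.get(r.search_string).add(r.source);
  }
  const counts = [...byQuery.values()].map((s) => s.size);
  return counts.reduce((a, b) => a + b, 0) / counts.length;
}

// Tiny illustrative sample: one query with 2 unique domains, one with 1.
const sample = [
  { search_string: "q1", source: "a.com" },
  { search_string: "q1", source: "b.com" },
  { search_string: "q1", source: "a.com" }, // repeat domain, counted once
  { search_string: "q2", source: "c.org" },
];
```

Run over a full engine’s dataset, this yields the per-engine averages charted below.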
Bing’s comparatively low diversity hints at a strong preference for repeat “trusted” sources (and we’ll soon see just how deep that particular rabbit hole goes.) Yandex and DuckDuckGo, by contrast, appear structurally more exploratory — willing to surface a wider range of publishers, even if that comes at the cost of consistency.
Who Gets to Speak, and When?
Breaking sources down by domain type (.gov, .edu, .org, .com) reveals how engines encode authority differently — especially under YMYL conditions.
In commercial queries, all engines overwhelmingly favored .com domains, with minimal variation.
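The bucketing behind these charts can be sketched as a simple TLD check. This is a simplification of my own: it ignores country-code suffixes like .gov.uk and lumps everything non-institutional under one bucket, which matches how the charts group domains.

```javascript
// Classify a bare hostname into the four buckets used in the charts.
function domainType(hostname) {
  const tld = hostname.split(".").pop().toLowerCase();
  if (tld === "gov") return "gov";
  if (tld === "edu") return "edu";
  if (tld === "org") return "org";
  return "com/other";
}
```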
While the split for commercial queries is as expected, YMYL content is where the differences really start showing. Google favors institutional sources overwhelmingly, while Yandex favors .com domains more.
But for YMYL queries, big differences start showing up.
- Google’s top results skew heavily toward government, academic, and nonprofit sources, with .gov/.edu/.org accounting for a majority of top-ranked content.
- Bing also leaned institutional, though less aggressively.
- DuckDuckGo and Yandex showed a noticeably higher tolerance for commercial and mixed-authority sources even in sensitive domains.
This suggests that “authoritativeness” is not a universal concept for search engines — it is defined + enforced differently depending on platform philosophy and risk posture.
For domain type distribution, the Technical and Wildcard categories weren’t particularly interesting: they look exactly as you’d expect, and much like Commercial, so I’m skipping them.
2. The Personalities of Search Engines
By this point, it’s clear that search engines don’t just rank differently — they behave differently. Each one shows a consistent “retrieval” personality, so to speak. By that, I mean a set of preferences about authority, risk, diversity, and acceptable sources of knowledge. These “personalities” show up across categories, queries, and metrics. Let’s look into each search engine and see what kind of personality they showed.

Most frequently appearing domains across all 4 search engines.
Google: The Institutional Gatekeeper. For Better or Worse.
For YMYL queries (health, finance, safety), Google behaves the least like a neutral index: it has clearly taken on the responsibility of surfacing only the highest-weighted domains, no matter what.

A breakdown of Google’s top domains. There’s a healthy mix, if skewed towards institutional sources.
Across YMYL searches, 61% of Google’s results come from .gov, .edu, or .org domains, already a clear majority. But the strongest signal appears at the very top of the search results page. In the top three positions, institutional sources account for 73.33% of results, while commercial (.com) domains drop to just 16.67%.
Google doesn’t merely include institutional authority — it front-loads it, especially where perceived risk is highest.
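A hedged sketch of how this front-loading can be measured: the share of institutional (.gov/.edu/.org) sources within the top N positions, assuming each engine’s results arrive already ordered by rank. The `serp` array below is illustrative, not real data.

```javascript
// Share of institutional sources among the first n ranked results.
function institutionalShareTopN(results, n = 3) {
  const topN = results.slice(0, n); // results assumed sorted by rank
  const institutional = topN.filter((r) => /\.(gov|edu|org)$/.test(r.source));
  return institutional.length / topN.length;
}

// Illustrative mini-SERP, ordered by rank.
const serp = [
  { source: "mayoclinic.org" },
  { source: "hopkinsmedicine.org" },
  { source: "buzzfeed.com" },
  { source: "cdc.gov" },
];
```

Comparing the top-3 share against the full-page share is exactly what exposes the 73.33% vs 61% gap quoted above.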
Query: “is coffee bad for you”
Google’s Top 3:
- Mayo Clinic
- Johns Hopkins Medicine
- UT Southwestern Medical Center
In the top ten, seven results come from major medical institutions, with community or experiential sources appearing only after some sort of authoritative consensus has been established.
Needless to say, this pattern is not accidental. It reflects a deliberate policy choice on Google’s part to privilege institutional authority under YMYL conditions. From Google’s point of view, this is rational risk management: at global scale, health and finance queries are not abstract knowledge problems; they are legal, political, and ethical liabilities. Front-loading institutional sources lets Google externalize that responsibility, stay in line with regulatory expectations, and minimize the chance of catastrophic harm.
But this stance comes with trade-offs.
By systematically doing this, Google collapses disagreement into consensus and emergent knowledge into settled fact. Independent researchers, patient communities, and experience-driven perspectives are not necessarily excluded — but they are consistently positionally suppressed in these search results pages, especially in the moments where users are most likely to stop scrolling.
Over time, this creates a feedback loop. Institutional voices receive disproportionate visibility and legitimacy, alternative perspectives lose reach, and institutional consensus appears even more dominant. The system always, always validates its own assumptions.
Why Google’s Approach Might Not Always Be a Good Idea.
Consider that until very recently, established institutional authorities classified transgender identity as a psychiatric disorder. For example:
- The American Psychiatric Association included “Gender Identity Disorder” in earlier editions of the Diagnostic and Statistical Manual of Mental Disorders (DSM-III, DSM-IV).
- Even the World Health Organization (!!!) classified “transsexualism” as a mental disorder in the ICD until 2019, when it was moved out of the mental disorders chapter in ICD-11.
Sure, those classifications are now widely regarded as wrong or harmful, including by the very institutions that once promoted them. But the correction did not originate from top-down institutional consensus. Under Google’s YMYL framework, search results during that period would have overwhelmingly surfaced:
- Established medical institutions
- Official diagnostic manuals
- Hospital systems and government health agencies
Lived experiences from individuals, clinicians reporting mismatches between diagnosis and lived reality, and early dissenting researchers would all have been algorithmically deprioritized by design.
Not censored. Just plain buried.
Anyway, according to Google, truth, in high-stakes domains, flows from recognized institutions downward. This approach is often defensible — and sometimes necessary — but it also narrows the epistemic frame.
Reality, according to Google, is what institutions say it is.
Bing: A Severe Monoculture Problem.
For most queries (especially technical and YMYL), Bing behaves like a consensus-finding machine instead of a search engine.

Bing’s sources have worryingly low domain diversity.
Bing has the lowest domain diversity of any engine in the dataset, averaging just 6.18 unique domains per query. That concentration becomes comically extreme in technical searches, where 43.82% of Bing’s results come from Stack Overflow alone, and in YMYL queries, where 31.73% of results link to Mayo Clinic, with the same small set of authoritative sources repeatedly resurfacing. Commercial results, similarly, consistently point to official manufacturer pages rather than online stores.
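The concentration figures above reduce to one question: what share of a category’s results does its single most frequent domain capture? A minimal sketch, using the same result shape as elsewhere:

```javascript
// Find the most frequent source domain and its share of all results.
function topDomainShare(results) {
  const counts = {};
  for (const r of results) counts[r.source] = (counts[r.source] || 0) + 1;
  const max = Math.max(...Object.values(counts));
  const top = Object.keys(counts).find((d) => counts[d] === max);
  return { domain: top, share: max / results.length };
}

// Illustrative technical-category sample: 3 of 4 results from one domain.
const technical = [
  { source: "stackoverflow.com" },
  { source: "stackoverflow.com" },
  { source: "stackoverflow.com" },
  { source: "dev.to" },
];
```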
Bing is just selecting upstream arbiters and then amplifying their voices.
Where Google internally asks “Which institutions can absorb liability for this?” before answering, Bing considers something like:
“Which sources already function as de facto answer-oracles for this topic?”
In technical domains, that oracle is Stack Overflow. In health, it’s Mayo Clinic. In ecosystems it overlaps with, it’s Microsoft documentation — and, interestingly, even Google’s own docs when those have become canonical.
Once a source has crossed some internal threshold of:
- recognizability
- historical correctness
- low controversy
- repeat citation elsewhere
…Bing appears to treat it as an answer sink. Queries collapse into references to the same few nodes.
As with Google, this stance is understandable, if nothing else. Bing optimizes for reliability and predictability. Consensus sources are less likely to be wrong in obvious ways, less likely to trigger controversy, easier to defend as “reasonable” choices, et cetera, et cetera. But unlike Google, which draws its authority boundary around established institutions, Bing draws it around delegated arbiters: sources that have already been socially anointed as places where “the answer” lives.
The cost is monoculture.
When a single source accounts for nearly half of all results in a category, it becomes a single point of epistemic failure. Take Stack Overflow, for example. Stack Overflow is invaluable, but it is also gameable, shaped by moderator norms and community culture (read: policing), and biased toward certain voices and problem framings.
Its answers reflect Stack Overflow’s community power structures just as much as technical correctness. Bing’s heavy reliance on it turns those biases into infrastructure.
Bing’s worldview is conservative and convergent. According to Bing, truth is what most experts already agree on. It is a safe engine — but again, one that rarely surprises, and rarely surfaces the odd, insightful edge case.
DuckDuckGo: Privacy-Focused, But Aggregator-Heavy.
DuckDuckGo presents itself as the anti-Google: private, independent, and user-first. Its information signature, though, tells a more nuanced story.

DuckDuckGo’s sources, for the most part, mirror Google’s — except they have an aggregator spam problem.
In YMYL queries, ~63% of DuckDuckGo’s results come from .gov, .edu, or .org domains — lower than Google’s ~70%, but still a strong institutional majority. In the top three positions, institutional sources remain dominant, though with slightly more room for commercial and mixed-authority content than Google allows.
Where DuckDuckGo diverges is in how (frequently) it tolerates aggregators. In YMYL finance queries, for example, SmartAsset appears 13 times, more than any domain on any other engine. Similar patterns appear with comparison and lead-generation sites across categories. DuckDuckGo loves listicles and aggregator spam.
Take a single YMYL finance query. It’s a classic YMYL question: vague, high-stakes, and normative. There is no single correct answer, only frameworks, heuristics, and trade-offs. That ambiguity makes it a useful probe.
“best investment strategy for retirement”
On Google, this query immediately collapses into institutional authority. The top results are dominated by Fidelity, Vanguard, Merrill Lynch, and the U.S. Department of Labor: large, regulated entities offering compliance-safe frameworks rather than direct advice. The results page reads less like an answer to a question and more like a syllabus on investing. 😅
[
  {
    "title": "Investing in Retirement: 5 Tips for Managing Your Portfolio | Merrill Lynch",
    "source": "ml.com",
    "description": "Merrill Lynch highlights diversification as a key strategy for retirement, noting that combining stocks and bonds helps balance long-term growth and risk protection across market conditions.",
    "search_string": "best investment strategy for retirement",
    "search_engine": "google"
  },
  {
    "title": "Building Retirement Income Strategies | Fidelity",
    "source": "fidelity.com",
    "description": "Fidelity outlines four retirement income strategy models: interest and dividends only, investment portfolio only, portfolio plus guarantees, and short-term strategies, offering retirees flexibility based on risk tolerance and income needs.",
    "search_string": "best investment strategy for retirement",
    "search_engine": "google"
  },
  {
    "title": "Guide to saving for retirement - Vanguard",
    "source": "investor.vanguard.com",
    "description": "Vanguard provides a step-by-step retirement preparation plan including estimating expenses, selecting accounts, investing, maximizing contributions, and adjusting strategies over time to align with life goals.",
    "search_string": "best investment strategy for retirement",
    "search_engine": "google"
  },
  {
    "title": "Put your savings in different types of investments | U.S. Department of Labor",
    "source": "dol.gov",
    "description": "The U.S. Department of Labor emphasizes diversification as a way to reduce risk and improve returns by spreading investments across different asset types, such as stocks, bonds, and real estate.",
    "search_string": "best investment strategy for retirement",
    "search_engine": "google"
  },
  {
    "title": "Learn how to secure your future with the best retirement investments | Nuveen",
    "source": "nuveen.com",
    "description": "Nuveen offers guidance on selecting retirement strategies through a variety of accounts and funds, helping individuals find a tailored investment approach that fits their financial goals and risk profile.",
    "search_string": "best investment strategy for retirement",
    "search_engine": "google"
  }
]
Run the same query on DuckDuckGo, and now instead of institutions, we are saturated with aggregators and listicles. SmartAsset dominates the Top 5, alongside debt relief sites, comparison blogs, and SEO-optimized “top strategies” roundups. These pages are ultra-commercial, prescriptive, and aggressively optimized for conversion.
[
  {
    "title": "10 Retirement Strategies You Need to Know",
    "source": "smartasset.com",
    "description": "From tax-advantaged accounts to annuities, there are several retirement strategies to consider when planning. Learn more here.",
    "search_string": "best investment strategy for retirement",
    "search_engine": "duckduckgo"
  },
  {
    "title": "Smart Retirement Investments: Strategies to Consider in 2025",
    "source": "nationaldebtrelief.com",
    "description": "Discover the best investment strategies for retirement in 2025, including top retirement investment ideas and tips to secure your financial future.",
    "search_string": "best investment strategy for retirement",
    "search_engine": "duckduckgo"
  },
  {
    "title": "Top 11 Retirement Strategies",
    "source": "smartasset.com",
    "description": "Learn about retirement strategies including tax-advantaged accounts, annuities, and more to help plan for a secure future.",
    "search_string": "best investment strategy for retirement",
    "search_engine": "duckduckgo"
  },
  {
    "title": "10 Retirement Strategies You Need to Know",
    "source": "smartasset.com",
    "description": "From tax-advantaged accounts to annuities, there are several retirement strategies to consider when planning. Learn more here.",
    "search_string": "best investment strategy for retirement",
    "search_engine": "duckduckgo"
  },
  {
    "title": "10 Retirement Strategies You Need to Know",
    "source": "smartasset.com",
    "description": "From tax-advantaged accounts to annuities, there are several retirement strategies to consider when planning. Learn more here.",
    "search_string": "best investment strategy for retirement",
    "search_engine": "duckduckgo"
  }
]
Yes, all those SmartAsset.com results are different articles!
This is a paradox and a half, to be honest. 😅 An engine designed to minimize tracking exposure ends up routing users through heavily commercialized middlemen. The likely explanation is that DuckDuckGo leans heavily on upstream indices and ranking signals, and simply doesn’t have the budget or infrastructure for Google-tier editorial suppression.
Ultimately, DuckDuckGo values privacy and avoids heavy-handed intervention — but that restraint allows monetization to leak into sensitive queries.
Yandex: “Chaotic Neutral” in Search Engine Form.
If Google curates for institutional authority and Bing for the safe picks, Yandex curates for maximum exposure. At whatever cost.

Yandex surfaces some truly out-there domains, dangerously so.
Yandex surfaces the highest domain diversity of all engines, averaging 11.69 unique domains per query, and includes 164 domains that appear on no other platform. It is the most pluralistic — and the least filtered — search engine in the dataset.
Consider YMYL-adjacent queries like:
- “401k withdrawal rules”
- “best running shoes”
- “top rated wireless earbuds”
In each case, Yandex surfaces results hosted on opaque CloudFront subdomains: long, random-looking URLs that clearly do not represent publishers, brands, or even stable sites. These pages mimic legitimate listicles (“Tested & Rated,” “Best of 2025”) but provide no clear authorship, organization, or accountability. For example, I found a CloudFront-hosted page that effectively impersonated SmartAsset content without being SmartAsset at all.
[
{
"title": "401(k) Tax Rules: Withdrawals, Deductions & More",
"source": "dr5dymrsxhdzh.cloudfront.net",
"description": "Unlike traditional 401(k) plans, Roth 401(k) accounts are funded with post-tax contributions, which means withdrawals can be taken tax-free if certain conditions are met… SmartAsset: 401(k) Tax Rules on Withdrawals, Deductions & More.",
"search_string": "401k withdrawal rules",
"search_engine": "yandex"
},
{
"title": "10 Best Running Shoes of 2025 | Tested & Rated",
"source": "d1nymbkeomeoqg.cloudfront.net",
"description": "A great pair of running shoes brings with it the promise of a new day, a fresh run, and a better you, no matter what is happening in the world at large.",
"search_string": "best running shoes",
"search_engine": "yandex"
},
{
"title": "Best Wireless Earbuds of 2025 | Tested & Rated",
"source": "djd1xqjx2kdnv.cloudfront.net",
"description": "Wireless earbuds-the ear tip seal on the vibe beam is good, allowing a more immersive…",
"search_string": "top rated wireless earbuds",
"search_engine": "yandex"
}
]
Google and Bing suppress this class of result almost entirely. Yandex does not.
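One way to flag this class of host, as a rough heuristic of my own rather than anything the engines actually use, is to match raw CloudFront distribution hostnames:

```javascript
// Default CloudFront distributions look like dxxxxxxxxxxxxx.cloudfront.net:
// a long lowercase alphanumeric label with no brand identity at all.
function looksLikeOpaqueHost(hostname) {
  return /^[a-z0-9]{10,}\.cloudfront\.net$/.test(hostname);
}
```

Running a check like this over the dataset is how the examples above surfaced: they trip the heuristic while every Google and Bing result passes it.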
The same permissiveness appears throughout its YMYL handling. In health queries, Medium posts, lifestyle blogs, and YouTube videos routinely show up alongside institutional medical sources. In finance, LinkedIn posts from entirely random users appear alongside Forbes and Investopedia. In commerce, impersonation scams, affiliate spam, and thin comparison pages coexist with reputable reviews, with minimal differentiation.
This is not accidental. It’s a totally coherent — if extreme — philosophy.
Yandex treats the web as inherently chaotic and contested. It does not even try to decide which sources are legitimate, which voices are authoritative, or which domains are accountable. It’s designed to maximize exposure and variety, and leave evaluation entirely to the user. Authority is just… optional.
The upside of this approach is real. Yandex is exceptionally good at surfacing:
- new emerging perspectives/early-stage knowledge
- unofficial but insightful explanations
- non-canonical voices (not always a good thing)
The downside is equally real, and…I’m not sure the tradeoff is worth it? By refusing to draw hard boundaries around authorship, institutional responsibility, or even basic site legitimacy, Yandex allows misinformation, impersonation, scams, and noise to blend seamlessly into the same epistemic surface as genuine expertise.
This is chaotic neutral in its purest form. 😅 Yandex does not protect the user from bad information, nor does it meaningfully privilege good information. It assumes a high-competence, high-skepticism user — and silently punishes anyone who lacks those priors.
In a search ecosystem increasingly defined by guardrails and liability management, Yandex stands apart as the wild west by design. You almost have to admire it.
3. Search Engines Vary Wildly in Figuring Out What the User Meant
Ranking differences are easy to see. Intent inference differences are harder — because they happen before ranking begins.
The clearest way to surface them is to hold the query constant and observe how each engine silently answers a different question.
Query: “should I move abroad?”
This isn’t a factual lookup, but a decision under uncertainty.
- Google treats it as a decision-support problem, so you get cost-of-living comparisons, healthcare access, visa rules, and lived-experience discussions. Google assumes the user wants help weighing trade-offs.
- Bing interprets the same query as an informational lookup. It surfaces immigration portals, official statistics, and formal descriptions of process. Bing assumes the user wants to understand the rules, not make the choice.
- DuckDuckGo treats it as a research aggregation task, surfacing blog posts and “pros and cons” articles without strong framing. It assumes exploration without guidance.
- Yandex treats it as open discourse, surfacing personal narratives, YouTube explainers, and expat forums. It assumes the user wants to see the argument, not the answer.
Same query, but with four different inferred goals. The pattern repeats across very different intents.
For “how to learn Spanish”, Google assumes the user wants a structured plan, Bing assumes understanding precedes execution, DuckDuckGo assumes tool discovery, and Yandex assumes learning happens socially and visually.
For “is intermittent fasting safe”, Google frames risk conservatively, Bing collapses onto a narrow institutional consensus, DuckDuckGo allows mixed advice, and Yandex surfaces disagreement directly.
What this reveals
Intent inference is less about correctness, more about what kind of answer the system believes the user deserves.
Before ranking begins, each engine quietly decides:
- Is this a decision or a lookup?
- Is subjectivity acceptable?
- Should disagreement be surfaced or suppressed?
- Is safety more important than exploration?
Those choices differ by platform, and they shape the user’s reality far more than ranking tweaks ever could. Even if you could somehow optimize against ranking bias, the decision about what the question is for is made before any ranking algorithm kicks in, and it matters just as much.
4. YouTube as a Source: Yay or Nay?
How a search engine treats YouTube is also a reliable signal of how it defines knowledge itself. Across identical queries, engines don’t just rank video differently — they disagree on whether video is a legitimate way of knowing, at all.

Percentage of all 1630 search results containing youtube.com as a source, by search engine.
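The metric in this chart is straightforward to compute. A sketch, assuming the same result shape as the JSON samples above, and counting youtube.com including subdomains like www. or m.:

```javascript
// Percentage of each engine's results whose source is youtube.com.
function youtubeShareByEngine(results) {
  const totals = {};
  const hits = {};
  for (const r of results) {
    totals[r.search_engine] = (totals[r.search_engine] || 0) + 1;
    if (/(^|\.)youtube\.com$/.test(r.source)) {
      hits[r.search_engine] = (hits[r.search_engine] || 0) + 1;
    }
  }
  const shares = {};
  for (const engine of Object.keys(totals)) {
    shares[engine] = (100 * (hits[engine] || 0)) / totals[engine];
  }
  return shares;
}

// Illustrative mixed sample across two engines.
const mixed = [
  { search_engine: "yandex", source: "www.youtube.com" },
  { search_engine: "yandex", source: "habr.com" },
  { search_engine: "google", source: "developer.mozilla.org" },
];
```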
Take this query, for example.
Query: “react useEffect cleanup”
This is a technical + procedural question.
- Google surfaces YouTube sparingly and late, after official documentation and written explanations. Video is supplementary — useful for intuition, not authority.
- Bing shows a similar distribution to Google.
- DuckDuckGo didn’t have any YouTube videos in the results themselves.
- Yandex places YouTube front and center. Video tutorials routinely appear alongside — or ahead of — written documentation.
Same query, different epistemologies.
Query: “how to learn Spanish”
Now the task is experiential.
Google and Bing still emphasize structured programs and institutional guides, with video as an aid. DuckDuckGo shows nothing in the organic results (its responses do contain a separate Video field, but that isn’t part of the search results themselves). Yandex, by contrast, treats YouTube as the primary learning surface! Immersion videos, informal teachers, and community-driven instruction dominate.
Surprisingly, this pattern persists even in higher-risk queries.
Query: “is intermittent fasting safe”
Google and Bing absolutely suppress YouTube here in favor of institutional medical sources. Yandex, though, continues to include YouTube prominently in the results, even alongside clinics and health authorities.
Again — a policy difference, not a ranking algorithm difference.
For Google/Bing, YouTube is treated as a didactic aid. But for Yandex, it’s apparently experiential knowledge.
Yandex seems to be making a call here on whether demonstration counts as explanation, whether personality and persuasion are acceptable components of understanding, and whether authority must be textual to be trusted.

Breakdown of Youtube.com listed as a source, by category of search query.
For many modern users — developers, learners, non-native speakers — YouTube is how knowledge is acquired (“visual learning,” etc.). Engines that demote video implicitly privilege certain learning styles and literacies. Engines that surface more YouTube videos accept higher risk in exchange for accessibility.
Once again, the difference is not about ranking quality. It’s about what kinds of knowing are allowed to count.
5. Consensus vs Fragmentation: Just How Shared Is the Web?
One of the most striking findings in this analysis is not how differently search engines rank results — but how little they agree on which sources matter at all.
Across every query category, the overlap between Google, Bing, DuckDuckGo, and Yandex is vanishingly small. Only ~2–3% of domains appear in results from all four engines, regardless of category. In practical terms, that means fewer than three out of every hundred sources are treated as universally relevant across the modern search ecosystem.
The rest of the web is fragmented.

Depending on category, between 68.6% and 91.3% of domains appear on only one engine. Commercial: 87.68% unique, Technical: 81.05% unique, YMYL: 91.33% unique, Wildcard: 68.57% unique.
Measured by category, the percentage of domains shared across all four engines is:
- Commercial queries: 1.9%
- Technical queries: 1.96%
- YMYL queries: 2.89%
- Wildcard queries: 3.33%
Even in the best case — open-ended wildcard searches — over 96% of domains are not shared across engines. There is no category in which a meaningful consensus emerges.
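The overlap statistic can be sketched as a set intersection over each engine’s domains. The two-engine `byEngine` sample below is illustrative; the real computation runs over all four engines’ result sets.

```javascript
// Fraction of all distinct domains that appear in every engine's results.
function sharedDomainFraction(resultsByEngine) {
  const sets = Object.values(resultsByEngine).map(
    (rs) => new Set(rs.map((r) => r.source))
  );
  const all = new Set(sets.flatMap((s) => [...s]));
  const shared = [...all].filter((d) => sets.every((s) => s.has(d)));
  return shared.length / all.size;
}

// Illustrative sample: two engines share 1 of 3 distinct domains.
const byEngine = {
  google: [{ source: "a.com" }, { source: "b.com" }],
  bing: [{ source: "a.com" }, { source: "c.com" }],
};
```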
YMYL queries — where one might expect the strongest agreement — are actually the most fragmented of all. While engines like Google and Bing enforce strong institutional filters, DuckDuckGo and Yandex allow a much broader — and sometimes riskier — set of sources. The result is not a shared reality with minor ranking differences, but fundamentally different epistemic environments.
This is just a core, structural property of modern search.
If there were such a thing as a canonical “top of the web,” we would expect to see convergence: a stable core of sources that every engine agrees are authoritative, especially in high-stakes domains. Instead, the data shows the opposite.
Each engine constructs its own sourcing universe, with only a thin sliver of shared ground. What qualifies as “authoritative,” “relevant,” or even “acceptable” varies dramatically depending on which engine you use.
Two users asking the same health or finance question on different engines are not being guided toward the same pool of knowledge. They are being pointed at different universes of sources.
💡 This fragmentation does not mean that search engines are broken or irrational. It just means they are optimizing for different objectives:
- Legal defensibility
- Risk tolerance
- Diversity vs safety
- Consensus vs exploration
- Regional and infrastructural constraints

Each objective prunes the web differently, and our overlap statistics show that these pruning strategies rarely agree.
That’s all the results. Thank you for reading! If you want to know about my Methodology, read on.
Methodology
All observations in this analysis are based on direct SERP retrieval, not third-party summaries or clickstream data.
Queries were issued programmatically to four search engines — Google, Bing, DuckDuckGo, and Yandex — using a Node script that fetches raw results pages via a SERP API (Bright Data here, the one I had access to. Docs here.) Visualizations were generated later, via D3.js.
The script does not simulate user interaction, personalization, or logged-in sessions. Each engine is queried with the same plain-text search string.
const path = require('path');
require('dotenv').config({ path: path.join(__dirname, '../../.env') });
const fetch = require('node-fetch');
const fs = require('fs-extra');
// you need to sign up at bright data to get these values
// https://brightdata.com/cp/setting/users
const CONFIG = {
  apiToken: process.env.BRIGHT_DATA_API_TOKEN,
  zone: process.env.BRIGHT_DATA_ZONE || 'serp_api1',
  maxResults: parseInt(process.env.MAX_RESULTS_PER_ENGINE) || 10,
  apiUrl: 'https://api.brightdata.com/request'
};
// search engine config
// as of Jan 2026, Bright data supports Google, Bing, DuckDuckGo, Yandex, Baidu, Yahoo, & Naver
const ENGINES = {
  google: {
    name: 'Google',
    buildUrl: (query) => `https://www.google.com/search?q=${encodeURIComponent(query)}`
  },
  bing: {
    name: 'Bing',
    buildUrl: (query) => `https://www.bing.com/search?q=${encodeURIComponent(query)}`
  },
  duckduckgo: {
    name: 'DuckDuckGo',
    buildUrl: (query) => `https://duckduckgo.com/?q=${encodeURIComponent(query)}`
  },
  yandex: {
    name: 'Yandex',
    buildUrl: (query) => `https://www.yandex.com/search/?text=${encodeURIComponent(query)}`
  }
};
// default query categories - all 40 queries from queries.js
const DEFAULT_QUERIES = [
  // commercial (10)
  { query: "best password manager 2025", category: "commercial" },
  { query: "buy iphone 15", category: "commercial" },
  { query: "best running shoes", category: "commercial" },
  { query: "cheapest web hosting", category: "commercial" },
  { query: "top rated mattress", category: "commercial" },
  { query: "best credit card for travel", category: "commercial" },
  { query: "cheap flights to europe", category: "commercial" },
  { query: "best laptop for programming", category: "commercial" },
  { query: "top rated wireless earbuds", category: "commercial" },
  { query: "best car insurance rates", category: "commercial" },
  // technical (10)
  { query: "react useEffect cleanup", category: "technical" },
  { query: "postgresql connection pooling", category: "technical" },
  { query: "nodejs memory leak debugging", category: "technical" },
  { query: "tailwind vs vanilla css", category: "technical" },
  { query: "docker compose volumes", category: "technical" },
  { query: "kubernetes pod restart policy", category: "technical" },
  { query: "javascript async await best practices", category: "technical" },
  { query: "git rebase vs merge", category: "technical" },
  { query: "typescript generic constraints", category: "technical" },
  { query: "redis cache invalidation strategies", category: "technical" },
  // ymyl (10)
  { query: "is coffee bad for you", category: "ymyl" },
  { query: "climate change causes", category: "ymyl" },
  { query: "covid vaccine side effects", category: "ymyl" },
  { query: "401k withdrawal rules", category: "ymyl" },
  { query: "adhd symptoms adults", category: "ymyl" },
  { query: "how to lower cholesterol naturally", category: "ymyl" },
  { query: "social security benefits calculator", category: "ymyl" },
  { query: "is intermittent fasting safe", category: "ymyl" },
  { query: "best investment strategy for retirement", category: "ymyl" },
  { query: "symptoms of diabetes type 2", category: "ymyl" },
  // wildcard (10)
  { query: "pizza restaurants chicago", category: "wildcard" },
  { query: "weather in london", category: "wildcard" },
  { query: "ukraine russia conflict timeline", category: "wildcard" },
  { query: "how to start a business", category: "wildcard" },
  { query: "best cities to live in 2025", category: "wildcard" },
  { query: "what is artificial intelligence", category: "wildcard" },
  { query: "how to learn spanish", category: "wildcard" },
  { query: "most popular video games 2025", category: "wildcard" },
  { query: "best new artists 2025", category: "wildcard" },
  { query: "top box office movies 2025", category: "wildcard" }
];
// data directory for saving results
const DATA_DIR = path.join(__dirname, 'data');
const RAW_DIR = path.join(DATA_DIR, 'raw');
// ensure data directory exists
async function setup() {
  await fs.ensureDir(RAW_DIR);
}
// fetch results from a single search engine
async function fetchEngine(engineKey, query) {
  const engine = ENGINES[engineKey];
  if (!engine) {
    throw new Error(`Unknown engine: ${engineKey}`);
  }
  const searchUrl = engine.buildUrl(query);
  const isGoogle = engineKey === 'google';
  console.log(` [${engine.name}] Fetching: ${query}`);
  const requestBody = {
    zone: CONFIG.zone,
    url: searchUrl
  };
  if (isGoogle) {
    requestBody.format = 'json';
    requestBody.data_format = 'parsed_light';
  } else {
    requestBody.format = 'raw';
    requestBody.data_format = 'markdown';
  }
  try {
    const response = await fetch(CONFIG.apiUrl, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${CONFIG.apiToken}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(requestBody)
    });
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}: ${response.statusText}`);
    }
    let data;
    if (isGoogle) {
      data = await response.json();
      console.log(` [${engine.name}] Retrieved JSON data`);
    } else {
      const markdown = await response.text();
      const sizeKB = (markdown.length / 1024).toFixed(2);
      console.log(` [${engine.name}] Retrieved markdown (${sizeKB} KB)`);
      data = markdown;
    }
    return {
      engine: engineKey,
      engineName: engine.name,
      query: query,
      data: data,
      format: isGoogle ? 'json' : 'markdown',
      timestamp: new Date().toISOString()
    };
  } catch (error) {
    console.error(` [${engine.name}] ERROR: ${error.message}`);
    return {
      engine: engineKey,
      engineName: engine.name,
      query: query,
      data: null,
      format: null,
      error: error.message,
      timestamp: new Date().toISOString()
    };
  }
}
// fetch results from all search engines in parallel
async function fetchAllEngines(query) {
  console.log(`\n[Querying all engines for: "${query}"]`);
  console.log('-'.repeat(60));
  const engineKeys = Object.keys(ENGINES);
  const promises = engineKeys.map(engineKey => fetchEngine(engineKey, query));
  const results = await Promise.all(promises);
  const successCount = results.filter(r => r.data !== null).length;
  console.log(`\n[Successfully retrieved data from ${successCount}/${results.length} engines]`);
  return results;
}
// save results from all engines to files
async function saveResults(query, category, engineResults) {
  const timestamp = Date.now();
  const querySlug = query.replace(/[^a-z0-9]/gi, '_').toLowerCase();
  // save each engine's results to a separate file
  for (const result of engineResults) {
    if (result.data) {
      const extension = result.format === 'json' ? 'json' : 'md';
      const filename = `${result.engine}-${querySlug}-${timestamp}.${extension}`;
      const filepath = path.join(RAW_DIR, filename);
      if (result.format === 'json') {
        await fs.writeJson(filepath, result.data, { spaces: 2 });
      } else {
        await fs.writeFile(filepath, result.data, 'utf8');
      }
      console.log(` [Saved ${result.engineName} results]`);
    }
  }
  // save metadata file with query info
  const metadataFile = `metadata-${querySlug}-${timestamp}.json`;
  const metadataPath = path.join(RAW_DIR, metadataFile);
  await fs.writeJson(metadataPath, {
    query,
    category,
    timestamp: new Date().toISOString(),
    engines: engineResults.map(r => ({
      engine: r.engine,
      engineName: r.engineName,
      hasData: r.data !== null,
      error: r.error || null
    }))
  }, { spaces: 2 });
  console.log(` [Saved metadata]`);
}
// main function
async function main() {
  console.log('='.repeat(60));
  console.log('Search Engine Comparison Tool');
  console.log('='.repeat(60));
  // get queries from command line or use defaults
  const cliQueries = process.argv.slice(2);
  let queries;
  if (cliQueries.length > 0) {
    // use command-line queries
    queries = cliQueries.map(q => ({ query: q, category: 'custom' }));
    console.log(`Queries: ${cliQueries.join(', ')}`);
  } else {
    // use default categorized queries
    queries = DEFAULT_QUERIES;
    console.log(`Total queries: ${queries.length}`);
  }
  console.log(`Engines: Google, Bing, DuckDuckGo, Yandex`);
  console.log(`Max results per engine: ${CONFIG.maxResults}`);
  console.log('='.repeat(60));
  // check for API token
  if (!CONFIG.apiToken) {
    console.error('ERROR: BRIGHT_DATA_API_TOKEN not found in environment');
    process.exit(1);
  }
  // setup directories
  await setup();
  // process each query
  for (let i = 0; i < queries.length; i++) {
    const { query, category } = queries[i];
    console.log(`\n[${i + 1}/${queries.length}] Processing: "${query}" [${category}]`);
    console.log('-'.repeat(60));
    try {
      // fetch results from all engines
      const engineResults = await fetchAllEngines(query);
      // save results to files
      await saveResults(query, category, engineResults);
      const successCount = engineResults.filter(r => r.data !== null).length;
      console.log(`\n[Completed: ${successCount}/${engineResults.length} engines succeeded]`);
      // wait between queries (except for last one)
      if (i < queries.length - 1) {
        console.log('Waiting 2 seconds...\n');
        await new Promise(resolve => setTimeout(resolve, 2000));
      }
    } catch (error) {
      console.error(`\n[Error processing "${query}": ${error.message}]`);
    }
  }
  console.log('\n' + '='.repeat(60));
  console.log('All queries completed!');
  console.log(`Results saved in: ${RAW_DIR}`);
  console.log('='.repeat(60));
}
// run if called directly
if (require.main === module) {
  main().catch(error => {
    console.error('FATAL ERROR:', error);
    process.exit(1);
  });
}
module.exports = { main, CONFIG };
Breaking this down, here’s what I’m doing:
1. Define a fixed set of queries, grouped into broad categories:
- commercial (e.g. product comparisons)
- technical (e.g. programming questions)
- YMYL (health, finance, climate)
- wildcard / everyday queries
2. Build native search URLs for each engine:
- Google (https://www.google.com/search?q=something)
- Bing (https://www.bing.com/search?q=something)
- DuckDuckGo (https://duckduckgo.com/?q=something)
- Yandex (https://www.yandex.com/search/?text=something)
3. Fetch SERPs directly, without rendering or post-processing:
- Google results are requested in auto-parsed JSON form (via Bright Data SERP API’s parser)
- Bing, DuckDuckGo, and Yandex are retrieved as raw or near-raw markup (Bing does apparently support JSON parsing, but I couldn’t get it to work)
- No attempt is made to normalize rankings across engines; that was well beyond the scope of this project.
4. Store each engine’s response separately, along with metadata. Afterward, I analyzed the saved files by extracting titles, sources, the query string, and the engine that produced each result. The visualizations were standard D3.js work.
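The post-hoc analysis lived in a separate script. As a minimal sketch of its first step, here is one way to pull unique domains out of a raw markdown dump — the regex approach is my illustration here, not the exact per-engine parser behind the article’s numbers:

```javascript
// Pull unique, de-www'd hostnames out of a raw SERP dump.
// Illustrative assumption: any http(s) URL in the text counts as a result source.
function extractDomains(rawText) {
  const urlPattern = /https?:\/\/([a-z0-9.-]+)/gi;
  const domains = new Set();
  let match;
  while ((match = urlPattern.exec(rawText)) !== null) {
    // strip a leading "www." so www.example.com and example.com count once
    domains.add(match[1].toLowerCase().replace(/^www\./, ''));
  }
  return [...domains];
}

// e.g. extractDomains('https://www.example.com/a https://blog.example.org/b')
// → ['example.com', 'blog.example.org']
```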
What this method captures — and what it doesn’t
This approach captures what each engine is willing to surface by default, given the same query and no user context. It is well-suited to studying:
- domain diversity
- repetition and amplification
- tolerance for aggregators, blogs, and spam
- relative privileging of institutions vs. individuals
It does not capture:
- personalization effects
- geographic fine-tuning beyond the proxy’s exit region
- click behavior or downstream recommendations
- subtle UI elements like knowledge panels or answer boxes
That is intentional. The goal here is not to model the entire user experience, but to compare personality defaults: what each engine treats as acceptable answers when no additional signals are provided.
In that sense, this methodology reflects the engines’ baseline worldviews — how they behave when forced to make decisions about authority, safety, and plurality on their own.
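The domain-diversity numbers from the first section fall straight out of this data. A sketch of that metric, assuming a hypothetical input shape of engine key → one domain list per query:

```javascript
// Average number of unique domains per query, per engine — the metric behind
// the "domain diversity" comparison. The input shape is a hypothetical
// simplification of the extracted data.
function averageUniqueDomains(resultsByEngine) {
  const averages = {};
  for (const [engine, perQueryDomains] of Object.entries(resultsByEngine)) {
    const counts = perQueryDomains.map(domains => new Set(domains).size);
    averages[engine] = counts.reduce((sum, n) => sum + n, 0) / counts.length;
  }
  return averages;
}

// e.g. averageUniqueDomains({ bing: [['a.com', 'b.com'], ['a.com', 'a.com']] })
// → { bing: 1.5 }
```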
Opinion: There is No Singular “The Web” Anymore
This analysis wasn’t about correctness, bias accusations, or SEO gamesmanship.
What this data makes clear is that “the web” is no longer a single, shared informational substrate. It has fractured at the search-engine layer. If you have access to some sort of SERP infra, all of this is easily reproducible.
Search engines don’t merely reflect reality; they select which parts of reality are visible at all. When only ~2–3% of sources survive that selection process across platforms, the idea of a neutral, universal information commons stops making sense.
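That survival figure is just a set intersection over the per-engine domain lists. A sketch with a hypothetical input shape (one domain list per engine, for a single query):

```javascript
// Fraction of all observed domains that appear on every engine for one query.
// The input shape is a hypothetical simplification of the collected data.
function crossEngineOverlap(domainsPerEngine) {
  const sets = domainsPerEngine.map(list => new Set(list));
  const universe = new Set(domainsPerEngine.flat());
  let shared = 0;
  for (const domain of universe) {
    if (sets.every(set => set.has(domain))) shared++;
  }
  return shared / universe.size;
}

// e.g. crossEngineOverlap([['a.com', 'b.com'], ['a.com', 'c.com'], ['a.com', 'd.com']])
// → 0.25 (only a.com survives on all three engines)
```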
Instead, we live in parallel informational worlds:
- Google’s institutional web
- Bing’s consensus web
- DuckDuckGo’s lightly curated web
- Yandex’s wild-west, maximalist web
The question “what does the internet say?” no longer has a single answer. It depends entirely on where you look.