User searches: “How do I fix a leaking kitchen faucet?”
Your vector database contains a perfect document titled “Kitchen Faucet Repair Guide.” But the search returns mediocre results. The helpful document is buried on page two.
Why? Because you’re comparing apples to oranges.
The query is a question. The documents are statements. They exist in the same embedding space, but they’re fundamentally different types of text.
HyDE fixes this with a surprisingly simple idea.
The Query-Document Mismatch
When you embed a query, you get a vector representing a question or search intent.
When you embed documents, you get vectors representing actual content: titles, paragraphs, explanations.
These vectors live in the same space, but they don’t align the way you’d expect.
Example:
Query: “What are the health benefits of drinking green tea?”
Your database has:
- “Green Tea: A Comprehensive Health Analysis”
- “Antioxidants in Green Tea and Their Effects”
- “Why People Love Green Tea: A Buyer’s Guide”
Document 3 might score highest. It’s casual and user-focused, similar to how the query is phrased. But Documents 1 and 2 actually answer the question better.
Embedding similarity captures how text sounds, not what would answer the question.
Another example:
Query: “chronic pain management strategies”
Relevant document: “Managing chronic pain with cognitive behavioral therapy, physical therapy, and medication. This guide covers treatment options, lifestyle changes, and when to see a specialist…”
The query is short and abstract. The document is long and concrete. Same topic, different embedding neighborhoods.
The HyDE Insight
HyDE asks a simple question: what if we made the query look more like a document?
Instead of embedding the raw query, you ask an LLM: “If this query were answered by a document, what would that document say?”
Then you embed the LLM’s response and use that for retrieval.
The flow:
- User submits a query
- LLM generates a hypothetical document that would answer it
- You embed the hypothetical document (not the query)
- Vector search finds similar real documents
- Return results
That’s it. One extra LLM call transforms your retrieval quality.
Why This Works
The hypothetical document is more like your actual documents than the query ever was.
Original query: “How do I fix a leaking kitchen faucet?”
LLM-generated hypothetical:
Kitchen Faucet Leak Repair Guide
A leaking kitchen faucet is one of the most common plumbing issues. Most leaks are caused by worn washers or O-rings. To fix it, turn off the water supply, remove the handle by unscrewing the set screw, replace the washer or O-ring, then reassemble and test. The process takes about 30 minutes…
Now you’re comparing document-to-document. The embedding of this hypothetical will be much closer to your actual repair guides than the original question ever was.
What HyDE solves:
- Query-document mismatch: questions and documents are now the same type of text
- Vocabulary gaps: the user says “broken sink,” the LLM generates “leaking faucet”
- Intent capture: the hypothetical contains what the user actually wants to know
- Semantic richness: a paragraph has more signal than a one-line query
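You can check this effect directly: embed the raw query, the hypothetical, and a real document, then compare cosine similarities. A minimal sketch, assuming sentence-transformers; the model choice and the exact scores are illustrative and will vary, but the hypothetical-to-document similarity is the one you’d expect to win:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is an assumption

query = "How do I fix a leaking kitchen faucet?"
hypothetical = (
    "A leaking kitchen faucet is one of the most common plumbing issues. "
    "Most leaks are caused by worn washers or O-rings. Turn off the water supply, "
    "remove the handle, replace the washer or O-ring, then reassemble and test."
)
real_doc = (
    "Kitchen Faucet Repair Guide: most faucet leaks come from worn washers or "
    "O-rings. This guide covers shutting off the water, disassembling the handle, "
    "and replacing the damaged part."
)

# Embed all three texts with the same model, then compare against the real document
q_emb, h_emb, d_emb = model.encode([query, hypothetical, real_doc])
print("query        -> real doc:", util.cos_sim(q_emb, d_emb).item())
print("hypothetical -> real doc:", util.cos_sim(h_emb, d_emb).item())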
Implementation
The code is straightforward. Here’s the core logic:
def retrieve_with_hyde(query, llm, embedder, vector_db, k=10):
    # Step 1: Generate a hypothetical document that would answer the query
    prompt = f"""Write a short document that would answer this query.
Be concrete and informative.

Query: {query}

Document:"""
    hypothetical_doc = llm.generate(prompt)

    # Step 2: Embed the hypothetical document (not the query)
    embedding = embedder.encode(hypothetical_doc)

    # Step 3: Search the vector database with that embedding
    results = vector_db.search(embedding, k=k)
    return results
Three steps. Generate, embed, search.
The hypothetical document doesn’t need to be perfect or even factually correct. It just needs to sound like the documents in your database. The embedding captures the semantic neighborhood, not the specific facts.
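The llm, embedder, and vector_db objects in that function are placeholders. Here is one way the wiring might look, as a sketch only: it assumes the OpenAI client for generation, sentence-transformers for embeddings, and FAISS as the index. The wrapper classes, the model names, and the my_documents variable are illustrative, not a specific library’s API.

import faiss
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

class SimpleLLM:
    """Thin wrapper so retrieve_with_hyde can call llm.generate(prompt)."""
    def __init__(self, model="gpt-4o-mini"):  # model name is an assumption
        self.client = OpenAI()
        self.model = model

    def generate(self, prompt):
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

class FaissDB:
    """Thin wrapper so retrieve_with_hyde can call vector_db.search(embedding, k=k)."""
    def __init__(self, documents, embedder):
        self.documents = documents
        vectors = np.asarray(embedder.encode(documents), dtype="float32")
        faiss.normalize_L2(vectors)  # cosine similarity via inner product
        self.index = faiss.IndexFlatIP(vectors.shape[1])
        self.index.add(vectors)

    def search(self, embedding, k=10):
        query_vec = np.asarray([embedding], dtype="float32")
        faiss.normalize_L2(query_vec)
        _, ids = self.index.search(query_vec, k)
        return [self.documents[i] for i in ids[0]]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
vector_db = FaissDB(my_documents, embedder)  # my_documents: your own corpus
results = retrieve_with_hyde("How do I fix a leaking kitchen faucet?",
                             SimpleLLM(), embedder, vector_db, k=10)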
Making It Better
Add re-ranking. HyDE improves recall: you find more relevant documents. But ranking might still be off. A cross-encoder reranker using the original query cleans this up:
# Get more candidates than you need
candidates = retrieve_with_hyde(query, llm, embedder, vector_db, k=30)

# Re-rank with the original query
results = reranker.rerank(query, candidates, top_k=10)
The hypothetical finds the right neighborhood. The reranker picks the best documents within it.
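The reranker object is left abstract above. A minimal sketch of one, assuming a cross-encoder from sentence-transformers; the specific model name is a common choice, not a requirement:

from sentence_transformers import CrossEncoder

class Reranker:
    # Cross-encoders score (query, document) pairs jointly: slower than
    # embedding search, but much better at fine-grained ranking.
    def __init__(self, model_name="cross-encoder/ms-marco-MiniLM-L-6-v2"):
        self.model = CrossEncoder(model_name)

    def rerank(self, query, candidates, top_k=10):
        # Score every candidate against the ORIGINAL query, not the hypothetical
        scores = self.model.predict([(query, doc) for doc in candidates])
        ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
        return [doc for doc, _ in ranked[:top_k]]

reranker = Reranker()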
Try multiple perspectives. Generate hypothetical documents from different angles: beginner vs. expert, technical vs. casual. Average the embeddings:
import numpy as np

perspectives = [
    "Answer this as a beginner would understand:",
    "Answer this with technical depth:",
    "Answer this practically:",
]

embeddings = []
for perspective in perspectives:
    # One hypothetical document per perspective
    doc = llm.generate(f"{perspective}\n\nQuery: {query}")
    embeddings.append(embedder.encode(doc))

# Average the perspective embeddings into a single search vector
final_embedding = np.mean(embeddings, axis=0)
This casts a wider net when you’re not sure what type of document the user needs.
The Trade-offs
What you gain:
- Better retrieval alignment
- Bridges vocabulary gaps
- Captures user intent more fully
- Often 5–15% improvement in retrieval metrics
What it costs:
- One LLM call per query (50–150ms typically)
- One extra embedding call (~5ms)
- Slightly more complexity
The hallucination question: Yes, LLMs can generate incorrect information in the hypothetical document. But it doesn’t matter much: you’re using it for embedding, not for the final answer. A slightly wrong hypothetical still lands in roughly the right semantic neighborhood.
When to Use HyDE
Use it if:
- Your retrieval feels mediocre (relevant docs are buried)
- Your documents have diverse writing styles
- Users phrase queries differently than your content
- You can tolerate ~100ms extra latency
Skip it if:
- Current retrieval is already good
- Speed is critical (real-time, high-throughput)
- Queries are simple keyword lookups
- You’re on a strict LLM budget
The Bottom Line
HyDE solves query-document mismatch by making your query look like a document before embedding it.
The generated hypothetical serves as a bridge. It captures intent. It uses the right vocabulary. It contains enough semantic signal to land in the right neighborhood of your vector space.
One LLM call per query. Potentially double-digit retrieval improvement.
If your RAG retrieval is good, don’t bother. If it’s mediocre, HyDE is one of the highest-impact changes you can make with minimal complexity.
Sometimes the simplest ideas work best.