User searches: “How do I fix a leaking kitchen faucet?”
Your vector database contains a perfect document titled “Kitchen Faucet Repair Guide.” But the search returns mediocre results. The helpful document is buried on page two.
Why? Because you’re comparing apples to oranges.
The query is a question. The documents are statements. They exist in the same embedding space, but they’re fundamentally different types of text.
HyDE fixes this with a surprisingly simple idea.
The Query-Document Mismatch
When you embed a query, you get a vector representing a question or search intent.
When you embed documents, you get vectors representing actual content: titles, paragraphs, explanations.
These vectors live in the same space, but they don’t align the way you’d expect.
Example:
Query: “What are the health benefits of drinking green tea?”
Your database has:
- “Green Tea: A Comprehensive Health Analysis”
- “Antioxidants in Green Tea and Their Effects”
- “Why People Love Green Tea: A Buyer’s Guide”
Document 3 might score highest. It’s casual and user-focused, similar to how the query is phrased. But Documents 1 and 2 actually answer the question better.
Embedding similarity captures how text sounds, not what would answer the question.
Another example:
Query: “chronic pain management strategies”
Relevant document: “Managing chronic pain with cognitive behavioral therapy, physical therapy, and medication. This guide covers treatment options, lifestyle changes, and when to see a specialist…”
The query is short and abstract. The document is long and concrete. Same topic, different embedding neighborhoods.
The HyDE Insight
HyDE asks a simple question: what if we made the query look more like a document?
Instead of embedding the raw query, you ask an LLM: “If this query were answered by a document, what would that document say?”
Then you embed the LLM’s response and use that for retrieval.
The flow:
- User submits a query
- LLM generates a hypothetical document that would answer it
- You embed the hypothetical document (not the query)
- Vector search finds similar real documents
- Return results
That’s it. One extra LLM call transforms your retrieval quality.
Why This Works
The hypothetical document is more like your actual documents than the query ever was.
Original query: “How do I fix a leaking kitchen faucet?”
LLM-generated hypothetical:
Kitchen Faucet Leak Repair Guide
A leaking kitchen faucet is one of the most common plumbing issues. Most leaks are caused by worn washers or O-rings. To fix it, turn off the water supply, remove the handle by unscrewing the set screw, replace the washer or O-ring, then reassemble and test. The process takes about 30 minutes…
Now you’re comparing document-to-document. The embedding of this hypothetical will be much closer to your actual repair guides than the original question ever was.
What HyDE solves:
- Query-document mismatch: questions and documents are now the same type of text
- Vocabulary gaps: the user says “broken sink,” the LLM generates “leaking faucet”
- Intent capture: the hypothetical contains what the user actually wants to know
- Semantic richness: a paragraph has more signal than a one-line query
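You can check this effect directly: embed the raw query, the hypothetical, and a real document, then compare cosine similarities. A minimal sketch, assuming sentence-transformers; the model choice and the exact scores are illustrative and will vary, but the hypothetical-to-document similarity is the one you’d expect to win:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is an assumption

query = "How do I fix a leaking kitchen faucet?"
hypothetical = (
    "A leaking kitchen faucet is one of the most common plumbing issues. "
    "Most leaks are caused by worn washers or O-rings. Turn off the water supply, "
    "remove the handle, replace the washer or O-ring, then reassemble and test."
)
real_doc = (
    "Kitchen Faucet Repair Guide: most faucet leaks come from worn washers or "
    "O-rings. This guide covers shutting off the water, disassembling the handle, "
    "and replacing the damaged part."
)

# Embed all three texts with the same model, then compare against the real document
q_emb, h_emb, d_emb = model.encode([query, hypothetical, real_doc])
print("query        -> real doc:", util.cos_sim(q_emb, d_emb).item())
print("hypothetical -> real doc:", util.cos_sim(h_emb, d_emb).item())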
Implementation
The code is straightforward. Here’s the core logic:
def retrieve_with_hyde(query, llm, embedder, vector_db, k=10):
    # Step 1: Generate a hypothetical document that would answer the query
    prompt = f"""Write a short document that would answer this query.
Be concrete and informative.

Query: {query}

Document:"""
    hypothetical_doc = llm.generate(prompt)

    # Step 2: Embed the hypothetical document (not the query)
    embedding = embedder.encode(hypothetical_doc)

    # Step 3: Search the vector database with that embedding
    results = vector_db.search(embedding, k=k)
    return results
Three steps. Generate, embed, search.
The hypothetical document doesn’t need to be perfect or even factually correct. It just needs to sound like the documents in your database. The embedding captures the semantic neighborhood, not the specific facts.
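The llm, embedder, and vector_db objects in that function are placeholders. Here is one way the wiring might look, as a sketch only: it assumes the OpenAI client for generation, sentence-transformers for embeddings, and FAISS as the index. The wrapper classes, the model names, and the my_documents variable are illustrative, not a specific library’s API.

import faiss
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

class SimpleLLM:
    """Thin wrapper so retrieve_with_hyde can call llm.generate(prompt)."""
    def __init__(self, model="gpt-4o-mini"):  # model name is an assumption
        self.client = OpenAI()
        self.model = model

    def generate(self, prompt):
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

class FaissDB:
    """Thin wrapper so retrieve_with_hyde can call vector_db.search(embedding, k=k)."""
    def __init__(self, documents, embedder):
        self.documents = documents
        vectors = np.asarray(embedder.encode(documents), dtype="float32")
        faiss.normalize_L2(vectors)  # cosine similarity via inner product
        self.index = faiss.IndexFlatIP(vectors.shape[1])
        self.index.add(vectors)

    def search(self, embedding, k=10):
        query_vec = np.asarray([embedding], dtype="float32")
        faiss.normalize_L2(query_vec)
        _, ids = self.index.search(query_vec, k)
        return [self.documents[i] for i in ids[0]]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
vector_db = FaissDB(my_documents, embedder)  # my_documents: your own corpus
results = retrieve_with_hyde("How do I fix a leaking kitchen faucet?",
                             SimpleLLM(), embedder, vector_db, k=10)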
Making It Better
Add re-ranking. HyDE improves recall: you find more relevant documents. But ranking might still be off. A cross-encoder reranker using the original query cleans this up:
# Get more candidates than you need
candidates = retrieve_with_hyde(query, llm, embedder, vector_db, k=30)

# Re-rank with the original query
results = reranker.rerank(query, candidates, top_k=10)
The hypothetical finds the right neighborhood. The reranker picks the best documents within it.
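The reranker object is left abstract above. A minimal sketch of one, assuming a cross-encoder from sentence-transformers; the specific model name is a common choice, not a requirement:

from sentence_transformers import CrossEncoder

class Reranker:
    # Cross-encoders score (query, document) pairs jointly: slower than
    # embedding search, but much better at fine-grained ranking.
    def __init__(self, model_name="cross-encoder/ms-marco-MiniLM-L-6-v2"):
        self.model = CrossEncoder(model_name)

    def rerank(self, query, candidates, top_k=10):
        # Score every candidate against the ORIGINAL query, not the hypothetical
        scores = self.model.predict([(query, doc) for doc in candidates])
        ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
        return [doc for doc, _ in ranked[:top_k]]

reranker = Reranker()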
Try multiple perspectives. Generate hypothetical documents from different angles: beginner vs. expert, technical vs. casual. Average the embeddings:
import numpy as np

perspectives = [
    "Answer this as a beginner would understand:",
    "Answer this with technical depth:",
    "Answer this practically:",
]

embeddings = []
for perspective in perspectives:
    # One hypothetical document per perspective
    doc = llm.generate(f"{perspective}\n\nQuery: {query}")
    embeddings.append(embedder.encode(doc))

# Average the perspective embeddings into a single search vector
final_embedding = np.mean(embeddings, axis=0)
This casts a wider net when you’re not sure what type of document the user needs.
The Trade-offs
What you gain:
- Better retrieval alignment
- Bridges vocabulary gaps
- Captures user intent more fully
- Often 5–15% improvement in retrieval metrics
What it costs:
- One LLM call per query (50–150ms typically)
- One extra embedding call (~5ms)
- Slightly more complexity
The hallucination question: Yes, LLMs can generate incorrect information in the hypothetical document. But it doesn’t matter much: you’re using it for embedding, not for the final answer. A slightly wrong hypothetical still lands in roughly the right semantic neighborhood.
When to Use HyDE
Use it if:
- Your retrieval feels mediocre (relevant docs are buried)
- Your documents have diverse writing styles
- Users phrase queries differently than your content
- You can tolerate ~100ms extra latency
Skip it if:
- Current retrieval is already good
- Speed is critical (real-time, high-throughput)
- Queries are simple keyword lookups
- You’re on a strict LLM budget
The Bottom Line
HyDE solves query-document mismatch by making your query look like a document before embedding it.
The generated hypothetical serves as a bridge. It captures intent. It uses the right vocabulary. It contains enough semantic signal to land in the right neighborhood of your vector space.
One LLM call per query. Potentially double-digit retrieval improvement.
If your RAG retrieval is good, don’t bother. If it’s mediocre, HyDE is one of the highest-impact changes you can make with minimal complexity.
Sometimes the simplest ideas work best.