What is HyDE and why does it help retrieval?

HyDE — Hypothetical Document Embeddings — has the LLM write a plausible answer to the query, then embeds that answer to search. Because the hypothetical answer is phrased like real documents, its embedding often sits closer to relevant passages than the question's embedding does.

Does the hypothetical answer need to be factually correct?

No. It is never shown to the user and only used to steer retrieval. Even a partially wrong answer usually carries the right vocabulary and structure to find relevant documents.

How should I use the output?

Embed the generated document with your normal embedding model and run nearest-neighbour search against your vector store, instead of embedding the raw query. You can also concatenate it with the query for a hybrid signal.

Is my API key stored?

No. The key lives only in your browser and is sent directly to OpenAI in the one generation request. Nothing is logged or sent to Gera.

What is the HyDE Query Generator (BYO-key)?

HyDE improves retrieval by embedding a hypothetical answer instead of the raw question. This tool uses your OpenAI key to draft that hypothetical document from your query and domain context, ready to embed and search against your vector store. It runs free in your browser on Gera Tools, with nothing uploaded.

HyDE Query Generator (BYO-key)

Name: HyDE Query Generator (BYO-key)
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Get one useful tool a week

Like this tool? Enter your email and we'll send you one genuinely useful Gera tool a week — plus a link to come back to this one. No spam, one-click unsubscribe any time.

HyDE query generator

Short questions make poor embeddings: a five-word query rarely sits near the dense, document-shaped passages you want to retrieve. HyDE fixes this by asking an LLM to write a hypothetical answer first, then embedding that answer. This tool drafts the hypothetical document from your query and domain context using your own OpenAI key.

How it works

On generate, the tool sends one request to https://api.openai.com/v1/chat/completions asking the model to write a concise, confident passage that answers your query as if it were an excerpt from a real document in your domain. The domain context you provide is added to the prompt so the vocabulary and tone match your corpus. The request goes directly from your browser to OpenAI; copy the result and embed it in place of the raw query.

Why HyDE improves retrieval

In a standard RAG pipeline, you embed the user’s query and find the nearest stored chunks by cosine similarity. The problem is a vocabulary and length mismatch: the query “what is the refund policy for annual subscriptions?” might be six words, while the relevant policy passage is 150 words using terms like “proration”, “billing cycle”, and “service credit”. Those two embeddings are often not as close as you would want.

HyDE bridges this by generating a passage that looks like the document you want to find — similar vocabulary, similar structure, similar length — and using that as the retrieval query. The hypothetical passage and real passages from your corpus end up much closer in the embedding space.

The information does not need to be correct

A key insight from the original HyDE paper is that the hypothetical answer does not need to be factually accurate. The LLM might get the specific policy wrong, but if it writes “annual subscriptions are refunded on a prorated basis for the remaining months in the billing cycle”, that phrase structure and vocabulary will retrieve the correct policy passage even if the actual policy differs. Factual grounding comes from the real retrieved chunks, not from the hypothesis.

Practical workflow

Enter your user query and an optional domain context (for example: “internal HR policies for a UK fintech company”).
Generate the hypothetical document.
Embed the hypothetical document using your normal embedding model.
Run nearest-neighbour search against your vector store using that embedding.
Feed the retrieved chunks and the original query to your LLM for the final answer.

You can also test embedding the concatenation of the original query and the hypothetical document — this hybrid approach sometimes outperforms either alone on recall benchmarks.

Tips and notes

Match the document style. Telling the model the corpus is “internal HR policy documents” yields a hypothetical that retrieves better than a generic answer.
Keep it the length of a real chunk. A hypothetical document close in size to your indexed chunks tends to embed nearest to them. If your chunks are 200 tokens, aim for a 200-token hypothesis.
It can be wrong and still work. Retrieval cares about lexical and semantic shape, not truth.
Your key stays in your browser. The API call goes from your browser directly to OpenAI. Nothing is logged or sent to Gera servers.