RAG Query Rewriter

Rewrite user questions into better vector search queries



In a retrieval-augmented generation pipeline, the quality of your answer is capped by the quality of your retrieval, and retrieval is capped by the query you embed. A raw user turn like “and what about the cheaper one?” embeds to a vector that carries almost none of the user’s intent, because the thing being asked about lives in earlier turns. This tool builds a prompt that rewrites conversational questions into standalone, semantically rich queries that resolve references and use the right domain vocabulary, so your vector store returns the chunks that actually answer the question.

How it works

You provide the user’s question, optionally the recent conversation history, and the domain of your documents. The tool composes a prompt that instructs an LLM to do three things: resolve every pronoun and implicit reference using the history, expand the question with the domain terminology your corpus is likely to use, and output a self-contained query that needs no chat context to make sense. It can also request several alternative phrasings for multi-query retrieval. You run this prompt as the first step of your pipeline, embed the rewritten query, search, and then pass the retrieved chunks plus the original question to your generation model.
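As a rough illustration, the prompt composition described above might look like the following Python sketch. The function names `build_rewrite_prompt` and `parse_queries`, their parameters, the history format as (role, text) pairs, and the exact instruction wording are all assumptions made for illustration; this is not the tool's actual template.

```python
from typing import Sequence


def build_rewrite_prompt(
    question: str,
    domain: str,
    history: Sequence[tuple[str, str]] = (),
    n_variants: int = 1,
) -> str:
    """Compose a prompt asking an LLM to turn a conversational question
    into standalone search queries (hypothetical template)."""
    lines = [
        f"You rewrite user questions into search queries over {domain}.",
        "1. Resolve every pronoun and implicit reference using the conversation history.",
        "2. Expand the question with terminology these documents are likely to use.",
        "3. Output a self-contained query that makes sense without the chat context.",
    ]
    if n_variants > 1:
        # Multi-query retrieval: ask for several phrasings, one per line.
        lines.append(f"Write {n_variants} alternative phrasings, one per line, without numbering.")
    else:
        lines.append("Write exactly one query on a single line.")
    if history:
        # History is what lets "the cheaper one" resolve to a concrete entity.
        lines += ["", "Conversation history:"]
        lines += [f"{role}: {text}" for role, text in history]
    lines += ["", f"Question: {question}", "Rewritten query:"]
    return "\n".join(lines)


def parse_queries(llm_output: str) -> list[str]:
    """Split the model's reply into one query per non-empty line."""
    return [line.strip() for line in llm_output.splitlines() if line.strip()]
```

For example, calling `build_rewrite_prompt("and what about the cheaper one?", "internal HR policy documents", history=[("user", "Compare the two dental plans we offer.")], n_variants=3)` produces a prompt whose reply `parse_queries` would split into three queries, each of which you embed and search separately.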

Tips and notes

Always feed the rewriter the conversation history when the question contains references — that is the entire point, and without it “the second option” resolves to nothing. Name your domain precisely; “internal HR policy documents” produces a sharper rewrite than “documents.” Consider the multi-query variant for high-recall use cases: embedding three phrasings and merging the hits before re-ranking catches relevant chunks a single query misses. Keep the original user question for the generation step, though — the rewrite is for retrieval, while the answer should still address what the user actually asked in their own framing.
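The multi-query tip and the advice to keep the original question for generation can be combined in a small pipeline sketch. Everything here is hypothetical: the `Hit` record, `merge_hits`, `answer`, and the injected `rewrite`, `search` (assumed to embed the query and query your vector store), and `generate` callables stand in for whatever LLM client and vector store you actually use. Merging by keeping each chunk's highest similarity score across phrasings is one simple choice; reciprocal rank fusion is a common alternative.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Hit:
    chunk_id: str
    text: str
    score: float  # assumed similarity score: higher means more relevant


def merge_hits(result_lists: Sequence[Sequence[Hit]]) -> list[Hit]:
    """Union hits from several query phrasings, keeping each chunk's best
    score, and return them best-first."""
    best: dict[str, Hit] = {}
    for hits in result_lists:
        for hit in hits:
            current = best.get(hit.chunk_id)
            if current is None or hit.score > current.score:
                best[hit.chunk_id] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


def answer(
    question: str,
    rewrite: Callable[[str], list[str]],
    search: Callable[[str], list[Hit]],
    generate: Callable[[str, list[str]], str],
    top_k: int = 5,
) -> str:
    """Rewrite for retrieval, but answer the user's original question."""
    queries = rewrite(question)  # standalone phrasings, used only for search
    merged = merge_hits([search(q) for q in queries])
    # A re-ranker, if you use one, would reorder `merged` here.
    context = [hit.text for hit in merged[:top_k]]
    return generate(question, context)  # original wording goes to the generator
```

Passing the steps in as callables keeps the sketch independent of any particular vector database SDK, and makes it explicit that the rewritten queries never reach the generation step.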
