Why generate multiple queries?

A single embedding of a user question often misses relevant chunks that are phrased differently. Generating several rewrites — broader, narrower, and keyword variants — and retrieving with all of them widens recall, which is the multi-query retriever pattern popularized by LangChain.

Which providers are supported?

OpenAI chat completions and Anthropic messages, called directly from your browser. Pick the provider that matches the key you paste.

Is my API key stored?

No. The key is held only in the input field, in your browser's memory, and is sent solely to the provider you select. It is never logged or persisted.

How many variants should I generate?

Three to five is a good balance. More queries widen recall but cost more retrievals and more reranking work; beyond five you usually see diminishing returns.

What is the Multi-Query RAG Query Expander (BYO-key)?

It uses your own OpenAI or Anthropic API key to rewrite a single user question into several alternative phrasings, so you can query your vector store from multiple angles and improve RAG recall. The key is sent only to the provider you select and is never stored. The tool runs free in your browser on Gera Tools; your question and key go directly to the provider rather than being uploaded to Gera Tools.

Multi-Query RAG Query Expander (BYO-key)

Name: Multi-Query RAG Query Expander (BYO-key)
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Multi-query RAG query expander

The single biggest, cheapest win for retrieval quality is rarely a better embedding model — it is asking the same question several ways. This tool uses your own API key to turn one user question into a set of complementary retrieval queries (a broad rephrase, a keyword-dense version, a narrower clarification), so your vector store gets queried from multiple angles and surfaces chunks a single embedding would miss.

How it works

You paste your own OpenAI or Anthropic key and a question, and the tool sends a single prompt asking the model to produce N distinct rewrites optimized for semantic search, returned one per line. The request goes directly from your browser to the provider — for Anthropic the call includes the anthropic-dangerous-direct-browser-access header so it works without a proxy. The response is split into individual queries you can copy. In your own pipeline you would embed each query, retrieve top-k chunks per query, then union and de-duplicate the results before reranking.
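As a rough illustration of the prompt-and-split step, the Python sketch below builds an expansion prompt and parses a one-per-line reply into clean queries. The prompt wording and the names build_expansion_prompt and parse_variants are assumptions made for this example, not the tool's actual prompt or code, and the provider call itself is left out.

```python
import re


def build_expansion_prompt(question: str, n: int = 4) -> str:
    """Build a query-expansion prompt (hypothetical wording, not the tool's own)."""
    return (
        f"Rewrite the question below as {n} distinct search queries optimized "
        "for semantic retrieval. Include a broader rephrase, a keyword-dense "
        "version, and a narrower clarification. Return one query per line, "
        "with no numbering or commentary.\n\n"
        f"Question: {question.strip()}"
    )


# Matches a leading bullet ("-", "*", "•") or a number like "1." or "2)".
_LIST_PREFIX = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")


def parse_variants(response_text: str, n: int) -> list[str]:
    """Split a one-per-line model reply into at most n unique queries."""
    variants: list[str] = []
    seen: set[str] = set()
    for line in response_text.splitlines():
        # Drop list markers and surrounding quotes the model may have added.
        query = _LIST_PREFIX.sub("", line).strip().strip('"')
        key = query.casefold()
        if query and key not in seen:
            seen.add(key)
            variants.append(query)
    return variants[:n]
```

Stripping list markers and duplicates is a defensive choice: models sometimes number or repeat their output even when asked not to.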

Why single-query RAG misses relevant chunks

A user question and the relevant document chunk often use different vocabulary. The user asks “how do I cancel my subscription?” and the relevant passage says “to terminate your account billing, visit the account settings page”. Both mean the same thing, but their embedding vectors may still be far apart if the embedding model has not learned to place these two phrasings close together.

Multi-query retrieval addresses this by generating several phrasings, for example:

Original: “how do I cancel my subscription?”
Broad: “how to end a paid membership?”
Keyword-dense: “cancel subscription account settings billing terminate”
Specific: “steps to stop recurring billing charges”

Each variant retrieves a slightly different set of top-k chunks, and the union covers what any single query would miss.

Building the pipeline around expanded queries

Once you have the query variants:

Embed each query using the same embedding model you used to index your documents.
Retrieve top-k from your vector store for each variant independently.
Union the result sets and remove duplicates (use chunk ID, not text comparison).
Rerank the union, either with a cross-encoder that scores each chunk against the original question or with reciprocal rank fusion, which combines the per-query rankings.
Pass the top-N reranked chunks to the LLM as context.
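The retrieve-and-union steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: Hit, retrieve_union, and the search callable are hypothetical names, and search stands in for whatever embeds a query and returns top-k hits from your own vector store.

```python
from typing import Callable, NamedTuple


class Hit(NamedTuple):
    """One retrieved chunk (hypothetical shape; adapt to your vector store)."""
    chunk_id: str
    text: str
    score: float


def retrieve_union(
    queries: list[str],
    search: Callable[[str, int], list[Hit]],
    k: int = 5,
) -> list[Hit]:
    """Retrieve top-k per query and merge the results, de-duplicated by chunk ID."""
    best: dict[str, Hit] = {}
    for query in queries:
        # `search` is assumed to embed the query with the same model used
        # for indexing and return up to k hits.
        for hit in search(query, k):
            kept = best.get(hit.chunk_id)
            # Keep the highest-scoring copy of a chunk seen under several queries.
            if kept is None or hit.score > kept.score:
                best[hit.chunk_id] = hit
    # Chunks come back in first-seen order; reranking happens afterwards.
    return list(best.values())
```

Keying on chunk_id rather than on the text avoids treating two chunks as duplicates only because their text happens to match, and it is cheaper than comparing strings.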

The reranking step is important: without it you are just sending more context tokens, some of which may be tangential. With it, you surface the genuinely relevant chunks that a single-query pipeline would have buried.
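Reciprocal rank fusion, one way to do this reranking, can be sketched in a few lines of Python. Each chunk earns 1 / (k + rank) from every per-query ranking it appears in, so chunks that rank well under several variants rise to the top. The function name is an assumption for this example, and k = 60 is the commonly used default constant rather than a value taken from this tool.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs into one list, best fused score first."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top chunk of each list contributes 1 / (k + 1).
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Note that this fusion uses only rank positions; it never looks at the original question. If you need each chunk scored against the question text itself, a cross-encoder is the option to reach for.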

Tips and notes

Union, then rerank. Retrieve with every variant, merge the candidate chunks, drop duplicates, and run a single reranking pass — do not just concatenate top-k lists.
Keep one keyword-heavy variant. Pairing a keyword-dense query with a hybrid BM25 retriever catches exact identifiers that pure semantic search drops.
Three to five variants is the sweet spot. More queries widen recall but multiply retrieval cost and reranker load; diminishing returns set in quickly past five.
Your key goes only to the provider you select. It is sent from your browser in the direct provider request and is never stored or logged.