Multi-query RAG query expander
One of the cheapest ways to improve retrieval quality is often not a better embedding model but asking the same question several ways. This tool uses your own API key to turn one user question into a set of complementary retrieval queries (a broad rephrase, a keyword-dense version, a narrower clarification), so your vector store is queried from multiple angles and can surface chunks that a single query embedding would miss.
How it works
You paste your own OpenAI or Anthropic key and a question, and the tool sends a
single prompt asking the model to produce N distinct rewrites optimized for
semantic search, returned one per line. The request goes directly from your
browser to the provider. For Anthropic, the call sets the
anthropic-dangerous-direct-browser-access header to true, which enables the
cross-origin (CORS) access a browser request needs, so it works without a proxy.
The response is split into individual queries you can copy. In your own pipeline
you would embed each query, retrieve top-k chunks per query, then union and
de-duplicate the results before reranking.
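The prompt-and-split step described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the prompt wording and the helper names build_expansion_prompt and parse_queries are assumptions made for this example, and the provider call itself is left out. The parser is deliberately tolerant, because models often add numbering or bullets even when asked for one plain query per line.

```python
import re
from typing import List, Optional


def build_expansion_prompt(question: str, n: int = 3) -> str:
    """Build a single prompt asking for n rewrites, one per line.

    The wording is illustrative only; the tool's real prompt is not shown here.
    """
    return (
        f"Rewrite the question below as {n} distinct search queries optimized "
        "for semantic search. Include a broad rephrase, a keyword-dense "
        "version, and a narrower clarification where possible. "
        "Return exactly one query per line with no extra commentary.\n\n"
        f"Question: {question}"
    )


# Leading list markers a model may add anyway: "1.", "2)", "-", "*", "•".
_LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")


def parse_queries(response_text: str, n: Optional[int] = None) -> List[str]:
    """Split a one-per-line response into clean, de-duplicated queries."""
    queries: List[str] = []
    seen = set()
    for line in response_text.splitlines():
        # Drop any list marker, surrounding whitespace, and wrapping quotes.
        query = _LIST_MARKER.sub("", line).strip().strip('"')
        key = query.casefold()  # case-insensitive duplicate check
        if query and key not in seen:
            seen.add(key)
            queries.append(query)
    return queries if n is None else queries[:n]
```

Each string returned by parse_queries would then be embedded and sent to the vector store as its own retrieval query.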
Tips and notes
- Union, then rerank. Retrieve with every variant, merge the candidate chunks, drop duplicates, and run a single reranking pass — do not just concatenate top-k lists.
- Keep one keyword-heavy variant. Pairing a keyword-dense query with BM25 or a hybrid (BM25 plus vector) retriever catches exact identifiers, such as error codes or function names, that pure semantic search tends to miss.
- Mind the cost. Embedding and retrieval calls grow linearly with the number of variants, and the merged candidate set handed to the reranker grows too; three to five variants is the usual sweet spot.
- Your key goes only to the provider. It is sent from your browser with the direct provider request and is never stored or logged by this tool.
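The "union, then rerank" tip can be sketched as below. This is a toy illustration under stated assumptions, not a reference implementation: the (chunk_id, text) chunk shape, the in-memory CORPUS, toy_retrieve, and the word_overlap scorer are all hypothetical stand-ins. In a real pipeline the retrieve callable would query your vector store, and the score callable would be a proper reranker such as a cross-encoder.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Chunk = Tuple[str, str]  # (chunk_id, text); a hypothetical shape for this sketch


def union_candidates(
    queries: Sequence[str],
    retrieve: Callable[[str, int], List[Chunk]],
    top_k: int = 5,
) -> List[Chunk]:
    """Retrieve top_k chunks per query and merge them, one entry per chunk id."""
    merged: Dict[str, Chunk] = {}
    for query in queries:
        for chunk_id, text in retrieve(query, top_k):
            # setdefault keeps the first occurrence and drops later duplicates.
            merged.setdefault(chunk_id, (chunk_id, text))
    return list(merged.values())


def rerank(
    question: str,
    candidates: Sequence[Chunk],
    score: Callable[[str, str], float],
    top_n: int = 5,
) -> List[Chunk]:
    """Score each candidate once against the original question; keep the best top_n."""
    ranked = sorted(candidates, key=lambda chunk: score(question, chunk[1]), reverse=True)
    return ranked[:top_n]


def word_overlap(a: str, b: str) -> float:
    """Toy scorer: number of shared lowercase words. Stand-in for a real reranker."""
    return float(len(set(a.lower().split()) & set(b.lower().split())))


# Tiny in-memory corpus so the sketch runs without a vector store (made-up data).
CORPUS: List[Chunk] = [
    ("c1", "rotate api keys from the dashboard settings page"),
    ("c2", "api key rotation policy keys expire after 90 days"),
    ("c3", "billing invoices are emailed monthly"),
]


def toy_retrieve(query: str, top_k: int) -> List[Chunk]:
    """Toy retriever: rank corpus chunks by word overlap and drop zero-score chunks."""
    scored = [(word_overlap(query, text), cid, text) for cid, text in CORPUS]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [(cid, text) for _, cid, text in hits[:top_k]]


if __name__ == "__main__":
    variants = ["how do i rotate api keys", "api key rotation policy"]
    candidates = union_candidates(variants, toy_retrieve, top_k=2)
    for chunk_id, text in rerank("how do i rotate api keys", candidates, word_overlap, top_n=2):
        print(chunk_id, text)
```

Reranking against the original question rather than any single variant gives all candidates one comparable score, which is what plain concatenation of per-query top-k lists lacks.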