Why number the chunks?

Numbered chunks let the model cite its sources, so you can ask it to answer "using only the context above and citing chunk numbers." This makes answers traceable and reduces hallucination because the model anchors claims to specific retrieved passages.

What separator should I use?

XML-style tags (one tag per chunk) are the most robust because models rarely confuse them with content. Triple-dash and blank-line separators are lighter weight and fine for short, clean chunks. Pick based on how messy your source text is.

Does this call an embedding model or retriever?

No. This tool only formats chunks you have already retrieved from your own vector store or search index. Run your retrieval first, then paste the top-k results here to assemble the final context block.

How accurate is the token estimate?

It uses a roughly four-characters-per-token approximation, which is close for English prose. Use it as a guide to avoid blowing past your context limit; for exact counts, tokenize with your model's tokenizer.

What is the Retrieval Context Formatter?

Paste retrieved text chunks and turn them into a structured context section with numbered chunks, source labels, configurable separators, and a wrapper template, ready to drop into a RAG prompt. Includes a live token estimate. It runs free in your browser on Gera Tools, with nothing uploaded.

Retrieval Context Formatter

Name: Retrieval Context Formatter
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Turn raw retrieved chunks into a clean context block

Retrieval-augmented generation lives or dies on how you present the retrieved passages to the model. Dumping concatenated chunks with no structure invites the model to blur sources together and hallucinate. This formatter takes your top-k chunks and assembles a structured, numbered, citable context block you can paste straight into a prompt.

How it works

You paste one chunk per block, separated by blank lines. If the first line of a block looks like a source label (e.g. Onboarding Guide - Step one is...), it is split out and shown as the chunk’s source. The tool then renders each chunk with your chosen separator — XML tags, triple-dash rules, or numbered headers — and optionally wraps everything in a context template that instructs the model to answer only from the context and cite chunk numbers. A live token estimate tells you whether the block fits your window.

Why context formatting matters more than most builders realize

LLMs are sensitive to how context is presented, not just what it contains. A model that receives five passages concatenated with no separators will often blend them — attributing ideas from passage three to the topic of passage one, or generating a synthesized answer that cannot be traced back to any specific source. Numbered, separated chunks with source labels give the model a structure it can respect and cite.

The practical effect is significant: a well-formatted context block with explicit numbering and a “cite the chunk number you used” instruction in the wrapper produces answers that are easier to verify and more likely to be grounded in the retrieved material. The model has a natural way to reference its source, and you have a way to check it.

Choosing a separator style

The three separator styles serve slightly different use cases:

XML tags ([CHUNK 1]...[/CHUNK 1]) are the most unambiguous. LLMs trained on a mixture of HTML and XML generally parse these tags cleanly, and they are unlikely to appear naturally in your source documents. This is the most robust choice for messy or varied source text.

Triple-dash rules (---) are lightweight and readable. They work well when your source chunks are clean prose that does not contain its own horizontal rules. Less suitable for Markdown-heavy source documents where --- may already appear as front matter or section dividers.

Numbered headers (## Chunk 1:) are the most human-readable format when you are also reviewing the context block yourself before pasting it. They make it easy to count and scan chunks, but they add slightly more tokens than the other formats.

The wrapper template and why it helps

When the wrapper option is enabled, the tool adds a framing instruction around the context block:

Answer the question below using ONLY the context provided.
Cite the chunk number(s) you draw from.
If the answer cannot be found in the context, say so explicitly.

[Context]
[CHUNK 1] ... [/CHUNK 1]
[CHUNK 2] ... [/CHUNK 2]
...

Question: {your question here}

The “answer only from the context” instruction is the key constraint: it asks the model to treat the retrieved passages as its only knowledge source rather than augmenting them with parametric knowledge. This dramatically reduces hallucination in closed-domain QA scenarios, though it will also cause the model to say “I cannot find this in the provided context” when the relevant information was not retrieved — which is the correct behaviour for a traceable system.

Token budget management

The live token estimate uses a rough four-characters-per-token approximation that works well for English prose. Monitor it as you add chunks — if you are approaching your model’s context limit, the right response is to retrieve fewer but more precisely scored chunks, not to truncate chunks mid-passage. A truncated chunk can mislead the model just as badly as no chunk at all, especially if the truncation cuts off a qualifier or a negation.

For precise token counting, pass the formatted block through your model’s tokenizer directly. The estimate here is a planning aid, not an exact count.

Tips for better RAG context

Always number chunks so the model can cite them; ungrounded answers become obvious when no chunk number is referenced.
Keep source labels short and unique — a doc title plus section beats a long URL the model might echo verbatim.
Put the question after the context. Models attend most reliably to instructions that follow the data they operate on.
Watch the token estimate. If you are near the limit, retrieve fewer but higher-scoring chunks rather than truncating mid-passage.