Grounding Document Injector

Inject and compress reference documents into a prompt within a token budget

Retrieval-augmented prompting only works if the right context fits in the window. When your reference material is larger than your token budget, you have to choose what to keep — and keeping the sentences most relevant to the question beats truncating from the top. This tool scores every sentence against your question, keeps the highest-value ones until the budget is full, and assembles a grounded prompt that tells the model to answer strictly from that context.

How it works

The injector splits your documents into sentences and scores each one. The score rewards question-keyword overlap most heavily, with smaller weights for sentence length and early position, so on-topic sentences rise to the top regardless of where they sit in the source. It then selects the highest-scoring sentences — preserving their original order so the context still reads coherently — until adding the next would exceed your token budget (estimated at roughly four characters per token). The compression is purely extractive: it only ever keeps your own sentences, never paraphrases. The assembled prompt wraps the kept context in clear delimiters and instructs the model to answer only from it and to flag when the answer is absent.
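The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the weights, the stop-word list, the regex sentence splitter, and every function name here are assumptions chosen for the example. Only the overall shape (keyword overlap weighted most, small length and position bonuses, four characters per token, original order preserved) comes from the description.

```python
import re

CHARS_PER_TOKEN = 4  # rough estimate from the description above
STOP_WORDS = {"the", "and", "what", "which", "who", "how", "are", "was", "for"}  # assumed list


def estimate_tokens(text: str) -> int:
    # Ceiling division, so a short non-empty sentence never costs zero tokens.
    return -(-len(text) // CHARS_PER_TOKEN)


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break on whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def keywords(text: str) -> set[str]:
    # Lowercased words of three or more characters, minus stop words.
    return set(re.findall(r"[a-z0-9]{3,}", text.lower())) - STOP_WORDS


def score(sentence: str, index: int, total: int, question_keywords: set[str]) -> float:
    overlap = len(keywords(sentence) & question_keywords)
    length_bonus = min(len(sentence), 200) / 200   # small reward for fuller sentences
    position_bonus = 1 - index / max(total, 1)     # small reward for early position
    # Hypothetical weights: overlap dominates, the other two act as tie-breakers.
    return 3.0 * overlap + 0.5 * length_bonus + 0.5 * position_bonus


def select_sentences(document: str, question: str, token_budget: int) -> list[str]:
    sentences = split_sentences(document)
    q = keywords(question)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(sentences[i], i, len(sentences), q),
        reverse=True,
    )
    kept, used = [], 0
    for i in ranked:
        cost = estimate_tokens(sentences[i])
        if used + cost > token_budget:
            break  # stop as soon as the next sentence would overflow the budget
        kept.append(i)
        used += cost
    # Restore source order so the kept context still reads coherently.
    return [sentences[i] for i in sorted(kept)]
```

Because the output is built only from sentences returned by `split_sentences`, the sketch stays extractive: nothing is paraphrased. A real implementation would likely want a sturdier sentence splitter and a tokenizer-based count in place of the character estimate.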

Tips and examples

  • Write a specific question. Keyword overlap drives selection, so a precise question pulls the right sentences; a vague one keeps noise.
  • Leave headroom under your real limit. The four-chars-per-token estimate is approximate, and the question, the instructions, and the model’s reply also consume tokens, so set the budget somewhat below your model’s true context size.
  • Raise the budget if key facts get dropped. The kept-vs-removed counts show how aggressive the compression was — loosen it if you cut too much.
  • Keep the “answer only from context” instruction. It is what turns a pile of text into a grounded prompt that resists hallucination.
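
As a rough illustration of the tips above, the following Python sketch derives a budget with headroom, reports kept-versus-removed counts, and wraps the kept sentences in a grounded prompt. The function names, the `<<<CONTEXT` / `CONTEXT>>>` delimiters, the 15 percent default margin, and the instruction wording are all hypothetical; the tool's own delimiters and phrasing may differ.

```python
def budget_with_headroom(context_limit: int, reserved_tokens: int, safety_percent: int = 15) -> int:
    # reserved_tokens covers the question, the instructions, and the expected reply.
    # safety_percent absorbs error in the four-chars-per-token estimate (assumed default).
    usable = max(0, context_limit - reserved_tokens)
    return usable * (100 - safety_percent) // 100


def compression_report(total_sentences: int, kept_sentences: int) -> dict[str, int]:
    # Kept-vs-removed counts: a large "removed" share suggests raising the budget.
    return {"kept": kept_sentences, "removed": total_sentences - kept_sentences}


def build_grounded_prompt(kept_sentences: list[str], question: str) -> str:
    context = " ".join(kept_sentences)
    return (
        "Answer the question using only the context between the markers.\n"
        "If the context does not contain the answer, say that it is not there.\n\n"
        "<<<CONTEXT\n"
        f"{context}\n"
        "CONTEXT>>>\n\n"
        f"Question: {question}"
    )


prompt = build_grounded_prompt(
    ["Paris is the capital of France."],
    "What is the capital of France?",
)
print(prompt)
```

With an 8,000-token window and 1,000 tokens reserved, `budget_with_headroom(8000, 1000)` leaves 5,950 tokens for context under these assumed defaults.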