What does grounding a prompt mean?

Grounding means giving the model the source text it must answer from, so its response is anchored in your documents rather than its training data. It is the core idea behind retrieval-augmented generation (RAG) and one of the main defenses against hallucination.

How does the compression work?

The tool splits documents into sentences, scores each by how many question keywords it contains plus a small length and position weight, then keeps the highest-scoring sentences, in their original order, until adding the next one would exceed the token budget. It is extractive: it never invents text.

How are tokens estimated?

It uses the common heuristic of roughly four characters per token for English. This is an approximation; exact counts vary by model and tokenizer, so leave some headroom below your true context limit.
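As a rough illustration of that heuristic, here is a minimal sketch in Python. The function names, the choice to round up, and the 10% default headroom are assumptions made for this example; the tool's own rounding and defaults are not documented here.

```python
import math

CHARS_PER_TOKEN = 4  # rough English heuristic; not exact for any real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count as len(text) / 4, rounded up (assumed rounding)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def budget_with_headroom(context_limit: int, headroom_percent: int = 10) -> int:
    """Reserve part of the context limit as a safety margin.

    The 10% default is an arbitrary illustrative value, not a recommendation
    taken from the tool.
    """
    return context_limit - (context_limit * headroom_percent) // 100


if __name__ == "__main__":
    print(estimate_tokens("Refunds are issued within 14 days."))
    print(budget_with_headroom(8000))
```

Rounding up errs on the side of overestimating, which fits the advice to stay below the true limit.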

Is my document data sent anywhere?

No. All scoring and compression run locally in your browser. Nothing you paste leaves the page.

What is the Grounding Document Injector?

Paste reference documents and a question; the tool scores sentences by relevance to the question, compresses the documents to fit a target token budget, and assembles a grounded RAG-style prompt that keeps the model on-source. It runs free in your browser on Gera Tools, with nothing uploaded.

Grounding Document Injector

Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Grounding document injector

Retrieval-augmented prompting only works if the right context fits in the window. When your reference material is larger than your token budget, you have to choose what to keep — and keeping the sentences most relevant to the question beats truncating from the top. This tool scores every sentence against your question, keeps the highest-value ones until the budget is full, and assembles a grounded prompt that tells the model to answer strictly from that context.

How it works

The injector splits your documents into sentences and scores each one. The score rewards question-keyword overlap most heavily, with smaller weights for sentence length and early position, so on-topic sentences rise to the top regardless of where they sit in the source. It then selects the highest-scoring sentences — preserving their original order so the context still reads coherently — until adding the next would exceed your token budget (estimated at roughly four characters per token). The compression is purely extractive: it only ever keeps your own sentences, never paraphrases. The assembled prompt wraps the kept context in clear delimiters and instructs the model to answer only from it and to flag when the answer is absent.
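The scoring and selection steps described above might look roughly like the following Python sketch. It is not the tool's actual code: the stopword list, the sentence splitter, the weights (0.2 for length, 0.1 for position), and all function names are hypothetical choices made only to show the shape of the technique.

```python
import math
import re

# Hypothetical stopword list; the tool's real keyword filter is not documented.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "what",
             "how", "does", "do", "for", "and"}


def estimate_tokens(text):
    """Roughly four characters per token, rounded up (assumed rounding)."""
    return math.ceil(len(text) / 4)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keywords(question):
    """Lowercased question words with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", question.lower())
            if w not in STOPWORDS}


def score(sentence, index, total, kw):
    """Keyword overlap dominates; length and early position add small bonuses."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    overlap = len(set(words) & kw)
    length_bonus = min(len(words), 20) / 20 * 0.2   # hypothetical weight
    position_bonus = (1 - index / total) * 0.1      # hypothetical weight
    return overlap + length_bonus + position_bonus


def compress(text, question, budget_tokens):
    """Keep the top-scoring sentences within the budget, in original order."""
    sentences = split_sentences(text)
    kw = keywords(question)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i], i, len(sentences), kw),
                    reverse=True)
    kept, used = [], 0
    for i in ranked:
        cost = estimate_tokens(sentences[i])
        if used + cost > budget_tokens:
            break  # stop once the next sentence would exceed the budget
        kept.append(i)
        used += cost
    return [sentences[i] for i in sorted(kept)]  # restore document order


if __name__ == "__main__":
    doc = ("Refunds are issued within 14 days. Our office is in Lisbon. "
           "Shipping is free over 50 euros. Refunds require a receipt.")
    for kept_sentence in compress(doc, "How long do refunds take?", 18):
        print(kept_sentence)
```

Because the output is built only from sentences returned by `split_sentences`, the sketch is extractive by construction: nothing in it can rewrite or paraphrase the source.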

What grounded prompts prevent

A language model answering without a provided source text draws on its training data. That training data is a frozen snapshot — it can be out of date, incomplete, or simply wrong on the specific details in your document. Grounding forces the model to use your text instead, which produces several benefits:

Accuracy for domain-specific content. Internal policies, product specifications, legal contracts, and technical manuals are usually absent from training data. Supplying them as context is the most direct way to get accurate answers about them.
Auditability. When the model is told to answer only from provided context, each claim in its response should be traceable to a sentence you supplied, so you can check it against the source.
Hallucination reduction. The “answer only from context, say you cannot find it otherwise” instruction discourages the model from improvising facts. It becomes much more likely to hedge or refuse than to fabricate.
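For illustration, a grounded prompt with an "answer only from context" instruction could be assembled along these lines. The `<context>` delimiters, the exact instruction wording, and the function name are assumptions for this sketch; the tool's real template may differ.

```python
def build_grounded_prompt(context_sentences, question):
    """Wrap the kept sentences in delimiters and prepend a grounding instruction.

    The tag names and wording below are hypothetical, chosen for the example.
    """
    context = " ".join(context_sentences)
    return (
        "Answer the question using only the text between the <context> tags.\n"
        "If the answer is not in the context, say that you cannot find it.\n\n"
        f"<context>\n{context}\n</context>\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    kept = ["Refunds are issued within 14 days.", "Refunds require a receipt."]
    print(build_grounded_prompt(kept, "How long do refunds take?"))
```

Keeping the instruction, the delimited context, and the question in one string means the result can be pasted into a model as a single message.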

The extractive compression approach

This tool uses extractive compression, meaning it removes sentences rather than rewriting or summarising them. The advantage is fidelity — the context the model sees is word-for-word from your document. The disadvantage is that relevance scoring by keyword overlap can occasionally miss paraphrased or synonymous sentences. If you notice a key fact being dropped, add the key term from that sentence to your question so the scorer elevates it.
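The synonym blind spot is easy to reproduce with a toy overlap counter. This Python sketch uses a hypothetical `keyword_overlap` function and a made-up example sentence; it only counts shared words and is not the tool's scorer.

```python
import re


def keyword_overlap(sentence, question):
    """Count distinct lowercase words shared by the sentence and the question."""
    def tokenize(s):
        return set(re.findall(r"[a-z0-9]+", s.lower()))
    return len(tokenize(sentence) & tokenize(question))


sentence = "Employees accrue 25 days of annual leave."

# The question says "vacation" and "staff"; the sentence says "annual leave"
# and "employees", so plain keyword overlap finds nothing in common.
missed = keyword_overlap(sentence, "How much vacation do staff get?")

# Adding the sentence's own terms to the question gives the scorer a match.
found = keyword_overlap(sentence, "How much vacation (annual leave) do staff get?")

print(missed, found)  # prints "0 2"
```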

Tips and examples

Write a specific question. Keyword overlap drives selection, so a precise question pulls the right sentences; a vague one keeps noise.
Leave headroom under your real limit. The four-chars-per-token estimate is approximate; budget a bit below your model’s true context size.
Raise the budget if key facts get dropped. The kept-vs-removed counts show how aggressive the compression was — loosen it if you cut too much.
Keep the “answer only from context” instruction. It is what turns a pile of text into a grounded prompt that resists hallucination.
Paste the output directly into your LLM interface. The assembled prompt already includes your question, the context block, and the grounding instruction, so it needs no further edits.