Catch hallucinations in generated summaries
Summarization is one of the most common LLM tasks — and one of the easiest to get subtly wrong. Models invent figures, swap names, or assert things the source never said. This tool sends your source document and the generated summary to your own LLM with a strict grounding rubric and returns a per-claim verdict: supported, unsupported, or contradicted.
How consistency checking works
The model is instructed to decompose the summary into atomic claims — single, verifiable statements — and to judge each one only against the supplied source. A claim is supported when the source directly backs it, unsupported when the source is silent on it (a hallmark of fabrication), and contradicted when the source asserts the opposite. The tool parses these verdicts into a clear list and highlights the contradicted claims, which are the most dangerous. Everything runs through a single direct request to your provider using your own key.
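As a rough illustration of the flow described above, the sketch below builds a grounding prompt and parses a reply into per-claim verdicts. It is not the tool's actual implementation: the rubric wording, the `verdict | claim` reply format, and the names `build_prompt`, `parse_verdicts`, and `contradicted` are all assumptions made for this example, and the request to your provider is left out.

```python
from dataclasses import dataclass

VERDICTS = ("supported", "unsupported", "contradicted")

# Hypothetical rubric text; the tool's real prompt is not shown in this document.
RUBRIC = (
    "Decompose the SUMMARY into atomic claims. Judge each claim only against "
    "the SOURCE. Reply with one line per claim in the form "
    "'<verdict> | <claim>', where <verdict> is supported, unsupported, "
    "or contradicted."
)


@dataclass
class ClaimVerdict:
    verdict: str
    claim: str


def build_prompt(source: str, summary: str) -> str:
    """Combine the rubric, source, and summary into a single request body."""
    return f"{RUBRIC}\n\nSOURCE:\n{source}\n\nSUMMARY:\n{summary}"


def parse_verdicts(reply: str) -> list[ClaimVerdict]:
    """Parse 'verdict | claim' lines, skipping lines that do not match."""
    results = []
    for line in reply.splitlines():
        verdict, sep, claim = line.partition("|")
        verdict = verdict.strip().lower()
        claim = claim.strip()
        if sep and verdict in VERDICTS and claim:
            results.append(ClaimVerdict(verdict, claim))
    return results


def contradicted(results: list[ClaimVerdict]) -> list[ClaimVerdict]:
    """Return only the claims the source contradicts, for highlighting."""
    return [r for r in results if r.verdict == "contradicted"]
```

Skipping lines that do not match the assumed format keeps the parser tolerant of stray commentary in the reply; a stricter version could raise an error instead.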
Tips for trustworthy results
- Use a capable model for the judging step (for example, gpt-4o or claude-3-5-sonnet). Claim verification requires checking each claim closely against the source, so it tends to benefit from a stronger model.
- Keep the source self-contained. If the summary relies on outside knowledge the source doesn’t contain, those claims will correctly read as unsupported.
- Pay closest attention to contradicted claims: they conflict with the source outright, whereas unsupported claims are only unverifiable against it.
- Re-run after editing the prompt or summary to confirm your fix actually removed the unsupported or contradicted claims.
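
The re-run tip can be checked mechanically by comparing verdict counts between two runs. The sketch below assumes each run has already been reduced to a plain list of verdict strings; the helper names `flagged_counts`, `compare_runs`, and `fix_confirmed` are hypothetical and not part of the tool.

```python
from collections import Counter

# Verdicts that indicate a problem with the summary.
FLAGGED = ("unsupported", "contradicted")


def flagged_counts(verdicts: list[str]) -> dict[str, int]:
    """Count unsupported and contradicted verdicts in one run."""
    counts = Counter(v.strip().lower() for v in verdicts)
    return {name: counts[name] for name in FLAGGED}


def compare_runs(before: list[str], after: list[str]) -> dict[str, int]:
    """Return the change in each flagged count (negative means fewer)."""
    b, a = flagged_counts(before), flagged_counts(after)
    return {name: a[name] - b[name] for name in FLAGGED}


def fix_confirmed(after: list[str]) -> bool:
    """True only when the re-run has no unsupported or contradicted claims."""
    return all(n == 0 for n in flagged_counts(after).values())
```

Because an edited summary may decompose into different claims, counts are compared rather than individual claim texts. A drop in the counts suggests progress, but only a re-run with zero flagged claims confirms the fix.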