How does the checker decide if a claim is supported?

It sends the source and summary to your model with instructions to break the summary into atomic claims and judge each one strictly against the source as supported, unsupported, or contradicted.

What is the difference between unsupported and contradicted?

Unsupported means the source neither confirms nor denies the claim (often an invented detail). Contradicted means the source states the opposite, which is a clear factual error.

Is my document sent anywhere besides my LLM provider?

No. The source and summary are sent only to OpenAI or Anthropic for the single evaluation request. Your API key and text are never stored or logged by this tool.

Can it catch every hallucination?

It catches most overt fabrications and contradictions, but no automated check is perfect. Treat the verdicts as a strong first pass and review contradicted claims by hand.

What is the Factual Consistency Checker (BYO-key)?

It is a free, bring-your-own-key (BYO-key) factual consistency checker. Paste a source document and an LLM-generated summary, supply your own OpenAI or Anthropic key, and each summary claim is labelled supported, unsupported, or contradicted. It runs free in your browser on Gera Tools: your text goes only to the provider you choose, and nothing is uploaded to Gera Tools.

Factual Consistency Checker (BYO-key)

Name: Factual Consistency Checker (BYO-key)
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Catch hallucinations in generated summaries

Summarization is one of the most common LLM tasks — and one of the easiest to get subtly wrong. Models invent figures, swap names, or assert things the source never said. This tool sends your source document and the generated summary to your own LLM with a strict grounding rubric and returns a per-claim verdict: supported, unsupported, or contradicted.

How consistency checking works

The model is instructed to decompose the summary into atomic claims — single, verifiable statements — and to judge each one only against the supplied source. A claim is supported when the source directly backs it, unsupported when the source is silent on it (a hallmark of fabrication), and contradicted when the source asserts the opposite. The tool parses these verdicts into a clear list and highlights the contradicted claims, which are the most dangerous. Everything runs through a single direct request to your provider using your own key.
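The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the prompt wording, the JSON reply format, and the function names build_prompt and parse_verdicts are all assumptions made for the example. The request to the provider is left out, so the sketch covers only the prompt assembly and the parsing step.

```python
import json

VERDICTS = ("supported", "unsupported", "contradicted")

def build_prompt(source: str, summary: str) -> str:
    """Assemble a grounding prompt (hypothetical wording, not the tool's own)."""
    return (
        "Break the SUMMARY into atomic claims. Judge each claim only against "
        "the SOURCE, using no outside knowledge.\n"
        "Label each claim supported, unsupported, or contradicted.\n"
        'Reply with a JSON array of objects: {"claim": "...", "verdict": "..."}.\n\n'
        f"SOURCE:\n{source}\n\nSUMMARY:\n{summary}"
    )

def parse_verdicts(reply: str) -> list[dict]:
    """Parse the judge's reply, assumed to be a JSON array of objects."""
    results = []
    for item in json.loads(reply):
        verdict = str(item.get("verdict", "")).strip().lower()
        # Skip anything that is not one of the three known labels.
        if verdict in VERDICTS:
            results.append({"claim": item.get("claim", ""), "verdict": verdict})
    # List contradicted claims first, since they are the most dangerous.
    order = {"contradicted": 0, "unsupported": 1, "supported": 2}
    results.sort(key=lambda r: order[r["verdict"]])
    return results
```

Asking for a structured reply is one design choice among several. It keeps parsing simple, but a real judge model can still return malformed JSON, so production code would need error handling around json.loads.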

The three verdict types explained

Understanding what separates the three verdicts matters for knowing what action to take:

Supported — the source contains a sentence or passage that directly backs the claim. The model found the grounding. Safe to keep.
Unsupported — the source is silent on this point. The summarizing model invented or extrapolated it. This is common when LLMs confidently add context that was not in the original. Either find a source for the claim or remove it.
Contradicted — the source directly says the opposite. This is the most dangerous category: a confident falsehood. Fix or remove immediately.

A typical hallucination pattern looks like this. A source says “the project ran over budget by 12%.” The summary says “the project ran over budget by 20%.” That claim is contradicted, not unsupported, because the source states a different figure — it is an outright factual error, not just an addition.
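For anyone scripting around these verdicts, the mapping from label to follow-up action can be written down directly. The sketch below is an illustration only: the Verdict enum, the ACTIONS table, and recommended_action are hypothetical names, and the action strings simply restate the guidance given above.

```python
from enum import Enum

class Verdict(Enum):
    SUPPORTED = "supported"
    UNSUPPORTED = "unsupported"
    CONTRADICTED = "contradicted"

# Follow-up action per verdict, restating the guidance in the text.
ACTIONS = {
    Verdict.SUPPORTED: "keep",
    Verdict.UNSUPPORTED: "find a source or remove",
    Verdict.CONTRADICTED: "fix or remove immediately",
}

def recommended_action(verdict_label: str) -> str:
    """Map a verdict label to an action; raises ValueError on unknown labels."""
    return ACTIONS[Verdict(verdict_label.strip().lower())]

# The 12% versus 20% budget claim is contradicted, so it needs a fix.
print(recommended_action("contradicted"))
```

Raising on an unknown label, instead of guessing, keeps a malformed judge reply from being silently treated as safe.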

Worked example of a consistency check

Source excerpt: “The meeting took place on 14 March. Seven members attended. The vote passed 5–2.”

A model-generated summary might produce: “The committee met in mid-March. All members were present and the vote was unanimous.”

Running consistency checking over that summary would flag:

“met in mid-March” — supported (14 March is mid-March)
“All members were present” — unsupported (the source says seven attended but never states how many members there are)
“vote was unanimous” — contradicted (source says 5–2, not unanimous)

One of the three claims is an outright error and another has no grounding in the source, despite the fluent tone. This is exactly the kind of subtle falsification that reads fine on a quick skim.

When to use this tool

Checking AI-generated executive summaries against long internal reports before distribution
Validating research abstracts against their full-text sources
Auditing customer-support AI answers against your knowledge base
Confirming legal or compliance summaries before sending to clients
Reviewing any RAG-generated answer that needs to stay grounded in retrieved documents

Tips for trustworthy results

Use a capable model for the judging step (for example gpt-4o or claude-3-5-sonnet). Claim verification demands careful reading of the source and tends to benefit from a stronger model.
Keep the source self-contained. If the summary relies on outside knowledge the source doesn’t contain, those claims will correctly read as unsupported.
Pay closest attention to contradicted claims — those are outright errors, not just omissions.
Re-run after editing the prompt or summary to confirm your fix actually removed the unsupported or contradicted claims.
For very long sources, break them into sections and run separate checks per section — shorter source windows improve the judge model’s accuracy.
A high ratio of unsupported claims often points to a vague summarization prompt rather than a model that is unusually prone to fabrication. Tightening the system prompt so the summary stays within the source can turn those claims into supported ones or drop them from the summary altogether.
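The tip about splitting long sources can be sketched as a small helper. This is a minimal example under stated assumptions: paragraphs are separated by blank lines, the 4000-character default is an arbitrary placeholder and not a limit taken from the tool, and split_into_sections is a hypothetical name.

```python
def split_into_sections(source: str, max_chars: int = 4000) -> list[str]:
    """Group blank-line-separated paragraphs into sections of about max_chars.

    A single paragraph longer than max_chars is kept whole as its own section.
    """
    paragraphs = [p.strip() for p in source.split("\n\n") if p.strip()]
    sections: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow, so close the section.
            sections.append(current)
            current = para
        else:
            current = candidate
    if current:
        sections.append(current)
    return sections
```

Each section can then be checked separately against the summary. A claim grounded in one section will read as unsupported when judged against any other, so per-section results need merging: a reasonable rule is to treat a claim as supported if any section supports it.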