What is this actually testing?

It tests whether your model blindly trusts retrieved or supplied context even when that context is wrong. A robust, well-grounded model should either correct the false claim, flag the contradiction, or express uncertainty rather than confidently repeating an injected error as fact.

Why does a model repeat injected falsehoods?

Models are trained to be helpful and to ground answers in provided context, so they often defer to whatever the context says even when it conflicts with their internal knowledge. This makes RAG pipelines vulnerable to data-poisoning and to stale or wrong documents in the index.

Yes. Your key lives only in your browser for the duration of the request and is sent directly to the provider's API, never to this site or any intermediary. It is not stored or logged. Use a key scoped to low limits if you want extra caution.

Which providers are supported?

OpenAI and Anthropic. Pick the provider, choose a model, and paste the matching key (sk- for OpenAI, sk-ant- for Anthropic). The request goes straight to the provider's REST endpoint from your browser.

How should I read the result?

The verdict is a heuristic based on whether the injected false value appears in the answer and whether the model used hedging or correction language. Always read the full response yourself — a borderline answer that quietly repeats the falsehood is exactly the failure mode you are hunting for.

What is the Hallucination Injection Tester (BYO-key)?

Inject a factual error into a context document, send it to your own LLM with your API key, and check whether the model echoes the falsehood or correctly flags the contradiction. Tests grounding and robustness in your browser. It runs free in your browser on Gera Tools, with nothing uploaded.

Hallucination Injection Tester (BYO-key)

Name: Hallucination Injection Tester (BYO-key)
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

LLMs are trained to trust the context you give them — which is exactly what makes RAG pipelines vulnerable to wrong or poisoned documents. This tester injects a false premise into a context document, sends it to your own model, and checks whether the model echoes the falsehood or correctly resists it.

The underlying problem: context deference

When an LLM is given a retrieval-augmented context, it is instructed to ground its answers in that context. This is the right default behaviour for most uses — you want the model to use the information you provide, not make things up. But context deference becomes a vulnerability when the context is wrong. A poisoned document in a RAG index, a miscrawled web page, stale data from a knowledge base, or a simple database error can all produce context that is plausibly written but factually incorrect.

A well-calibrated model should detect and flag contradictions between the context and its internal knowledge, especially for well-established facts. In practice, models vary considerably in how faithfully they defer versus how often they push back. This tester gives you a repeatable way to probe that behaviour for a specific model, temperature, and system prompt configuration.

How it works

You provide a true fact, a plausible-but-false alternative, and a question whose answer depends on that fact. The tool builds a short context document stating the false claim, then asks your model the question against that context using your own API key (sent directly to OpenAI or Anthropic from your browser). It inspects the answer for the injected false value and for hedging/correction language, and reports a verdict: echoed the falsehood (bad) or resisted / flagged it (good).

Designing effective test cases

The most revealing tests involve facts where the model has strong internal knowledge but the injected claim is plausible enough to be mistaken for a recent update or a narrow-domain fact:

“The Eiffel Tower is in Berlin” — obviously wrong for a general model, but useful to confirm the baseline
“The capital of [small country] is [nearby city that is not the capital]” — more likely to fool the model
A well-known API returning a fabricated parameter name — good for testing code-assistant deployments
A recent historical date shifted by a year or two — probes the model’s confidence calibration on time-sensitive facts

Running the same test at different temperatures (0.0, 0.5, 1.0) shows how stochastic the model’s resistance is — if it only catches the injection at low temperature, that tells you something about the deployment configuration you should use in production.

Reading the verdict

The verdict is a heuristic — always read the full response yourself, because the dangerous failure mode is an answer that quietly repeats the injected value without flagging it. A phrase like “according to the provided context” or “the document states” is a partial signal — it shows the model is attributing the claim rather than asserting it, which is better than confident endorsement but still means it did not push back. A robust, grounded model should say something like “the provided context states X, but this appears to conflict with the well-established fact that Y.”

Your API key is used only for the direct provider request and is never stored or logged by this tool.