Prompt Escape Hatch Detector

Find loopholes in your prompt that an LLM could exploit to deviate from your instructions



LLMs follow the letter of your instructions, not the spirit. Any vagueness becomes a loophole: a hedge word lets the model “mostly” comply, an open conditional with no else-branch lets it improvise, and an undefined scope lets it wander off-task. This tool scans your prompt for those escape hatches and explains how each one could be exploited so you can close it.

How it works

The detector runs a set of heuristics against your prompt text, with no model call and no network request. It flags hedge verbs (“try to,” “attempt to”), soft modals (“should,” “ideally,” “if possible”), conditionals that never specify the negative branch, vague quantifiers and qualifiers (“some,” “a few,” “appropriate”), undefined references, and the absence of any explicit fallback instruction. Each finding names the offending phrase, categorizes the weakness, and explains how a model could use it to deviate.
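The phrase-level checks can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the tool's actual code: the names `PATTERNS`, `REASONS`, `Finding`, and `scan` are hypothetical, and the phrase lists contain only the examples quoted above.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase lists, seeded only with the examples from the text.
PATTERNS = {
    "hedge verb": ["try to", "attempt to"],
    "soft modal": ["should", "ideally", "if possible"],
    "vague quantifier": ["some", "a few", "appropriate"],
}

# Hypothetical explanations of how each category could be exploited.
REASONS = {
    "hedge verb": "The model can claim it tried without actually complying.",
    "soft modal": "The instruction reads as optional, so skipping it is defensible.",
    "vague quantifier": "No concrete threshold exists, so any amount satisfies it.",
}

@dataclass
class Finding:
    phrase: str    # the offending text as it appears in the prompt
    category: str  # which kind of weakness it is
    reason: str    # why a model could use it to deviate
    start: int     # character offset in the prompt

def scan(prompt: str) -> list[Finding]:
    """Return every flagged phrase in the prompt, ordered by position."""
    findings = []
    for category, phrases in PATTERNS.items():
        for phrase in phrases:
            # Match whole words only, allowing any whitespace between words.
            body = r"\s+".join(re.escape(word) for word in phrase.split())
            pattern = re.compile(r"\b" + body + r"\b", re.IGNORECASE)
            for match in pattern.finditer(prompt):
                findings.append(
                    Finding(match.group(0), category, REASONS[category], match.start())
                )
    return sorted(findings, key=lambda f: f.start)
```

Under these assumptions, `scan("You should try to cite some sources if possible.")` flags “should,” “try to,” “some,” and “if possible,” in that order, while `scan("Cite a source for every claim.")` returns an empty list.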

Tips and notes

  • Replace soft modals with imperatives. “You should cite sources” becomes “Cite a source for every claim; if none exists, write ‘no source’.”
  • Always add a fallback. The most reliable prompts say exactly what to do when the main path fails.
  • Define out-of-scope explicitly. State what the model must refuse or redirect, rather than assuming it will infer the boundary.
  • Some hedging is fine. Creative or exploratory prompts may want looseness; the tool flags candidates, and you make the call.
  • Everything stays in your browser. The heuristics run locally, with no model call and no network request.
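The fallback and out-of-scope tips above can be turned into a quick self-check. The sketch below is a rough illustration, not how the tool itself works: the cue patterns and the `missing_safeguards` function are hypothetical, and a real prompt may phrase a fallback or a scope boundary in ways these patterns miss.

```python
import re

# Hypothetical cue phrases that suggest a fallback instruction is present.
FALLBACK_CUES = re.compile(
    r"\bif (none|no \w+|you cannot|you can't|unsure)\b|\botherwise\b",
    re.IGNORECASE,
)

# Hypothetical cue phrases that suggest an explicit scope boundary.
SCOPE_CUES = re.compile(
    r"\b(refuse|decline|out of scope|do not answer|only answer)\b",
    re.IGNORECASE,
)

def missing_safeguards(prompt: str) -> list[str]:
    """List the safeguards for which no cue phrase appears in the prompt."""
    missing = []
    if not FALLBACK_CUES.search(prompt):
        missing.append("fallback")
    if not SCOPE_CUES.search(prompt):
        missing.append("scope boundary")
    return missing

loose = "You should cite sources."
tight = (
    "Cite a source for every claim; if none exists, write 'no source'. "
    "Refuse questions unrelated to the article."
)
print(missing_safeguards(loose))
print(missing_safeguards(tight))
```

With these cue lists, the loose prompt is reported as missing both safeguards and the tightened one as missing neither. A match only means a cue phrase is present, so the result is a prompt for review, not proof that the boundary is well defined.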