Instruction-Following Scorer

Paste an output and score how well the model followed your prompt



When a model output looks “mostly right,” it is easy to miss the one rule it ignored: the length cap it blew past, the format it dropped, the word you told it to avoid. This scorer pulls the explicit instructions out of your prompt, checks the output against each one, and gives you a compliance percentage, so drift is visible instead of buried in a wall of plausible text.

How it works

The tool first extracts candidate instructions: imperative sentences, numbered rules, and bullet points — the lines that actually tell the model what to do. It then runs lightweight heuristics on the output for each one: word and character limits, format checks (JSON, bullet lists, headings), and forbidden-word checks for “do not” rules. Each instruction gets a pass, partial, or fail verdict that you can override, because semantic instructions like “be concise” need a human eye. The compliance percentage updates live as you adjust.
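The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names, the regular expressions, the default "partial" verdict for rules no heuristic recognizes, and the 0.5 weight for a partial verdict are all assumptions made for the example.

```python
import json
import re

# Hypothetical patterns: the real tool's extraction rules are not documented.
BULLET_RE = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")
IMPERATIVE_RE = re.compile(
    r"^\s*(?:do not|don't|never|always|use|write|return|keep|avoid|respond|limit)\b",
    re.IGNORECASE,
)
WORD_LIMIT_RE = re.compile(
    r"(?:at most|no more than|under|maximum of|max)\s+(\d+)\s+words", re.IGNORECASE
)
FORBIDDEN_RE = re.compile(
    r"(?:do not|don't|never)\s+(?:use|say|mention|include)\s+(?:the word\s+)?[\"'“‘]?(\w+)",
    re.IGNORECASE,
)
# Assumed weighting: a partial verdict counts as half a pass.
WEIGHTS = {"pass": 1.0, "partial": 0.5, "fail": 0.0}


def extract_instructions(prompt: str) -> list[str]:
    """Collect bullet points, numbered rules, and lines opening with an imperative."""
    found = []
    for line in prompt.splitlines():
        if BULLET_RE.match(line):
            found.append(BULLET_RE.sub("", line).strip())
        elif IMPERATIVE_RE.match(line):
            found.append(line.strip())
    return found


def check_instruction(instruction: str, output: str) -> str:
    """Return 'pass', 'partial', or 'fail' for one instruction."""
    limit = WORD_LIMIT_RE.search(instruction)
    if limit:
        # Simplification: every phrasing is treated as an inclusive cap.
        return "pass" if len(output.split()) <= int(limit.group(1)) else "fail"
    forbidden = FORBIDDEN_RE.search(instruction)
    if forbidden:
        word = re.escape(forbidden.group(1))
        hit = re.search(rf"\b{word}\b", output, re.IGNORECASE)
        return "fail" if hit else "pass"
    if "json" in instruction.lower():
        try:
            json.loads(output)
            return "pass"
        except json.JSONDecodeError:
            return "fail"
    # No mechanical check applies (e.g. "be concise"): assumed placeholder
    # verdict that a human reviewer is expected to override.
    return "partial"


def compliance_percentage(verdicts: list[str]) -> float:
    """Average the verdict weights and scale to 0-100."""
    if not verdicts:
        return 0.0
    return 100 * sum(WEIGHTS[v] for v in verdicts) / len(verdicts)


prompt = """Summarize the report.
- Use at most 10 words.
- Do not use the word "very".
- Be concise."""
output = "The report is very long and covers sales."

instructions = extract_instructions(prompt)
verdicts = [check_instruction(item, output) for item in instructions]
print(list(zip(instructions, verdicts)))
print(compliance_percentage(verdicts))
```

In this sketch, overriding a verdict is just a list assignment, for example `verdicts[2] = "pass"` after reading the output yourself, followed by another call to `compliance_percentage`. Note that the first prompt line is not picked up because "summarize" is not in the small imperative list, which is the kind of gap that makes reviewing the extracted list worthwhile.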

Tips and notes

  • Prune the extracted list first. Removing non-instructions keeps the score meaningful.
  • Trust the mechanical checks, judge the rest. Length and format verdicts are reliable; tone and relevance are yours to set.
  • Use it to compare models. Score the same prompt across two models and the percentages give you a fast, concrete comparison.
  • Everything is local. Paste proprietary prompts and outputs freely — nothing leaves your browser.