Few-Shot Example Quality Rater

Rate your few-shot examples for diversity, clarity, and representativeness

Few-shot prompting only works when the examples actually teach the model something. A set of near-identical inputs, lopsided labels, or examples that skip the hard cases will leave the model guessing on real inputs. This rater analyzes your example set and gives you four subscores — input diversity, output variety, label balance, and length consistency — plus an overall grade and specific suggestions, so you can curate examples that generalize instead of ones that just look complete.

How it works

Paste your examples as a JSON array ([{ "input": "...", "output": "..." }]) or as plain-text blocks separated by blank lines. The tool tokenizes each input and measures pairwise overlap to estimate diversity — sets where every input shares most of its words score low because they only demonstrate one pattern. It does the same for outputs, tallies how often each distinct output value appears to judge label balance, and checks whether output lengths swing wildly. Everything runs locally in your browser; no model call is needed.
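The parsing and diversity steps described above could be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names are hypothetical, Jaccard overlap on lowercase word sets is an assumed stand-in for whatever overlap measure the rater uses, and the plain-text convention (first line of a block is the input, the rest is the output) is also an assumption.

```python
import json
import re
from itertools import combinations


def parse_examples(text):
    """Accept a JSON array of {"input", "output"} objects, or blank-line-separated blocks."""
    text = text.strip()
    if text.startswith("["):
        return [(str(e["input"]), str(e["output"])) for e in json.loads(text)]
    # Assumption: in a plain-text block, the first line is the input
    # and any remaining lines are the output.
    examples = []
    for block in re.split(r"\n\s*\n", text):
        lines = block.strip().splitlines()
        if lines:
            examples.append((lines[0], "\n".join(lines[1:])))
    return examples


def tokenize(s):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", s.lower()))


def jaccard(a, b):
    """Shared tokens divided by all tokens; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def diversity_score(texts):
    """1 minus the mean pairwise Jaccard overlap: 0 = identical, 1 = no shared words."""
    token_sets = [tokenize(t) for t in texts]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        return 0.0  # a single example demonstrates only one pattern
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Calling `diversity_score` on the inputs and again on the outputs would give the first two subscores. A set where every input shares most of its words yields high pairwise overlap and therefore a score near 0.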

Tips and examples

  • Spread your inputs. If three of five examples start with the same phrasing, replace one with a structurally different case.
  • Include an edge case. Add at least one example covering a tricky or empty input; that single example often prevents the most common failures.
  • Balance the labels. For yes/no or category tasks, aim for roughly even representation unless the real distribution is genuinely skewed.
  • Keep outputs consistent in shape. If one output is a single word and another is a paragraph, the model gets a mixed signal about expected verbosity.
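
The label-balance and output-shape tips can be checked with a few lines of code. The sketch below is one plausible way to score them, not the rater's actual formulas: normalized entropy for balance and the coefficient of variation of word counts for length consistency are both assumptions, as are the function names.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def label_balance(outputs):
    """Normalized entropy of output values: 1.0 = perfectly even, 0.0 = a single label."""
    counts = Counter(o.strip().lower() for o in outputs)
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


def length_consistency(outputs):
    """1 minus the coefficient of variation of output word counts, clamped at 0."""
    lengths = [len(o.split()) for o in outputs]
    if not lengths or mean(lengths) == 0:
        return 0.0
    return max(0.0, 1.0 - pstdev(lengths) / mean(lengths))
```

A balance score is only meaningful for yes/no or category tasks. For free-form outputs, where every value is distinct, this measure is trivially 1.0 and should be ignored.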