Few-shot example quality rater
Few-shot prompting only works when the examples actually teach the model something. A set of near-identical inputs, lopsided labels, or examples that skip the hard cases will leave the model guessing on real inputs. This rater analyzes your example set and gives you four subscores — input diversity, output variety, label balance, and length consistency — plus an overall grade and specific suggestions, so you can curate examples that generalize instead of ones that just look complete.
How it works
Paste your examples as a JSON array ([{ "input": "...", "output": "..." }]) or
as plain-text blocks separated by blank lines. The tool tokenizes each input and
measures pairwise overlap to estimate diversity — sets where every input shares
most of its words score low because they only demonstrate one pattern. It does
the same for outputs, tallies how often each distinct output value appears to
judge label balance, and checks whether output lengths swing wildly. Everything
runs locally in your browser; no model call is needed.
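
To make the four subscores concrete, here is a minimal Python sketch of how they could be computed. The specific formulas are assumptions chosen for illustration, not the rater's actual implementation: Jaccard similarity over word sets for pairwise overlap, the ratio of the rarest to the most common output for label balance, and the coefficient of variation of output word counts for length consistency. The function names are hypothetical, and the overall grade is left out because the text above does not say how the subscores are combined.

```python
import re
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Share of tokens two sets have in common (1.0 = identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def diversity(texts):
    """1 minus the mean pairwise overlap: 0.0 = all alike, 1.0 = no shared words."""
    token_sets = [tokenize(t) for t in texts]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        return 0.0
    return 1.0 - mean(jaccard(a, b) for a, b in pairs)


def label_balance(outputs):
    """Count of the rarest output divided by the count of the most common one."""
    counts = Counter(o.strip().lower() for o in outputs)
    if len(counts) < 2:
        return 0.0  # a single label cannot be balanced
    return min(counts.values()) / max(counts.values())


def length_consistency(outputs):
    """1 minus the coefficient of variation of output word counts, floored at 0."""
    lengths = [len(o.split()) for o in outputs]
    if not lengths or mean(lengths) == 0:
        return 1.0
    return max(0.0, 1.0 - pstdev(lengths) / mean(lengths))


def rate(examples):
    """Return the four subscores for a list of {"input", "output"} dicts."""
    inputs = [e["input"] for e in examples]
    outputs = [e["output"] for e in examples]
    return {
        "input_diversity": diversity(inputs),
        "output_variety": diversity(outputs),
        "label_balance": label_balance(outputs),
        "length_consistency": length_consistency(outputs),
    }
```

Calling `rate(json.loads(pasted_text))` on a pasted JSON array would give one score per dimension, each between 0 and 1 under these assumed formulas. Word-set overlap is deliberately crude: it is cheap enough to run without a model call, but it cannot tell paraphrases apart from genuinely different cases.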
Tips and examples
- Spread your inputs. If three of five examples start with the same phrasing, replace one with a structurally different case.
- Include an edge case. Add at least one example covering a tricky or empty input; that single example often prevents the most common failures.
- Balance the labels. For yes/no or category tasks, aim for roughly even representation unless the real distribution is genuinely skewed.
- Keep outputs consistent in shape. If one output is a single word and another is a paragraph, the model gets a mixed signal about expected verbosity.
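
As an illustration of the tips above, here is a small, made-up example set for a hypothetical "is this message a support request?" task, along with a quick sanity check written in Python. The data, the task, and the `summarize` helper are all assumptions for demonstration; they are not part of the rater.

```python
from collections import Counter

# Hypothetical examples: varied phrasing, one empty-input edge case,
# three "yes" and three "no" labels, and one-word outputs throughout.
examples = [
    {"input": "My invoice shows the wrong amount.", "output": "yes"},
    {"input": "How do I reset my password?", "output": "yes"},
    {"input": "The app crashes when I upload a photo.", "output": "yes"},
    {"input": "Thanks, that fixed it!", "output": "no"},
    {"input": "Just wanted to say the new design looks great.", "output": "no"},
    {"input": "", "output": "no"},  # edge case: empty input
]


def summarize(examples):
    """Report the properties the tips above ask you to check by eye."""
    # First word of every non-empty input, to spot repeated phrasing.
    openers = Counter(
        e["input"].split()[0].lower() for e in examples if e["input"].strip()
    )
    return {
        "label_counts": dict(Counter(e["output"] for e in examples)),
        "most_repeated_opener": max(openers.values(), default=0),
        "has_empty_input": any(not e["input"].strip() for e in examples),
        "output_word_counts": sorted({len(e["output"].split()) for e in examples}),
    }


summary = summarize(examples)
```

For this set, no opening word repeats, the labels split evenly, the empty input is covered, and every output is exactly one word long.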