Few-Shot Example Diversity Checker

Check if your few-shot examples are diverse enough to avoid bias.

Make sure your prompt examples actually teach the task

Few-shot prompting works because the model imitates the examples you provide. The failure mode is subtle: if your examples are all phrased the same way, the model learns the phrasing instead of the task and breaks on inputs that look different. This checker measures how varied your examples really are — across vocabulary, length, and pairwise overlap — so you can catch redundancy before it biases your prompt.

How it works

Each example is tokenized into words. The tool then computes three things. First, lexical diversity: the ratio of unique words to total words across all examples, which tells you how much vocabulary your set covers. Second, length variation: the spread between your shortest and longest example, since uniform lengths can push the model toward fixed-length outputs. Third, and most usefully, pairwise Jaccard similarity: for every pair of examples, it divides the number of words the two share by the number of distinct words across both, and flags any pair above a similarity threshold as too alike.
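The three measurements above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual code: the tokenization rule, the function names, and the 0.6 threshold are all assumptions chosen for the example.

```python
import re
from itertools import combinations


def tokenize(text):
    # Assumed rule: lowercase, then keep runs of letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def lexical_diversity(examples):
    # Unique words divided by total words, pooled across all examples.
    tokens = [t for ex in examples for t in tokenize(ex)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def length_spread(examples):
    # (shortest, longest) example length, measured in words.
    lengths = [len(tokenize(ex)) for ex in examples]
    return (min(lengths), max(lengths)) if lengths else (0, 0)


def jaccard(a, b):
    # Shared vocabulary divided by combined vocabulary of the two examples.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def flag_similar_pairs(examples, threshold=0.6):
    # The threshold is a hypothetical default; the tool's real value is not stated.
    flagged = []
    for i, j in combinations(range(len(examples)), 2):
        score = jaccard(examples[i], examples[j])
        if score > threshold:
            flagged.append((i, j, score))
    return flagged


examples = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "dogs bark loudly at night",
]
print(lexical_diversity(examples))   # 11 unique words out of 17 total
print(length_spread(examples))       # (5, 6)
print(flag_similar_pairs(examples))  # only the first two examples are flagged
```

In this sample the first two examples share four of six distinct words, giving a Jaccard score of about 0.67, so they are flagged. The third example shares no vocabulary with either and is left alone.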

Tips and notes

Aim for examples that differ in structure, length, and wording while still demonstrating the same task. Two near-identical examples waste a slot and can nudge the model toward that exact pattern, so when a pair is flagged, replace one of them with a genuinely different case — ideally an edge case or a different input shape. Diversity is not the only goal: every example should still be correct and representative. Use this tool as a redundancy filter, not a correctness check. You can paste either plain blocks separated by blank lines or a JSON array of strings; both are parsed automatically.
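One way the two input formats could be told apart is to try JSON first and fall back to splitting on blank lines. The sketch below is a guess at that behavior, not the tool's implementation; `parse_examples` is a hypothetical name.

```python
import json
import re


def parse_examples(raw):
    # Hypothetical helper: accept either a JSON array of strings
    # or plain text blocks separated by blank lines.
    text = raw.strip()
    if text.startswith("["):
        try:
            data = json.loads(text)
            if isinstance(data, list) and all(isinstance(x, str) for x in data):
                return [x.strip() for x in data if x.strip()]
        except json.JSONDecodeError:
            pass  # Not valid JSON; treat the input as plain blocks instead.
    # A blank line (optionally containing whitespace) separates examples.
    blocks = re.split(r"\n\s*\n", text)
    return [b.strip() for b in blocks if b.strip()]


print(parse_examples('["Translate: hello", "Translate: goodbye"]'))
print(parse_examples("Translate: hello\n\nTranslate: goodbye"))
```

Both calls return the same two-item list. Input that starts with `[` but is not valid JSON falls through to the blank-line splitter rather than raising an error.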