Spot duplicate and drifting prompts in seconds
As a prompt library grows, near-identical prompts pile up and small edits between versions go unnoticed. This checker gives you a single number — the cosine similarity of two prompts — so you can dedupe a library or measure how far a prompt has drifted from its previous version, without any API calls.
How it works
The tool tokenises each prompt (lowercasing, stripping punctuation, dropping one-character tokens), counts term frequencies, and weights each term by its inverse document frequency across the two-prompt corpus. That produces two TF-IDF vectors. The similarity is the cosine of the angle between them:
similarity = (A · B) / (‖A‖ × ‖B‖)
A score near 1.0 means the prompts use the same words in similar proportions; a score near 0 means almost no meaningful overlap. The IDF term is there to stop common connective words from inflating the score, so that a shared “the” counts for less than a shared distinctive word like “executive”. With only two prompts in the corpus, though, every term appears in either one prompt or both, so IDF can take just two values and this damping is coarse.
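The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function names are hypothetical, and the smoothed IDF formula, ln((1 + N) / (1 + df)) + 1, is an assumption borrowed from scikit-learn's default. Some smoothing is needed because plain log(N / df) gives every shared term a weight of zero when N is 2.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase, strip punctuation, and drop one-character tokens."""
    cleaned = re.sub(r"[^\w\s]", "", text.lower())
    return [w for w in cleaned.split() if len(w) > 1]


def tfidf_vectors(prompt_a, prompt_b):
    """Build sparse TF-IDF vectors (dicts) over the two-prompt corpus."""
    counts = [Counter(tokenize(prompt_a)), Counter(tokenize(prompt_b))]
    vocab = set(counts[0]) | set(counts[1])
    n_docs = 2
    vectors = [{}, {}]
    for term in vocab:
        # Document frequency: in how many of the two prompts the term occurs.
        df = sum(1 for c in counts if term in c)
        # Assumption: smoothed IDF, so shared terms keep a non-zero weight.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        for vec, c in zip(vectors, counts):
            if term in c:
                vec[term] = c[term] * idf
    return vectors


def cosine_similarity(vec_a, vec_b):
    """similarity = (A . B) / (|A| * |B|), or 0.0 if either vector is empty."""
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def prompt_similarity(prompt_a, prompt_b):
    vec_a, vec_b = tfidf_vectors(prompt_a, prompt_b)
    return cosine_similarity(vec_a, vec_b)


if __name__ == "__main__":
    old = "Summarize this report for an executive audience."
    new = "Summarize this report for a technical audience."
    print(round(prompt_similarity(old, new), 3))
```

In this sketch a term found in both prompts gets weight 1 per occurrence and a term found in only one gets about 1.41, so unshared vocabulary pulls the score down slightly harder than shared vocabulary pulls it up.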
Tips and limitations
- TF-IDF is lexical, not semantic in the way embeddings are. “Summarize this” and “Give me a TL;DR” share no words and will score 0 even though they mean the same thing.
- For version drift, run the old and new prompts through the checker and watch the score. As a rough guide, anything above ~0.8 (80%) is a minor edit and anything below ~0.5 (50%) is a meaningful rewrite; scores in between are a judgment call.
- Use the shared-terms list to sanity-check the score — if the overlap is all boilerplate, the prompts may be less similar than the number suggests.
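The drift bands and the shared-terms check from the tips above could be wired up roughly as follows. The names `drift_label` and `shared_terms` are hypothetical, the tokenizer is a simplified stand-in for the tool's, and the label for the middle band is an assumption, since the tips only define the two outer bands.

```python
import re


def tokenize(text):
    """Lowercase, strip punctuation, and drop one-character tokens."""
    cleaned = re.sub(r"[^\w\s]", "", text.lower())
    return [w for w in cleaned.split() if len(w) > 1]


def drift_label(score):
    """Map a 0-1 similarity score to the rough bands from the tips."""
    if score > 0.8:
        return "minor edit"
    if score < 0.5:
        return "meaningful rewrite"
    # Assumption: the tips leave the 0.5-0.8 band undefined.
    return "unclear, review manually"


def shared_terms(prompt_a, prompt_b):
    """Return the sorted list of terms that appear in both prompts."""
    return sorted(set(tokenize(prompt_a)) & set(tokenize(prompt_b)))


if __name__ == "__main__":
    print(drift_label(0.92))
    print(shared_terms("Summarize this report.", "Summarize the report!"))
```

If the list returned by `shared_terms` is mostly boilerplate such as “this” or “for”, treat a high score with suspicion.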