What does the percentage actually mean?

It is the cosine similarity of the two prompts' TF-IDF vectors, scaled to 0-100%. 100% means identical word distributions; 0% means no shared meaningful words.

Why TF-IDF instead of embeddings?

TF-IDF needs no API key, no model download and no network — it runs instantly in your browser. It is excellent for catching lexical near-duplicates, though it does not understand synonyms the way embeddings do.

Does it ignore common words?

Not entirely. It downweights words that appear in both prompts via the inverse-document-frequency term and drops one-character tokens, so filler words contribute little to the score.

Are my prompts kept private?

Yes. Both prompts are tokenised and compared entirely on your device. Nothing is sent to a server, logged or stored.

What is the Prompt Semantic Similarity Checker?

Measure the cosine similarity between two prompts using client-side TF-IDF to detect near-duplicate prompts in your library or to quantify prompt drift between versions. It runs free in your browser on Gera Tools, with nothing uploaded.

Prompt Semantic Similarity Checker

Name: Prompt Semantic Similarity Checker
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Spot duplicate and drifting prompts in seconds

As a prompt library grows, near-identical prompts pile up and small edits between versions go unnoticed. This checker gives you a single number — the cosine similarity of two prompts — so you can dedupe a library or measure how far a prompt has drifted from its previous version, without any API calls.

Two problems this tool solves

Library deduplication. When a team maintains a collection of prompts for different purposes, it is common for two prompts that started with different goals to converge over time into near-identical instructions. One person adjusts prompt A to be more concise; someone else adjusts prompt B toward the same outcome. A similarity score above 80% is a strong signal that two prompts should be merged or that one has become redundant.

Version drift detection. When a prompt is maintained over months, it changes through many small edits. Comparing the current version to the one deployed three months ago tells you whether the prompt has drifted significantly or stayed stable. This is useful when diagnosing why a model’s output quality has changed: to tell whether the model or the prompt changed first, you need to know how different the prompts actually are.

How it works

The tool tokenises each prompt (lowercasing, stripping punctuation, dropping one-character tokens), counts term frequencies, and weights each term by its inverse document frequency across the two-prompt corpus. That produces two TF-IDF vectors. The similarity is the cosine of the angle between them:

similarity = (A · B) / (‖A‖ × ‖B‖)

A score near 1.0 means the prompts use the same words in similar proportions; a score near 0 means almost no meaningful overlap. The IDF term is what stops common connective words from inflating the score — if both prompts say “the” it barely moves the needle, but a shared distinctive word like “executive” does.
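The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function names, the punctuation regex and the smoothed IDF formula log((1 + N) / (1 + df)) + 1 are all assumptions, since the page does not say which IDF variant the tool uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase, replace punctuation with spaces, drop one-character tokens."""
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return [w for w in words if len(w) > 1]


def tfidf_vectors(prompt_a, prompt_b):
    """Build one TF-IDF vector per prompt over their shared vocabulary."""
    docs = [Counter(tokenize(prompt_a)), Counter(tokenize(prompt_b))]
    vocab = sorted(set(docs[0]) | set(docs[1]))
    n_docs = len(docs)
    vec_a, vec_b = [], []
    for term in vocab:
        df = sum(1 for d in docs if term in d)  # 1 or 2 in a two-prompt corpus
        # Assumed smoothed IDF; the tool may use a different variant.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        vec_a.append(docs[0][term] * idf)
        vec_b.append(docs[1][term] * idf)
    return vocab, vec_a, vec_b


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_percent(prompt_a, prompt_b):
    """Cosine similarity of the two prompts' TF-IDF vectors, scaled to 0-100."""
    _, vec_a, vec_b = tfidf_vectors(prompt_a, prompt_b)
    return 100 * cosine(vec_a, vec_b)
```

Calling similarity_percent(old_prompt, new_prompt) returns the percentage. Note that with only two documents, every term shared by both prompts receives the same IDF in this sketch, so the IDF step lowers shared terms relative to terms unique to one prompt; it cannot, on its own, tell a filler word from a distinctive one.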

Reading the score

Score range	Interpretation
90–100%	Near-identical — likely a duplicate or a trivial edit
70–90%	High overlap — same core instructions, minor wording differences
50–70%	Moderate overlap — significant edit or partial rewrite
Below 50%	Low overlap — largely different prompts
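If you want to apply the same bands in a script, a small helper like the one below is enough. The function name is hypothetical, and because the table lists 90% and 70% in two rows each, assigning a boundary value to the higher band is an assumption.

```python
def interpret_score(percent):
    """Map a 0-100 similarity score to the bands in the table above.

    Assumption: boundary values (90, 70, 50) fall into the higher band.
    """
    if percent >= 90:
        return "Near-identical"
    if percent >= 70:
        return "High overlap"
    if percent >= 50:
        return "Moderate overlap"
    return "Low overlap"
```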

Tips and limitations

TF-IDF is lexical, not semantic in the embedding sense. “Summarize this” and “Give me a TL;DR” share few words and will score low even though they mean the same thing.
For version drift, compare the old and new prompt and read the score: above ~80% suggests a minor edit, below ~50% suggests a meaningful rewrite.
Use the shared-terms list to sanity-check the score. If the overlap comes from boilerplate or common filler words rather than meaningful instructions, the prompts may be less similar than the number suggests, so treat a high score with more caution.
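A shared-terms check of the kind described in these tips can be approximated as follows. This is a rough sketch: the function names and the ordering of the result are hypothetical, and the tool's own shared-terms list may be built and sorted differently.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase, replace punctuation with spaces, drop one-character tokens."""
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return [w for w in words if len(w) > 1]


def shared_terms(prompt_a, prompt_b):
    """Terms present in both prompts, most frequently shared first, then alphabetical."""
    counts_a = Counter(tokenize(prompt_a))
    counts_b = Counter(tokenize(prompt_b))
    common = counts_a.keys() & counts_b.keys()
    return sorted(common, key=lambda t: (-min(counts_a[t], counts_b[t]), t))
```

For the synonym example in the first tip, shared_terms("Summarize this", "Give me a TL;DR") returns an empty list, which is why a lexical method scores that pair low. If a list is dominated by words such as "the" and "for", the overlap is mostly filler.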