Paraphrase Similarity Checker

Check if two LLM responses say the same thing in different words.

What this checks

Sometimes you regenerate an LLM response and want to know whether the new answer actually differs from the old one, or whether two model outputs are saying the same thing in different words. This tool measures the lexical similarity between two texts — how much vocabulary and phrasing they share — and gives you a single composite score plus a word-level breakdown.

It is a fast, deterministic, in-browser check. It does not call a model and does not claim to understand meaning.

How it works

The texts are lowercased and tokenised into words. The tool then computes three complementary signals:

  • Unigram overlap — the Jaccard similarity of the two word sets: how much of the combined vocabulary appears in both texts.
  • Bigram overlap — the same measure on adjacent word pairs, which captures shared phrasing and word order, not just shared words.
  • BLEU — a smoothed, bidirectional sentence-level BLEU score over 1- to 4-grams with a brevity penalty, the standard surface metric from machine translation.
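The three signals can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own in-browser code: the tokenisation regex, the add-one smoothing, the averaging of the two BLEU directions, and all function names are assumptions made for the example.

```python
import math
import re
from collections import Counter

def tokenise(text):
    # Lowercase, then keep runs of letters, digits and apostrophes (assumed rule).
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    # All runs of n adjacent tokens, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def jaccard(items_a, items_b):
    # Size of the intersection divided by size of the union of the two sets.
    a, b = set(items_a), set(items_b)
    if not a and not b:
        return 0.0  # assumed convention for two empty inputs
    return len(a & b) / len(a | b)

def bleu(candidate, reference, max_n=4):
    # Sentence-level BLEU over 1- to max_n-grams with a brevity penalty.
    if not candidate or not reference:
        return 0.0
    logs = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        total = sum(cand.values())
        if total == 0:
            break  # candidate is too short to have n-grams of this order
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        if n == 1:
            if matched == 0:
                return 0.0  # no shared words at all
            precision = matched / total
        else:
            # Add-one smoothing for higher orders (one common choice, assumed here).
            precision = (matched + 1) / (total + 1)
        logs.append(math.log(precision))
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(sum(logs) / len(logs))

def bidirectional_bleu(tokens_a, tokens_b):
    # Average of both directions so the score does not depend on argument order.
    return (bleu(tokens_a, tokens_b) + bleu(tokens_b, tokens_a)) / 2

a = tokenise("The cat sat on the mat")
b = tokenise("The cat lay on the mat")
unigram_overlap = jaccard(a, b)                        # 4 shared of 6 words, about 0.667
bigram_overlap = jaccard(ngrams(a, 2), ngrams(b, 2))   # 3 shared of 7 pairs, about 0.429
bleu_score = bidirectional_bleu(a, b)
```

Changing a single word lowers the bigram overlap more than the unigram overlap, because every changed word breaks two adjacent pairs.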

These are blended into one composite percentage and turned into a verdict: likely paraphrases, partial overlap, or likely different content. The word lists below show exactly which terms are shared and which are unique to each side.
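The blend and the verdict could look roughly like the sketch below. The page does not state the weights or the cut-offs, so the values here (0.4 / 0.3 / 0.3 and thresholds of 60 and 30) are purely hypothetical placeholders, as are the function names.

```python
def composite_score(unigram, bigram, bleu, weights=(0.4, 0.3, 0.3)):
    # Weighted average of the three signals (each in 0..1), as a percentage.
    # The default weights are hypothetical, not the tool's actual values.
    w_uni, w_bi, w_bleu = weights
    blended = w_uni * unigram + w_bi * bigram + w_bleu * bleu
    return round(100 * blended / sum(weights), 1)

def verdict(percent, paraphrase_at=60.0, partial_at=30.0):
    # Map the percentage to one of the three labels.
    # Both thresholds are hypothetical cut-offs chosen for illustration.
    if percent >= paraphrase_at:
        return "likely paraphrases"
    if percent >= partial_at:
        return "partial overlap"
    return "likely different content"

score = composite_score(0.8, 0.5, 0.5)
label = verdict(score)
```

Dividing by the sum of the weights keeps the result on a 0 to 100 scale even if the weights are later changed and no longer add up to 1.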

How to read it — and its limits

Lexical metrics are a starting point, not the truth. Two important failure modes:

  • A real paraphrase that swaps in synonyms (for example, “The feline rested on the ledge” for “The cat sat on the windowsill”) shares few words with the original and will score lower than it should.
  • Two texts on the same topic but with opposite claims (“profits rose” vs “profits fell”) share most of their words and will score high despite meaning the reverse.
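Both failure modes show up with nothing more than unigram Jaccard overlap. The sketch below uses hypothetical example sentences built around the phrases quoted above, and a simple assumed tokenisation rule.

```python
import re

def word_set(text):
    # Lowercased set of words (assumed tokenisation: letters, digits, apostrophes).
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(text_a, text_b):
    # Shared words divided by combined vocabulary.
    a, b = word_set(text_a), word_set(text_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Same meaning, different words: only "the" and "on" are shared.
paraphrase = jaccard("The cat sat on the windowsill",
                     "The feline rested on the ledge")

# Opposite meaning, nearly identical words: only "rose" and "fell" differ.
contradiction = jaccard("Quarterly profits rose sharply this year",
                        "Quarterly profits fell sharply this year")

print(f"paraphrase: {paraphrase:.2f}, contradiction: {contradiction:.2f}")
```

Here the genuine paraphrase scores 2/8 and the contradiction scores 5/7, the opposite of what a meaning-aware measure would report.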

So use the score to triage: high scores are worth a closer read for redundancy, and low scores suggest divergence but do not prove it, since a synonym-heavy paraphrase also scores low. Make the final call by reading both texts. For meaning-level similarity, you need embeddings and cosine similarity, not word overlap.
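For reference, cosine similarity itself is a small calculation once each text has been turned into a vector. The sketch below shows only that step; producing the embeddings requires a model and is not shown, and the function name is an assumption.

```python
import math

def cosine_similarity(u, v):
    # Dot product of the two vectors divided by the product of their lengths.
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_u * norm_v)
```

The result ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction), regardless of how long the vectors are.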
