Duplicate Sentence Finder

Find near-duplicate sentences in long LLM outputs.

Long LLM outputs often repeat themselves: the same point is restated two paragraphs later, or an almost-identical sentence pads out a section. This tool finds those near-duplicates within a single response, groups them, and shows how similar they are, so you can trim the redundancy quickly.

How it works

The text is split into sentences and each is normalized (lowercased, whitespace collapsed). The tool then compares every pair using Levenshtein edit distance, converting it into a similarity ratio between 0 (totally different) and 1 (identical). Sentence pairs at or above your chosen threshold are merged into groups, so a cluster of three near-identical sentences shows up together rather than as scattered pairs. Everything runs locally in your browser.
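
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual source: the sentence splitter, the similarity formula (1 minus the edit distance divided by the longer sentence's length), and every function name here are assumptions made for the example.

```python
import re
from itertools import combinations


def split_sentences(text: str) -> list[str]:
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(sentence: str) -> str:
    # Lowercase and collapse runs of whitespace into single spaces.
    return " ".join(sentence.lower().split())


def levenshtein(a: str, b: str) -> int:
    # Two-row dynamic programme counting insertions, deletions and substitutions.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    # Assumed ratio: 1 - distance / length of the longer string.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def find_duplicate_groups(text: str, threshold: float = 0.8) -> list[list[str]]:
    sentences = split_sentences(text)
    normalized = [normalize(s) for s in sentences]
    parent = list(range(len(sentences)))  # union-find forest, one node per sentence

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair; link the two sentences when they meet the threshold.
    for i, j in combinations(range(len(sentences)), 2):
        if similarity(normalized[i], normalized[j]) >= threshold:
            parent[find(j)] = find(i)

    groups: dict[int, list[str]] = {}
    for i, sentence in enumerate(sentences):
        groups.setdefault(find(i), []).append(sentence)
    # Keep only clusters with at least two members.
    return [g for g in groups.values() if len(g) > 1]


sample = ("The cache is cleared on restart. Logging is disabled by default. "
          "The cache is cleared on restart! The cache is cleared at restart.")
for group in find_duplicate_groups(sample, threshold=0.8):
    print(group)
```

With the sample text, the three cache sentences land in one group and the logging sentence is left out. Merging through a union-find structure is one way to get clusters rather than scattered pairs; note that it links sentences transitively, so A and C can share a group through B even if A and C alone fall below the threshold.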

Choosing a threshold

  • 0.95+ — near-identical repeats and copy-paste duplicates.
  • ~0.80 — light paraphrases and reworded restatements.
  • Below 0.70 — starts grouping merely-similar sentences; expect noise.

Because the tool compares characters rather than meaning, it catches repeated or lightly edited sentences, but not two sentences that express the same idea in entirely different wording.
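
To make the character-versus-meaning point concrete, here is a small hedged sketch that scores two pairs: a copy-paste repeat and a true paraphrase. The similarity formula (1 minus the edit distance divided by the longer length), the helper names, and the example sentences are illustrative assumptions, not taken from the tool itself.

```python
def levenshtein(a: str, b: str) -> int:
    # Row-by-row edit distance (insertions, deletions, substitutions).
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    # Normalize as the tool describes, then apply the assumed ratio.
    a = " ".join(a.lower().split())
    b = " ".join(b.lower().split())
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def is_duplicate(a: str, b: str, threshold: float) -> bool:
    # A pair counts as a near-duplicate when it is at or above the threshold.
    return similarity(a, b) >= threshold


original = "Caching reduces latency for repeated queries."
repeat = "Caching reduces latency for repeated queries!"
paraphrase = "Storing earlier results makes later lookups faster."

# The copy-paste repeat differs by one character, so it clears even a strict threshold.
print(round(similarity(original, repeat), 3), is_duplicate(original, repeat, 0.95))

# The paraphrase says the same thing with different characters, so it scores low.
print(round(similarity(original, paraphrase), 3), is_duplicate(original, paraphrase, 0.70))
```

The second pair is exactly the case the tool misses: the idea is repeated, but almost no characters line up, so lowering the threshold far enough to catch it would also group many unrelated sentences.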

Tips

  • Start with a high threshold and lower it until the groups stop being actionable.
  • Keep the clearest sentence from each group and delete the rest.
  • Pair this with the Word Frequency analyzer to catch both sentence-level and word-level repetition.