LLM output deduplicator
LLMs frequently pad answers by restating the same point in slightly different words — “In summary…”, “To reiterate…”, “As mentioned above…”. This tool finds those near-duplicate sentences and removes them, keeping the first occurrence of each idea and showing you exactly what it dropped and which sentence it matched.
How it works
The text is split into sentences on terminal punctuation. Each sentence is lowercased and reduced to a set of word bigrams (overlapping two-word sequences), which captures phrasing better than single words. The similarity of two sentences is the Jaccard index of their bigram sets: the size of the intersection divided by the size of the union. The first sentence is always kept. Each later sentence is compared against every sentence kept so far and is removed if its similarity to any of them meets or exceeds your threshold. Removed sentences are listed alongside the kept sentence they matched.
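The procedure above can be sketched in TypeScript as follows. This is a minimal illustration, not the tool's actual source. Several details are assumptions: the regex sentence splitter, the normalization (ASCII letters, digits and apostrophes only), scoring two empty bigram sets as 0, and reporting the most similar kept sentence as the match. The names `splitSentences`, `words`, `bigrams`, `jaccard` and `dedupe` are hypothetical.

```typescript
// One removed sentence, the kept sentence it matched, and the score that triggered removal.
interface Removal {
  removed: string;
  matched: string;
  similarity: number;
}

interface DedupResult {
  kept: string[];
  removals: Removal[];
}

// Split on terminal punctuation (. ! ?), keeping the punctuation with its sentence.
// Assumption: a simple regex split; abbreviations such as "e.g." will be mis-split.
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Lowercase, replace anything but ASCII letters, digits and apostrophes with spaces, split into words.
// Assumption: the tool's exact normalization is not specified; this is one plausible choice.
function words(sentence: string): string[] {
  return sentence
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// Overlapping two-word sequences (word bigrams), stored as a set.
function bigrams(sentence: string): Set<string> {
  const w = words(sentence);
  const result = new Set<string>();
  for (let i = 0; i + 1 < w.length; i++) {
    result.add(w[i] + " " + w[i + 1]);
  }
  return result;
}

// Jaccard index |A ∩ B| / |A ∪ B|, using |A ∪ B| = |A| + |B| - |A ∩ B|.
// Assumption: two empty sets (e.g. two one-word sentences) score 0.
function jaccard(a: Set<string>, b: Set<string>): number {
  let intersection = 0;
  a.forEach((x) => {
    if (b.has(x)) intersection++;
  });
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

// Keep the first sentence; remove each later sentence whose similarity to any
// already-kept sentence is >= threshold, recording the closest kept match.
function dedupe(text: string, threshold = 0.7): DedupResult {
  const kept: string[] = [];
  const keptBigrams: Set<string>[] = [];
  const removals: Removal[] = [];

  for (const sentence of splitSentences(text)) {
    const grams = bigrams(sentence);
    // Find the most similar kept sentence (none exists for the first sentence).
    let best = -1;
    let bestScore = -1;
    for (let i = 0; i < keptBigrams.length; i++) {
      const score = jaccard(grams, keptBigrams[i]);
      if (score > bestScore) {
        bestScore = score;
        best = i;
      }
    }
    if (best >= 0 && bestScore >= threshold) {
      removals.push({ removed: sentence, matched: kept[best], similarity: bestScore });
    } else {
      kept.push(sentence);
      keptBigrams.push(grams);
    }
  }
  return { kept, removals };
}

// Usage
const input =
  "The cache stores recent results in memory. " +
  "Eviction uses an LRU policy. " +
  "The cache stores recent results in RAM.";
const { kept, removals } = dedupe(input, 0.7);
console.log(kept);     // the first two sentences
console.log(removals); // the "...in RAM." sentence, matched to the first, similarity 5/7 ≈ 0.714
```

Because each sentence is compared only with sentences already kept, a removed sentence can never cause a later one to be removed, which is what keeps the first occurrence of each idea in place. The comparison loop is quadratic in the number of sentences in the worst case, which is fine for answer-length text.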
Tips and notes
- Start at 0.7. It removes obvious restatements while preserving distinct points. Lower the threshold to also remove looser paraphrases; raise it if sentences that make different points are being dropped.
- Order is preserved. The first time an idea appears it is kept; only later echoes are removed, so the logical flow stays intact.
- It never rephrases. Kept sentences are returned word-for-word, so the result is always a faithful subset of your input.
- Local only. No text leaves your browser, making it safe for private or proprietary content.
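To make the 0.7 starting point concrete, here is a worked calculation on three hypothetical sentences, assumed already lowercased and stripped of punctuation. A, B and C stand for their bigram sets; each sentence has seven words and therefore six bigrams.

```latex
\begin{aligned}
&\text{A: the cache stores recent results in memory} \\
&\text{B: the cache stores recent results in ram} \\
&\text{C: the cache keeps recent results in memory} \\[6pt]
J(A,B) &= \frac{|A \cap B|}{|A \cup B|} = \frac{5}{6 + 6 - 5} = \frac{5}{7} \approx 0.71 \ge 0.7 &&\Rightarrow \text{B is removed} \\
J(A,C) &= \frac{|A \cap C|}{|A \cup C|} = \frac{4}{6 + 6 - 4} = \frac{4}{8} = 0.5 < 0.7 &&\Rightarrow \text{C is kept}
\end{aligned}
```

Changing the last word (memory to ram) alters one bigram, while changing a middle word (stores to keeps) alters two, so C survives at 0.7 even though it says the same thing. A threshold of 0.5 or lower would remove it.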