Find the wasted tokens hiding in your context
If you send the same system prompt or retrieved context on every API call, each redundant word costs money and takes up context-window space, and that cost multiplies with call volume. This tool scans your text for three common sources of bloat (duplicate sentences, repeated phrases, and filler words) and estimates how many tokens you could cut without rewriting the substance.
How it works
Paste your context and the tool runs three passes. First, it splits the text into sentences and flags exact or near-exact duplicates. Second, it counts multi-word phrases that appear more often than expected. Third, it counts common filler words and phrases (“very”, “really”, “in order to”, “it is important to note that”, and similar) that rarely add information for a model. It then estimates the tokens wasted in each category, using a rough rule of four characters per token, and reports a compression ratio: the share of tokens you could plausibly remove by trimming alone.
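The three passes can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the filler list, the trigram size, the repeat threshold, the treatment of "near-exact" as "equal after lowercasing and stripping punctuation", and all function names are assumptions. Repeated phrases and fillers are counted only in the text left after duplicates are dropped, and the final share is capped at 1.0 because overlapping phrases can be counted more than once.

```python
import re
from collections import Counter

# Assumed filler list, taken from the examples in the text; a real list would be longer.
FILLERS = ["very", "really", "in order to", "it is important to note that"]
CHARS_PER_TOKEN = 4  # rough heuristic: about four characters per token


def normalize(sentence):
    """Lowercase and strip punctuation so near-exact duplicates compare equal."""
    return re.sub(r"[^a-z0-9 ]+", "", sentence.lower()).strip()


def analyze(text, ngram=3, min_repeats=3):
    # Pass 1: duplicate sentences. Every occurrence after the first is waste.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    seen, unique, dup_chars = set(), [], 0
    for s in sentences:
        key = normalize(s)
        if key in seen:
            dup_chars += len(s)
        else:
            seen.add(key)
            unique.append(s)
    remaining = " ".join(unique).lower()

    # Pass 2: repeated phrases, approximated as word n-grams seen min_repeats+ times.
    words = re.findall(r"[a-z0-9']+", remaining)
    grams = Counter(" ".join(words[i:i + ngram]) for i in range(len(words) - ngram + 1))
    phrase_chars = sum(len(g) * (n - 1) for g, n in grams.items() if n >= min_repeats)

    # Pass 3: filler words and phrases, matched on word boundaries.
    filler_chars = sum(
        len(f) * len(re.findall(r"\b" + re.escape(f) + r"\b", remaining))
        for f in FILLERS
    )

    total = len(text) // CHARS_PER_TOKEN
    wasted = {
        "duplicate_tokens": dup_chars // CHARS_PER_TOKEN,
        "phrase_tokens": phrase_chars // CHARS_PER_TOKEN,
        "filler_tokens": filler_chars // CHARS_PER_TOKEN,
    }
    # Cap at 1.0: overlapping n-grams can inflate the estimate.
    share = min(1.0, sum(wasted.values()) / total) if total else 0.0
    return {**wasted, "total_tokens": total, "compressible_share": share}


if __name__ == "__main__":
    sample = "Always answer in JSON. Always answer in JSON. It is very important."
    print(analyze(sample))
```

A character-based token estimate keeps the sketch dependency-free; swapping in a real tokenizer would tighten the numbers but would not change the structure of the three passes.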
Tips and example
Treat the ratio as a budget, not a guarantee. Removing exact duplicates and filler is almost always safe; trimming repeated phrases can lose nuance, so verify the model still answers correctly afterward. A context with a 25% estimated compressible share, sent on 10,000 calls a day, means roughly a quarter of that context's input tokens are paid for 10,000 times daily without adding information. That is exactly where a few minutes of editing pays off. For deeper compression, this heuristic is a good first pass before reaching for a learned compressor like LLMLingua.
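To put the 25% example into numbers, the arithmetic can be sketched as below. The 2,000-token context size and the price of $3.00 per million input tokens are hypothetical placeholders, not figures from this page; substitute your own context length and your provider's current rate.

```python
def daily_savings(context_tokens, compressible_share, calls_per_day, usd_per_million_tokens):
    """Return (tokens saved per day, dollars saved per day) if the compressible share is trimmed."""
    saved_tokens = context_tokens * compressible_share * calls_per_day
    saved_usd = saved_tokens / 1_000_000 * usd_per_million_tokens
    return saved_tokens, saved_usd


if __name__ == "__main__":
    # Hypothetical inputs: 2,000-token context, 25% compressible,
    # 10,000 calls a day, $3.00 per million input tokens.
    tokens, usd = daily_savings(2_000, 0.25, 10_000, 3.00)
    print(f"{tokens:,.0f} tokens/day, ${usd:.2f}/day")
```

Because the estimate is an upper bound on what trimming alone can remove, treat the result as a ceiling on savings rather than a forecast.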