Chunk overlap visualizer
Sliding-window chunking is a common default in RAG pipelines, but the overlap stays invisible until retrieval goes wrong. This tool splits your text by chunk size and overlap and highlights exactly which words appear in two adjacent chunks, so you can see the redundancy and the boundary coverage before you pay to embed anything.
How it works
Your text is split on whitespace into words, which serve as approximate tokens.
Starting at index 0, the tool takes chunkSize tokens as the first chunk, then
advances by a stride of chunkSize − overlap tokens for each subsequent chunk,
as a standard sliding-window splitter does. Each chunk is rendered with its
token range, and the tokens it shares with the previous chunk (its leading
overlap) and with the next chunk (its trailing overlap) are highlighted so the
duplicated regions stand out. A summary reports the chunk count and the total
number of duplicated tokens.
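The windowing described above can be sketched in a few lines of Python. This is an illustration, not the tool's own source: the function names `sliding_window_chunks` and `overlap_ranges` are hypothetical, and the stopping rule (stop once a chunk reaches the end of the text) is an assumption about how the final, possibly shorter, chunk is handled.

```python
def sliding_window_chunks(text, chunk_size, overlap):
    """Split text into (start, end, tokens) triples; end is exclusive.

    Hypothetical helper: whitespace words stand in for tokens.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    tokens = text.split()          # whitespace tokenisation
    step = chunk_size - overlap    # how far the window advances each time
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append((start, end, tokens[start:end]))
        if end == len(tokens):     # assumption: stop once the text is covered
            break
        start += step
    return chunks


def overlap_ranges(chunks):
    """Token range shared by each pair of adjacent chunks."""
    return [
        (nxt[0], prev[1])          # from next chunk's start to previous chunk's end
        for prev, nxt in zip(chunks, chunks[1:])
        if nxt[0] < prev[1]
    ]


chunks = sliding_window_chunks("a b c d e f g h i j", chunk_size=4, overlap=1)
for start, end, toks in chunks:
    print(f"[{start}, {end}) {' '.join(toks)}")
print(overlap_ranges(chunks))
```

With ten words, a chunk size of 4 and an overlap of 1, the window advances 3 tokens at a time and yields the ranges [0, 4), [3, 7) and [6, 10); the shared ranges are [3, 4) and [6, 7), which are the tokens a view like this would highlight.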
Tips and notes
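- Watch the duplication total. Every overlapping token is embedded and stored at least twice, so a large overlap relative to chunk size inflates index size and embedding cost, often without a matching gain in retrieval quality.
- Keep overlap below the chunk size. If the overlap equals or exceeds the chunk size, the stride (chunkSize − overlap) is zero or negative and the window cannot advance; the tool clamps the value for you.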
- Boundaries are where context is lost. If a key sentence keeps landing on a cut, raise the overlap a little until the sentence appears whole in at least one chunk.
- Word tokens are an estimate. For precise budgeting, confirm counts with a real tokenizer for your model; this view is for shaping the strategy.
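To get a feel for how overlap drives the duplication total, the arithmetic can be sketched as below. The helper names are hypothetical, and the clamp to chunk_size - 1 is an assumption; the tool's exact clamping rule is not stated here. The chunk count also assumes the last chunk is simply allowed to be shorter.

```python
import math


def clamp_overlap(chunk_size, overlap):
    """Force 0 <= overlap < chunk_size (assumed clamping rule)."""
    return max(0, min(overlap, chunk_size - 1))


def duplication_summary(total_tokens, chunk_size, overlap):
    """Return (chunk_count, duplicated_tokens) for a sliding window."""
    overlap = clamp_overlap(chunk_size, overlap)
    step = chunk_size - overlap
    if total_tokens <= chunk_size:
        count = 1 if total_tokens > 0 else 0
    else:
        # One full first chunk, then one more chunk per stride needed
        # to cover the remaining tokens.
        count = 1 + math.ceil((total_tokens - chunk_size) / step)
    # Each chunk after the first repeats `overlap` tokens of its predecessor.
    duplicated = max(0, count - 1) * overlap
    return count, duplicated


for ov in (50, 100):
    count, dup = duplication_summary(total_tokens=1000, chunk_size=200, overlap=ov)
    print(f"overlap={ov}: {count} chunks, {dup} duplicated tokens, "
          f"{1000 + dup} tokens embedded in total")
```

For a 1000-token text with 200-token chunks, an overlap of 50 gives 7 chunks and 300 duplicated tokens, while an overlap of 100 gives 9 chunks and 800 duplicated tokens, so doubling the overlap far more than doubles the redundancy. These counts are in whitespace words; a model tokenizer will report different numbers.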