Text Chunking Previewer

Visualize how your text splits into RAG chunks before indexing.


Preview your RAG chunks before you index

Before you embed a document into a vector database, it pays to see how it will be cut up. This previewer takes any text, applies your chunk size, overlap and split strategy, and renders a color-coded view of every chunk — including the overlap region shared with the next chunk — so you can tune retrieval quality without spending a single embedding token.

How chunking works

Retrieval-augmented generation (RAG) splits long documents into smaller chunks, embeds each chunk into a vector, and retrieves the closest chunks at query time. Two parameters dominate quality:

  • Chunk size — the target length of each chunk. Smaller chunks retrieve precisely but lose context; larger chunks keep context but can return irrelevant text alongside the answer.
  • Overlap — how much text is repeated between consecutive chunks. Overlap prevents a fact that lands on a boundary from being split across two chunks and lost from both.
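To make the two parameters concrete, here is a minimal sketch of fixed-character chunking with overlap. The function name `chunk_fixed` and its defaults are illustrative assumptions, not the previewer's actual code; both values are measured in characters.

```python
def chunk_fixed(text: str, size: int = 800, overlap: int = 120) -> list[str]:
    """Split text into chunks of up to `size` characters.

    Each chunk repeats the last `overlap` characters of the previous one.
    Hypothetical helper for illustration only.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_fixed("abcdefghij", size=4, overlap=1):
        print(chunk)
```

With a size of 4 and an overlap of 1, the ten-letter sample yields three chunks, and the last character of each chunk is the first character of the next.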

The strategy decides where the cuts land. Fixed-character splitting is simplest but can sever sentences. Sentence and paragraph strategies pack text up to the target size while respecting natural boundaries, which usually produces cleaner embeddings.
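The sentence strategy can be sketched as a greedy packer: split the text into sentences, then keep adding sentences to the current chunk until the next one would exceed the target size. This is an assumed, simplified version; the name `chunk_sentences` and the regex-based sentence splitter are illustrative, and a real splitter would need to handle abbreviations, decimals and similar edge cases.

```python
import re


def chunk_sentences(text: str, size: int = 800) -> list[str]:
    """Pack whole sentences into chunks of at most `size` characters.

    A single sentence longer than `size` is kept whole rather than cut.
    Hypothetical helper for illustration only.
    """
    # Naive split: break on whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= size or not current:
            current = candidate  # still fits, or chunk is empty
        else:
            chunks.append(current)  # close the chunk at a sentence boundary
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "One two. Three four. Five six."
    for chunk in chunk_sentences(sample, size=20):
        print(repr(chunk))
```

Unlike fixed-character splitting, no chunk here ends mid-sentence, at the cost of chunk lengths that vary below the target.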

Tips for better retrieval

  • Start at ~800 characters with 15% overlap (about 120 characters), then adjust based on whether the retrieved chunks actually contain the answers you expect.
  • Prefer sentence or paragraph splitting for prose; fixed size is fine for logs or structured text.
  • Watch the chunk count: more chunks mean more embedding cost and a larger index, so balance granularity against budget.
  • Inspect the overlap highlights to confirm key facts survive the cut.
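
Two of these tips can be checked with a little arithmetic. The sketch below assumes fixed-character splitting with sizes in characters; `overlap_chars`, `estimate_chunk_count` and `fact_survives` are hypothetical helper names, not part of the previewer.

```python
import math


def overlap_chars(size: int, percent: int) -> int:
    """Convert an overlap percentage into characters (integer math)."""
    return size * percent // 100


def estimate_chunk_count(text_len: int, size: int, overlap: int) -> int:
    """Estimate chunk count for fixed-size splitting with overlap.

    Each chunk after the first adds (size - overlap) new characters.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if text_len <= 0:
        return 0
    if text_len <= size:
        return 1
    return math.ceil((text_len - overlap) / (size - overlap))


def fact_survives(chunks: list[str], fact: str) -> bool:
    """Return True if `fact` appears intact inside at least one chunk."""
    return any(fact in chunk for chunk in chunks)


if __name__ == "__main__":
    overlap = overlap_chars(800, 15)
    print(overlap)                                    # 120
    print(estimate_chunk_count(10_000, 800, overlap)) # 15
    print(fact_survives(["abcd", "defg"], "cde"))     # False: cut at the boundary
```

Under these assumptions, a 10,000-character document at 800 characters with 15% overlap produces 15 chunks; raising the overlap or lowering the size increases that count and, with it, embedding cost.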