Which chunking strategy is best?

It depends on your documents. Fixed-size is predictable and simple, sentence-boundary preserves meaning within a chunk, and paragraph chunking keeps related ideas together but produces uneven sizes. Try each and compare retrieval quality.

What overlap fraction should I use?

A 10 to 20 percent overlap is a common starting point. Overlap helps the retriever catch passages that straddle a chunk boundary, but too much overlap duplicates tokens and wastes budget, so tune it against your data.

How are words converted to tokens?

The calculator uses the common heuristic of about 1.3 tokens per English word. Your real ratio depends on the tokenizer and language, so treat the token figures as planning estimates.

Does this send my documents anywhere?

No. Everything runs locally in your browser. You only enter sizes and settings, not the document text, and nothing is uploaded or stored.

What is the Chunking Strategy Calculator?

The Chunking Strategy Calculator compares fixed-size, sentence-boundary, and paragraph chunking for a given context window. It reports the chunk count, the tokens spent on overlap, and how many chunks fit per retrieval call in your RAG pipeline. It runs free in your browser on Gera Tools, with nothing uploaded.

Chunking Strategy Calculator

Name: Chunking Strategy Calculator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Choosing how to split documents is one of the most consequential decisions in a RAG pipeline. Chunk too large and each retrieved passage carries irrelevant filler; chunk too small and each chunk loses the surrounding context it needs to be useful. This calculator compares the three most common strategies (fixed-size, sentence-boundary, and paragraph) on the same document so you can see the trade-offs in concrete numbers.

How it works

The calculator first converts your document’s word count to tokens using a 1 word ≈ 1.3 tokens estimate. Each strategy assumes a typical chunk size: fixed-size chunks are the most uniform, sentence-boundary chunks are slightly smaller because they round to natural breaks, and paragraph chunks are larger and more variable.
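
The word-to-token step can be sketched in a few lines. This is only an illustration of the 1.3 tokens-per-word heuristic described above, not the calculator's actual source; the function name `estimate_tokens` is hypothetical.

```python
TOKENS_PER_WORD = 1.3  # heuristic from the text; real ratios vary by tokenizer and language


def estimate_tokens(word_count: int, tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Return a planning estimate of the token count for a given word count."""
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    # round() avoids float artifacts such as 13000.000000000002
    return round(word_count * tokens_per_word)


print(estimate_tokens(10_000))  # a 10,000-word document
```

For a precise figure, count tokens with the tokenizer of the model you actually use.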

It then applies your overlap fraction. Overlap repeats tokens from the end of one chunk at the start of the next, which improves recall at boundaries but duplicates tokens. The result is a chunk count, the tokens wasted to overlap, and how many of those chunks fit inside one retrieval call for your selected model’s context window.
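
A rough sketch of the overlap arithmetic follows. It assumes the overlap counts toward the chunk size, so each chunk advances by chunk size minus overlap; the calculator may use a different convention. The names `ChunkStats` and `chunk_stats` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChunkStats:
    chunk_count: int
    overlap_tokens_per_chunk: int
    wasted_tokens: int  # tokens embedded more than once because of overlap


def chunk_stats(total_tokens: int, chunk_size: int, overlap_fraction: float) -> ChunkStats:
    """Estimate chunk count and duplicated tokens for one document."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_fraction < 1:
        raise ValueError("overlap_fraction must be in [0, 1)")
    overlap = round(chunk_size * overlap_fraction)
    stride = chunk_size - overlap  # new tokens contributed by each chunk after the first
    if stride <= 0:
        raise ValueError("overlap leaves no room for new tokens")
    if total_tokens <= 0:
        return ChunkStats(0, overlap, 0)
    if total_tokens <= chunk_size:
        return ChunkStats(1, overlap, 0)
    # Integer ceiling division: chunks needed to cover the whole document
    chunk_count = -(-(total_tokens - overlap) // stride)
    return ChunkStats(chunk_count, overlap, (chunk_count - 1) * overlap)


stats = chunk_stats(total_tokens=13_000, chunk_size=500, overlap_fraction=0.1)
print(stats)
```

With these inputs each chunk repeats 50 tokens and advances by 450, so the document needs 29 chunks and 1,400 tokens are embedded twice.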

The three strategies compared

Fixed-size chunking

Split every N tokens, regardless of sentence or paragraph boundaries. The chunk count is predictable and indexing is fast because no natural-language parsing is needed. The downside is that a sentence can be cut mid-thought, so a query about an idea that straddles the cut may match neither chunk well.

Best for: large uniform corpora (logs, structured data dumps, legal clause libraries) where you control document structure and can ensure sentences are short.
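
As a minimal sketch of fixed-size splitting, the function below slices a token list every `chunk_size` items with an optional overlap. The function name is hypothetical, and the list of integers stands in for real tokenizer output.

```python
from typing import List, Sequence


def fixed_size_chunks(tokens: Sequence, chunk_size: int, overlap: int = 0) -> List[list]:
    """Split tokens into chunks of chunk_size, repeating `overlap` tokens between neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the document
    return chunks


print(fixed_size_chunks(list(range(10)), chunk_size=4, overlap=1))
```

The split ignores sentence boundaries entirely, which is what makes it fast and predictable.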

Sentence-boundary chunking

Split at sentence-ending punctuation and accumulate sentences until the target token budget is reached. This avoids splitting individual sentences, so each chunk is a coherent unit. The average chunk size is slightly smaller and less predictable than fixed-size because sentence lengths vary.

Best for: conversational text, FAQ documents, news articles, and any content where a single sentence is often the retrievable unit of meaning.
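
A simple sketch of sentence-boundary chunking is shown below. It assumes sentences end with `.`, `!`, or `?` followed by whitespace, and it estimates tokens with the 1.3 tokens-per-word heuristic; production splitters usually handle abbreviations and other edge cases. The function name is hypothetical.

```python
import re
from typing import List


def sentence_chunks(text: str, max_tokens: float, tokens_per_word: float = 1.3) -> List[str]:
    """Accumulate whole sentences into chunks that stay within max_tokens where possible."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0.0
    for sentence in sentences:
        sentence_tokens = len(sentence.split()) * tokens_per_word
        # Start a new chunk when adding this sentence would exceed the budget
        if current and current_tokens + sentence_tokens > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0.0
        current.append(sentence)
        current_tokens += sentence_tokens
    if current:
        chunks.append(" ".join(current))
    return chunks


print(sentence_chunks("One two three. Four five six. Seven eight nine.", max_tokens=8))
```

A single sentence longer than the budget still becomes its own oversized chunk here, so a real splitter would need a fallback for that case.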

Paragraph chunking

Split at paragraph boundaries (double newlines or heading transitions) and keep each paragraph as one or more chunks. Paragraphs preserve a complete thought or topic. The chunk size is highly variable — a one-sentence paragraph and a ten-sentence one are both “one paragraph” — so you need a maximum-size cap to prevent very long paragraphs from exceeding the model’s context limit.

Best for: structured documents like reports, documentation, and academic papers where each paragraph has a clear, coherent topic.
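
The sketch below illustrates paragraph chunking with a maximum-size cap. It assumes paragraphs are separated by blank lines, converts the token cap to a word cap with the 1.3 tokens-per-word heuristic, and falls back to a fixed-size split for paragraphs over the cap. The function name is hypothetical.

```python
import re
from typing import List


def paragraph_chunks(text: str, max_tokens: float, tokens_per_word: float = 1.3) -> List[str]:
    """Keep each paragraph as one chunk, splitting only paragraphs that exceed the cap."""
    max_words = max(1, int(max_tokens / tokens_per_word))
    chunks: List[str] = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        words = paragraph.split()
        if not words:
            continue
        # Short paragraphs yield one chunk; long ones are cut every max_words words
        for start in range(0, len(words), max_words):
            chunks.append(" ".join(words[start:start + max_words]))
    return chunks


print(paragraph_chunks("a b c\n\nd e f g h i j", max_tokens=7))
```

Note that rejoining words with single spaces normalises whitespace inside each paragraph.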

Overlap: how much is enough?

Overlap is expressed as a fraction of the chunk size. An overlap of 0.1 (10%) means the last 10% of each chunk’s tokens are repeated at the start of the next chunk. Common guidance:

5–10%: Minimal overlap, suitable when documents are well-structured and sentences rarely straddle chunk boundaries.
10–20%: Standard starting point for most RAG pipelines. Provides meaningful boundary coverage without excessive duplication.
20–30%+: Use only when your documents have very long sentences or dense cross-reference patterns. The storage and embedding cost grows quickly.

At scale, 20% overlap on a 1 million token corpus means you embed and store roughly 200,000 to 250,000 additional tokens, depending on whether the overlap counts toward the chunk size. Either way it is a meaningful increase in vector store size and embedding API cost.
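
The extra tokens at corpus scale depend on how overlap is counted. The sketch below shows both conventions as an approximation that ignores partial final chunks; the function name and the `counts_toward_chunk` flag are hypothetical.

```python
def overlap_overhead(corpus_tokens: int, overlap_fraction: float,
                     counts_toward_chunk: bool = True) -> int:
    """Approximate the extra tokens embedded and stored because of overlap."""
    if not 0 <= overlap_fraction < 1:
        raise ValueError("overlap_fraction must be in [0, 1)")
    if counts_toward_chunk:
        # Each chunk adds only (1 - f) of its size in new tokens,
        # so total stored tokens are corpus / (1 - f).
        return round(corpus_tokens * overlap_fraction / (1 - overlap_fraction))
    # Overlap is prepended on top of a full chunk of new tokens.
    return round(corpus_tokens * overlap_fraction)


print(overlap_overhead(1_000_000, 0.2))                             # overlap inside the chunk
print(overlap_overhead(1_000_000, 0.2, counts_toward_chunk=False))  # overlap added on top
```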

How many chunks fit in a retrieval call?

Each RAG query typically retrieves a fixed number of chunks (e.g. the top-5 or top-10 most semantically similar chunks) and injects them into the model’s context window. The calculator shows how many of your chunks fill the selected model’s context, which helps you plan:

If each chunk is 300 tokens and the context is 128,000 tokens, you can inject up to about 426 chunks, far more than you would ever want.
If each chunk is 3,000 tokens and the context is 8,000 tokens, you can only inject 2 chunks, too few to provide diverse coverage, and that is before reserving room for the prompt and the answer.

A good target is chunks sized so you can comfortably fit 5–20 high-quality retrieved passages without hitting the context limit.
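
The fit calculation above is integer division of the context window by the chunk size. A minimal sketch follows; the `reserved_tokens` parameter (room for the system prompt, question, and answer) is an assumption added for illustration and may not exist in the calculator.

```python
def chunks_per_call(context_window: int, chunk_size: int, reserved_tokens: int = 0) -> int:
    """Return how many whole chunks fit in the context after reserving some tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    available = context_window - reserved_tokens
    return max(0, available // chunk_size)


print(chunks_per_call(128_000, 300))
print(chunks_per_call(8_000, 3_000))
print(chunks_per_call(8_000, 3_000, reserved_tokens=2_500))
```

Reserving even a modest budget for the prompt and answer can drop the second case from two chunks to one.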

Tips for calibrating against your data

Match chunk size to your queries. Fact-lookup questions favour small, precise chunks; summarisation and reasoning favour larger chunks that keep context intact.
Run retrieval quality checks before tuning. The numbers here are planning estimates. The real test is whether your retriever surfaces the right passages for representative queries.
Sentence and paragraph chunking produce uneven sizes. The figures here are typical averages; in practice some chunks will be much larger than others, so cap maximum chunk size in your splitter.