How does overlap affect the number of chunks?

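Overlap means consecutive chunks share tokens, so each new chunk advances by chunk size minus overlap (the stride). A smaller stride (more overlap) produces more chunks and more total embedded tokens. That raises embedding and storage cost. It can also improve recall, because a sentence that straddles a boundary stays intact in at least one chunk.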
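A minimal Python sketch makes the stride concrete. It assumes the document is already tokenized into a list (plain integers stand in for token IDs here). `sliding_window_chunks` is a hypothetical helper, not any library's API. Starting a chunk at every multiple of the stride yields ⌈length / stride⌉ chunks. Some splitters instead stop once a window reaches the end of the text, which can produce one fewer chunk.

```python
def sliding_window_chunks(tokens, chunk_size, overlap):
    """Split a token list into fixed-size, overlapping windows (hypothetical helper).

    Each window starts `stride = chunk_size - overlap` tokens after the previous
    one, so the number of windows is ceil(len(tokens) / stride).
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - overlap
    # The final window may be shorter than chunk_size if it runs past the end.
    return [tokens[start:start + chunk_size] for start in range(0, len(tokens), stride)]


# Hypothetical input: integers stand in for token IDs.
tokens = list(range(1000))
for overlap in (0, 50, 100):
    chunks = sliding_window_chunks(tokens, chunk_size=200, overlap=overlap)
    embedded = sum(len(c) for c in chunks)
    print(f"overlap={overlap:3d} stride={200 - overlap} chunks={len(chunks)} embedded={embedded}")
```

With 1,000 tokens and 200-token chunks, raising overlap from 0 to 100 shrinks the stride from 200 to 100 and doubles the chunk count from 5 to 10.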

What chunk size should I use?

There is no universal answer, but 256 to 512 tokens with 10 to 20 percent overlap is a common starting point for prose. Smaller chunks improve precision but lose context; larger chunks preserve context but dilute relevance. Tune against your own evaluation set.

Why does the context window matter here?

At query time you retrieve the top-k chunks and stuff them into the model's prompt alongside the question. The calculator shows how many chunks of your chosen size fit in the context window so you can pick a sensible top-k without overflowing.
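The context-window division can be sketched as follows. `max_top_k` and its `reserved_tokens` parameter are hypothetical. The reserve is an assumed budget for the system prompt, the question, and, for models whose window also holds the output, the answer. A plain division of window by chunk size ignores all of these.

```python
def max_top_k(context_window, chunk_size, reserved_tokens=0):
    """Largest top-k whose chunks fit in the context window (hypothetical helper).

    reserved_tokens: assumed budget for everything else in the prompt
    (system prompt, question, and possibly the generated answer).
    """
    available = context_window - reserved_tokens
    if available <= 0:
        return 0
    # Floor division: a partial chunk does not count.
    return available // chunk_size


print(max_top_k(4096, 512))                        # 8: chunks only
print(max_top_k(4096, 512, reserved_tokens=1024))  # 6: leaves room for question and answer
```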

Is anything sent to a server?

No. All arithmetic runs locally in your browser. No documents or numbers are uploaded or stored.

What is the RAG Chunk Size & Overlap Calculator?

Enter your document length, chosen chunk size, and overlap percentage to calculate how many chunks you will produce, the overlap in tokens, and how many chunks fit in your model's context window during retrieval. It runs free in your browser on Gera Tools, with nothing uploaded.

RAG Chunk Size & Overlap Calculator

Name: RAG Chunk Size & Overlap Calculator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

RAG chunk size and overlap calculator

Chunking is the quiet decision that makes or breaks a retrieval-augmented generation pipeline. Chunk too large and retrieval returns diluted, low-relevance passages; chunk too small and you lose the surrounding context the model needs. Overlap helps preserve context across boundaries but multiplies your token count and cost. This calculator turns those tradeoffs into concrete numbers for your specific document length.

How it works

Given a document length in tokens, a chunk size, and an overlap percentage, the tool computes the stride (chunk size minus overlap tokens), which is how far each chunk advances. The chunk count is the document length divided by the stride, rounded up. From this it derives three values:

- The overlap in tokens.
- The total number of tokens you will embed (chunk count × chunk size). This exceeds the document length because of overlap. It is also a slight upper bound, since the final chunk is usually shorter than a full chunk.
- The overlap ratio.

It also divides your retrieval context window by the chunk size, rounding down, to show how many chunks you can pack into a single prompt at query time.
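The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's source. Overlap tokens are rounded to the nearest integer, which matches the 77 in the worked example below. `plan_chunks`, `ChunkPlan`, and the `embed_multiplier` field (total embedded tokens divided by document length) are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class ChunkPlan:
    overlap_tokens: int
    stride: int
    chunk_count: int
    total_embedded_tokens: int
    embed_multiplier: float  # hypothetical metric: total embedded tokens / document length
    chunks_per_window: int


def plan_chunks(doc_tokens, chunk_size, overlap_pct, context_window):
    """Reproduce the calculator's arithmetic as described (a sketch, not the tool's code)."""
    # Assumption: nearest-integer rounding (Python rounds exact halves to even).
    overlap_tokens = round(chunk_size * overlap_pct / 100)
    stride = chunk_size - overlap_tokens
    if stride <= 0:
        raise ValueError("overlap must be smaller than the chunk size")
    chunk_count = math.ceil(doc_tokens / stride)
    total = chunk_count * chunk_size  # upper bound: treats every chunk as full
    return ChunkPlan(
        overlap_tokens=overlap_tokens,
        stride=stride,
        chunk_count=chunk_count,
        total_embedded_tokens=total,
        embed_multiplier=total / doc_tokens,
        chunks_per_window=context_window // chunk_size,
    )


print(plan_chunks(10_000, 512, 15, 4096))
```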

Worked example

Suppose you have a 10,000-token document, a chunk size of 512, and 15% overlap:

Overlap tokens per chunk: 0.15 × 512 = 76.8, rounded to 77 tokens
Stride: 512 − 77 = 435 tokens
Chunk count: ⌈10,000 / 435⌉ = 23 chunks
Total embedded tokens: 23 × 512 = 11,776 (18% more than the raw document)

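If your retrieval context window is 4,096 tokens, you can fit ⌊4,096 / 512⌋ = 8 chunks per prompt. That is a sensible top-k ceiling, and in practice slightly less, since the question and instructions also need room. Setting top-k to 12 would need 12 × 512 = 6,144 tokens and overflow the window. Depending on the stack, that either silently truncates context or causes an error.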
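The 11,776 figure counts all 23 chunks as full 512-token windows. Assume a splitter that starts a chunk at every multiple of the 435-token stride. The last chunk then begins at token 9,570 and holds only 430 tokens, so the true total is a little lower. The helper `exact_embedded_tokens` is hypothetical and simply sums the real chunk lengths.

```python
def exact_embedded_tokens(doc_tokens, chunk_size, stride):
    """Sum actual chunk lengths, letting the final chunk be cut off by the document end."""
    return sum(min(chunk_size, doc_tokens - start) for start in range(0, doc_tokens, stride))


upper_bound = 23 * 512  # the worked example's simple product
exact = exact_embedded_tokens(10_000, 512, 435)
print(upper_bound, exact)
```

It returns 11,694 tokens, about 17% more than the raw document rather than 18%. The difference is small, but the simple product is best treated as an upper bound.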

What affects the results

Chunk size is the dominant lever. Larger chunks preserve more paragraph-level context and make retrieval robust to broadly worded queries. Smaller chunks improve precision when users ask about specific facts buried in long documents. The sweet spot depends heavily on how your source material is written: dense technical specs often benefit from smaller chunks (128–256 tokens), while narrative prose or legal documents often work better at 512–1024.

Overlap percentage is a cost knob. Higher overlap smooths boundary effects: a sentence near the end of one chunk also appears at the start of the next, as long as it falls within the overlapping tokens. The cost is your total embedded token count, which grows by a factor of roughly 1 / (1 − overlap). That grows faster than the overlap itself. At 25% overlap you embed roughly one-third more tokens than the raw document, which matters at scale.
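The multiplier can be checked with a short sketch. It assumes a long document, so the rounding of overlap tokens and the shorter final chunk can be ignored. `embed_multiplier` is a hypothetical helper name.

```python
def embed_multiplier(overlap_fraction):
    """Approximate ratio of total embedded tokens to raw document tokens.

    Each chunk adds only stride = chunk_size * (1 - overlap) new tokens, so covering
    N tokens takes about N / stride chunks of chunk_size tokens each:
    total ~ N / (1 - overlap).
    """
    if not 0 <= overlap_fraction < 1:
        raise ValueError("overlap_fraction must be in [0, 1)")
    return 1 / (1 - overlap_fraction)


for pct in (0, 10, 15, 20, 25, 50):
    print(f"{pct:2d}% overlap -> {embed_multiplier(pct / 100):.3f}x tokens")
```

The 25% row gives about 1.333, the one-third increase above. The 20% row gives 1.25, matching the 25% figure in the tips below.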

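Document length scales the chunk count and total embedded tokens roughly linearly. The stride, overlap tokens, and chunks per prompt do not depend on it. For a 100,000-token corpus at the same settings, the chunk count and embedded tokens grow about tenfold, to 230 chunks and 117,760 tokens.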
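A quick sketch of the scaling reuses the worked example's 435-token stride, 512-token chunks, and 4,096-token window. Because of the ceiling, counts scale only approximately linearly and can shift by one. The chunks-per-prompt figure does not change at all. `chunk_count` is a hypothetical helper.

```python
import math


def chunk_count(doc_tokens, stride):
    """Chunks needed when a new chunk starts every `stride` tokens."""
    return math.ceil(doc_tokens / stride)


stride, chunk_size, window = 435, 512, 4096
for doc_tokens in (10_000, 100_000):
    n = chunk_count(doc_tokens, stride)
    # document length, chunk count, embedded tokens (upper bound), chunks per prompt
    print(doc_tokens, n, n * chunk_size, window // chunk_size)
```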

Tips and notes

Mind the embed-cost multiplier. Total embedded tokens grow as overlap rises — 20% overlap embeds roughly 25% more tokens than the raw document.
Match top-k to the context window. If only eight chunks fit in your retrieval window, retrieving twelve wastes tokens or truncates context.
Start at 512 / 15%. A reasonable default for prose; shrink chunks for fact-dense data like tables or specs, grow them for narrative text.
Evaluate, do not guess. Use these numbers to set up experiments, then measure retrieval quality on a labelled set.
Recalculate when you change models. Context window size varies widely — an 8k window and a 128k window support very different top-k values at any given chunk size.
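These tips can be turned into an experiment plan with a hypothetical `sweep` helper. It enumerates candidate configurations with their chunk counts, token cost, and top-k ceiling. It reuses the calculator arithmetic described above, with nearest-integer overlap rounding as an assumption. The 8,192-token window is just an example value. The helper only sizes the experiments; retrieval quality still has to be measured on your labelled set.

```python
import itertools
import math


def sweep(doc_tokens, chunk_sizes, overlap_pcts, context_window):
    """Enumerate candidate chunking configurations with cost and top-k ceiling.

    The rows are a starting grid for retrieval experiments; they say nothing
    about retrieval quality, which must be measured separately.
    """
    rows = []
    for size, pct in itertools.product(chunk_sizes, overlap_pcts):
        overlap = round(size * pct / 100)  # assumption: nearest-integer rounding
        stride = size - overlap
        count = math.ceil(doc_tokens / stride)
        rows.append({
            "chunk_size": size,
            "overlap_pct": pct,
            "chunks": count,
            "embedded_tokens": count * size,  # upper bound
            "max_top_k": context_window // size,
        })
    return rows


for row in sweep(10_000, [256, 512, 1024], [10, 15, 20], context_window=8192):
    print(row)
```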