What is map-reduce summarization?

The document is split into chunks that each fit the context window. The map stage summarizes every chunk, then the reduce stage combines those summaries, recursively if needed, until the remaining summaries fit in one call that writes the final summary. It is the standard pattern for documents larger than the context window.

Why are there multiple merge passes?

If you have many chunk summaries, they may themselves exceed the context window, so they are merged in groups, and those merged summaries are merged again. The planner computes how many recursive reduce passes your chunk count requires.

How can I make this cheaper?

Use a cheaper model for the map stage and reserve a stronger model only for the final reduce, increase chunk size to reduce the number of map calls, or pre-filter irrelevant sections so you summarize less text overall.
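To see what a larger chunk size changes, here is a minimal sketch. The 60,000-token document and 500-token summary length are illustrative figures, and `map_stage` is a hypothetical helper, not part of the planner.

```python
import math


def map_stage(doc_tokens, chunk_tokens, summary_tokens):
    """Return (map calls, summary tokens written) for one map stage."""
    calls = math.ceil(doc_tokens / chunk_tokens)
    return calls, calls * summary_tokens


# Illustrative figures: a 60,000-token document with 500-token chunk summaries.
for chunk_tokens in (3_000, 6_000, 12_000):
    calls, summary_out = map_stage(60_000, chunk_tokens, 500)
    print(f"chunk={chunk_tokens:>6}  map calls={calls:>2}  summary tokens={summary_out}")
```

The map stage reads the same 60,000 input tokens whatever the chunk size. The saving comes from writing fewer summaries, which also leaves fewer summaries for the reduce stage to merge.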

Does this call any API?

No. The planner computes everything locally in your browser from the sizes you enter. Nothing is uploaded, stored, or logged.

What is the Summarization Pipeline Cost Planner?

The Summarization Pipeline Cost Planner plans a map-reduce summarization run for documents that exceed the context window. It shows how many chunks and merge passes are needed, the tokens processed at each stage, and the total cost to summarize the whole document. It runs free in your browser on Gera Tools, with nothing uploaded.

Summarization Pipeline Cost Planner

Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Summarization pipeline cost planner

When a document is larger than the model’s context window, you cannot summarize it in one call — you run a map-reduce pipeline: summarize each chunk, then combine the summaries, recursively, until one final summary remains. The cost is easy to underestimate because the merge passes add up. This planner computes the chunk count, the recursive merge passes, and the total cost for the whole run.

How it works

Your document token count divided by the chunk size, rounded up, gives the number of map chunks. Each map call reads a chunk (input) and writes a per-chunk summary (output) at the target summary length. The chunk summaries are then combined in a reduce stage: if they do not all fit in one call, they are merged in groups and re-merged recursively, and the planner counts those merge passes. Everything is priced at the selected model’s input and output rates to produce a total pipeline cost.
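The steps above can be sketched end to end as follows. This is not the planner's actual code. It assumes a merge call holds as many summaries as fit in a `reduce_input_budget`, and that intermediate summaries are as long as chunk summaries. All names here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Plan:
    map_calls: int
    merge_passes: int
    reduce_calls: int
    input_tokens: int
    output_tokens: int
    cost_usd: float


def plan_pipeline(doc_tokens, chunk_tokens, summary_tokens, final_tokens,
                  reduce_input_budget, input_price, output_price):
    """Estimate one map-reduce run. Prices are USD per million tokens."""
    # Map stage: one call per chunk, each reading its chunk and writing a summary.
    map_calls = math.ceil(doc_tokens / chunk_tokens)
    input_tokens = doc_tokens
    output_tokens = map_calls * summary_tokens

    # Assumption: a merge call holds as many summaries as fit in its input budget.
    fan_in = reduce_input_budget // summary_tokens
    if fan_in < 2:
        raise ValueError("reduce_input_budget must hold at least two summaries")

    pending, merge_passes, reduce_calls = map_calls, 0, 0
    while True:
        calls = math.ceil(pending / fan_in)
        merge_passes += 1
        reduce_calls += calls
        input_tokens += pending * summary_tokens  # each pending summary is read once
        if calls == 1:
            output_tokens += final_tokens  # the last call writes the final summary
            break
        # Assumption: intermediate summaries are as long as chunk summaries.
        output_tokens += calls * summary_tokens
        pending = calls

    cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return Plan(map_calls, merge_passes, reduce_calls, input_tokens, output_tokens, cost)


# Example inputs: 60,000-token document, 6,000-token chunks, 500-token summaries,
# 1,000-token final summary, $2 input / $8 output per million tokens.
plan = plan_pipeline(60_000, 6_000, 500, 1_000, 6_000, 2.0, 8.0)
print(plan)
```

Real summaries rarely hit their target length exactly, so a sketch like this gives an estimate to budget against, not a bill.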

Why the merge passes are easy to underestimate

A 100,000-token document split into 4,000-token chunks produces 25 map calls. So far so good. But those 25 summaries, say 300 tokens each and 7,500 tokens in total, may not all fit into one reduce call either. If each reduce call can take about 4,000 tokens of input, you need two merge calls, each combining 12–13 summaries of 300 tokens (3,600–3,900 tokens of input) and producing one shorter intermediate summary. A final call then merges those two intermediate summaries into the final result.

That is 25 map calls + 2 first-pass merges + 1 final merge = 28 API calls total, not the 26 a naive estimate might give. For larger documents such as legal contracts, technical reports, and long transcripts, the recursive merge tree can add 3–5 additional passes, depending on how many summaries fit in each merge call, and these are easy to miss in a manual estimate.

The planner computes the exact tree depth from your chunk size and summary length targets, which is the specific advantage over rough mental arithmetic.
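The call count for the 100,000-token example can be reproduced with a short sketch. `merge_tree` is a hypothetical helper that assumes each merge call takes as many 300-token summaries as fit in a 4,000-token input budget. The planner's own grouping rule may differ.

```python
import math


def merge_tree(num_summaries, summary_tokens, reduce_input_budget):
    """Return the number of merge calls in each pass, first pass first."""
    fan_in = reduce_input_budget // summary_tokens  # summaries per merge call
    if fan_in < 2:
        raise ValueError("reduce_input_budget must hold at least two summaries")
    calls_per_pass = []
    pending = num_summaries
    while pending > 1:
        pending = math.ceil(pending / fan_in)
        calls_per_pass.append(pending)
    return calls_per_pass


map_calls = math.ceil(100_000 / 4_000)      # 25 chunks
passes = merge_tree(map_calls, 300, 4_000)  # merge calls per pass
total_calls = map_calls + sum(passes)
print(passes, total_calls)  # [2, 1] 28
```

The tree depth is the length of the returned list. It grows roughly with the logarithm of the chunk count, so doubling the document adds few passes, while shrinking the reduce budget or lengthening the summaries adds them quickly.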

Worked example

Suppose a 60,000-token document (roughly a 50-page report), a chunk size of 6,000 tokens, a per-chunk summary of 500 tokens, and a final summary target of 1,000 tokens:

Map stage: 10 chunks × (6,000 input + 500 output) = 65,000 tokens processed
Reduce stage: 10 summaries × 500 tokens = 5,000 tokens, fitting in one call (5,000 in + 1,000 out = 6,000 tokens processed)
Total: 71,000 tokens processed (65,000 input and 6,000 output), with input tokens billed at the input rate and output tokens at the output rate

At a model priced at $2 input / $8 output per million tokens, that is $0.12 for the map stage inputs, $0.04 for the map outputs, and about $0.02 for the reduce call: roughly $0.18, so under $0.20 for the whole pipeline. Use the planner with your actual model and chunk parameters to see where the cost falls for your specific document.
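The per-stage arithmetic for this example can be checked with a few lines. The variable names are illustrative only. The token counts and rates are the ones stated in the example.

```python
INPUT_PRICE = 2.00   # USD per million input tokens (example rate)
OUTPUT_PRICE = 8.00  # USD per million output tokens (example rate)


def usd(tokens, price_per_million):
    """Convert a token count to dollars at a per-million-token price."""
    return tokens * price_per_million / 1_000_000


chunks = 60_000 // 6_000                       # 10 map calls
map_in = usd(chunks * 6_000, INPUT_PRICE)      # 60,000 input tokens
map_out = usd(chunks * 500, OUTPUT_PRICE)      # 5,000 output tokens
reduce_cost = usd(chunks * 500, INPUT_PRICE) + usd(1_000, OUTPUT_PRICE)
total = map_in + map_out + reduce_cost

print(f"map in ${map_in:.3f}, map out ${map_out:.3f}, "
      f"reduce ${reduce_cost:.3f}, total ${total:.3f}")
```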

Tips and notes

Bigger chunks, fewer calls. Larger chunks cut the number of map calls and merge passes, but leave room for the summary output within the window.
Split the model choice. Run the map stage on a cheap model and the final reduce on a stronger one — most of the tokens are in the map stage.
Pre-filter first. Dropping boilerplate or irrelevant sections before summarizing is the cheapest token you will ever save.
Estimates only. Real summaries vary in length; use realistic targets and confirm pricing before budgeting a large run.
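As a rough illustration of the split-model tip, the sketch below prices a 60,000-token document (10 chunks of 6,000 tokens, 500-token summaries, 1,000-token final summary) two ways. Both price points are hypothetical placeholders, not real model rates. Substitute your provider's current pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens

    def cost(self, input_tokens, output_tokens):
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


# Hypothetical price points, for illustration only.
strong = Pricing(input_per_m=2.00, output_per_m=8.00)
cheap = Pricing(input_per_m=0.50, output_per_m=2.00)

# Token counts per stage: (input tokens, output tokens).
map_tokens = (60_000, 5_000)
reduce_tokens = (5_000, 1_000)

single_model = strong.cost(*map_tokens) + strong.cost(*reduce_tokens)
split_models = cheap.cost(*map_tokens) + strong.cost(*reduce_tokens)

print(f"strong model throughout: ${single_model:.3f}")
print(f"cheap map, strong reduce: ${split_models:.3f}")
```

Because the map stage carries most of the tokens, moving only that stage to the cheaper rate removes most of the cost in this sketch. Whether the cheaper model's chunk summaries are good enough is a quality question that the arithmetic does not answer.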