What is the difference between single-pass and map-reduce?

Single-pass sends the whole document in one prompt and gets one summary — simplest and cheapest per document, but limited by the model's context window. Map-reduce splits the document into chunks, summarizes each, then summarizes the summaries, which handles huge documents at the cost of extra tokens and calls.

When should I use map-reduce?

Use map-reduce when documents exceed the model's context window or when single-pass quality degrades on very long inputs. For short documents that fit comfortably in context, single-pass is cheaper and simpler.
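The context-window half of that rule can be sketched as a simple check. This is an illustration only: the function name, the 128,000-token window, and the 0.8 headroom factor (one way to read "fit comfortably") are assumptions, not values from the calculator. Quality degradation on long inputs is not modeled and still has to be judged by inspecting outputs.

```python
def choose_strategy(doc_tokens: int, summary_tokens: int,
                    context_window: int, headroom: float = 0.8) -> str:
    """Pick a strategy from token counts alone (hypothetical helper)."""
    # Reserve part of the window for instructions and safety margin (assumed 20%).
    budget = context_window * headroom
    if doc_tokens + summary_tokens <= budget:
        return "single-pass"
    return "map-reduce"


# Example with an assumed 128,000-token context window.
print(choose_strategy(1_300, 130, 128_000))    # single-pass
print(choose_strategy(200_000, 500, 128_000))  # map-reduce
```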

How are words converted to tokens?

As a rule of thumb, one English word is about 1.3 tokens. The calculator uses this factor for both document input and summary output, so a 1,000-word document is roughly 1,300 input tokens.
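As a minimal sketch of that conversion (the function and constant names are illustrative, not the calculator's own code):

```python
TOKENS_PER_WORD = 1.3  # rule-of-thumb factor for English text


def words_to_tokens(words: int, tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Estimate a token count from a word count, rounded to the nearest token."""
    return round(words * tokens_per_word)


print(words_to_tokens(1000))  # 1300
```

Real token counts vary with the tokenizer and the text, so treat the result as a budgeting estimate rather than an exact figure.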

Can I cut the cost with batch APIs?

Yes. Most providers offer a batch or asynchronous tier at roughly 50% off for non-urgent jobs like overnight bulk summarization. Apply that discount to the totals here if your job is not latency-sensitive.
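Applying the discount is a single multiplication. A small sketch, assuming a flat 50% rate (check your provider's current batch pricing; the helper name is hypothetical):

```python
def apply_batch_discount(total_cost: float, discount: float = 0.5) -> float:
    """Return the total after a flat batch-tier discount (0.5 means 50% off)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return total_cost * (1.0 - discount)


print(apply_batch_discount(100.0))  # 50.0
```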

Document Summary Cost Calculator

Budget a bulk summarization job before you run it

Summarizing one document is cheap. Summarizing 10,000 of them is a line item. This calculator turns a document count, average length, and summary length into total token volume, then prices it for single-pass and map-reduce strategies so you can pick the cheaper approach that still fits your model’s context window.

How the two strategies are priced

Single-pass sends each whole document as input and returns one summary:

per_doc = (doc_tokens/1e6 × in_price) + (summary_tokens/1e6 × out_price)
total   = per_doc × document_count
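The formula above translates directly into code. In this sketch, prices are in dollars per million tokens, and the example prices (0.15 in, 0.60 out) are placeholder values chosen for illustration, not any provider's actual rates.

```python
def single_pass_cost(doc_tokens: float, summary_tokens: float,
                     in_price: float, out_price: float,
                     document_count: int) -> float:
    """Total cost of summarizing each document in one call.

    in_price and out_price are dollars per million tokens.
    """
    per_doc = (doc_tokens / 1e6 * in_price) + (summary_tokens / 1e6 * out_price)
    return per_doc * document_count


# 10,000 documents of about 1,300 tokens each, with 130-token summaries,
# at placeholder prices of $0.15 (input) and $0.60 (output) per million tokens.
total = single_pass_cost(1_300, 130, in_price=0.15, out_price=0.60,
                         document_count=10_000)
print(f"${total:.2f}")  # $2.73
```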

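Map-reduce splits each document into chunks, summarizes every chunk (the “map”), then summarizes those chunk-summaries into a final summary (the “reduce”). The map step reads the full document once as input, just as single-pass does. The extra cost comes from the chunk summaries, which are billed once as map output and again as reduce input, plus the final summary as reduce output. Map-reduce therefore costs more tokens and more calls, but it works on documents far larger than the context window.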
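One way to estimate the map-reduce cost is sketched below. The calculator's exact formula is not spelled out here, so this is an assumed model: the chunk size and per-chunk summary length are hypothetical parameters, a single reduce call is assumed, and prompt overhead per call is ignored.

```python
import math


def map_reduce_cost(doc_tokens: int, summary_tokens: int,
                    chunk_tokens: int, chunk_summary_tokens: int,
                    in_price: float, out_price: float,
                    document_count: int) -> float:
    """Estimated total cost for map-reduce (prices in dollars per million tokens)."""
    n_chunks = math.ceil(doc_tokens / chunk_tokens)

    # Map: the whole document goes in once; each chunk yields a short summary.
    map_in = doc_tokens
    map_out = n_chunks * chunk_summary_tokens

    # Reduce: the chunk summaries go in; the final summary comes out.
    reduce_in = map_out
    reduce_out = summary_tokens

    per_doc = ((map_in + reduce_in) / 1e6 * in_price
               + (map_out + reduce_out) / 1e6 * out_price)
    return per_doc * document_count


# 100 documents of 13,000 tokens, 4,000-token chunks, 200-token chunk summaries,
# 260-token final summaries, at placeholder prices of $1 in and $2 out.
print(map_reduce_cost(13_000, 260, 4_000, 200, 1.0, 2.0, 100))
```

Under this model the overhead relative to single-pass is the chunk summaries counted twice, so shorter chunk summaries or larger chunks narrow the gap.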

Tips to cut the bill

For non-urgent jobs, use a provider batch tier (around 50% off). Prefer a cheap model (GPT-4o mini, Claude Haiku, Gemini Flash) for the map step and a stronger model only for the final reduce. And keep summaries short: output tokens are typically priced several times higher per token than input tokens, so summary length has an outsized effect on the bill.
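The mixed-model tip can be priced by giving the map and reduce steps separate rates. The sketch below is an assumed model, not the calculator's implementation: it uses a single reduce call, ignores prompt overhead, and all parameter names and example prices are placeholders.

```python
import math


def mixed_model_cost(doc_tokens: int, summary_tokens: int,
                     chunk_tokens: int, chunk_summary_tokens: int,
                     map_in_price: float, map_out_price: float,
                     reduce_in_price: float, reduce_out_price: float,
                     document_count: int) -> float:
    """Map-reduce cost with a cheap map model and a stronger reduce model.

    All prices are dollars per million tokens.
    """
    n_chunks = math.ceil(doc_tokens / chunk_tokens)
    chunk_summaries = n_chunks * chunk_summary_tokens

    # Cheap model reads the document and writes the chunk summaries.
    map_cost = (doc_tokens / 1e6 * map_in_price
                + chunk_summaries / 1e6 * map_out_price)
    # Stronger model reads only the chunk summaries and writes the final summary.
    reduce_cost = (chunk_summaries / 1e6 * reduce_in_price
                   + summary_tokens / 1e6 * reduce_out_price)
    return (map_cost + reduce_cost) * document_count


# Placeholder prices: map model $1 in / $2 out, reduce model $10 in / $20 out.
print(mixed_model_cost(13_000, 260, 4_000, 200, 1.0, 2.0, 10.0, 20.0, 100))
```

Because the stronger model only ever sees the chunk summaries, its higher rate applies to a small fraction of the total tokens.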