Why doesn't doubling context double my cost?

Because only the input (context) portion of a request scales with context size. Output tokens are billed separately and stay fixed, so if output is a meaningful share of each call, doubling the context grows the cost by less than 2x.

When would doubling context nearly double cost?

When your output is tiny relative to your input — for example a long RAG context with a one-word classification answer. There the bill is almost all input, so the cost ratio approaches the context ratio.

Does a bigger context window itself cost more?

A model's maximum context capacity (e.g. 128K vs 1M) is a capability, not a charge — you only pay for the tokens you actually send. Some providers do price very long contexts at a higher per-token tier, which you can reflect by editing the input price.

Should I always use less context to save money?

Not necessarily. More context can improve accuracy and cut expensive retries. Use this tool alongside the retrieval-quality tradeoff tool to balance cost against output quality.

Is my data sent anywhere?

No. The calculation runs entirely in your browser. Nothing you enter is uploaded, stored or logged.

What is the Context Window Doubling Cost Calculator?

A free calculator that compares your current and new input context sizes for any LLM and shows the monthly cost impact, including why doubling context does not double your bill (output tokens stay fixed). It runs in your browser on Gera Tools, with nothing uploaded.

Context Window Doubling Cost Calculator

Name: Context Window Doubling Cost Calculator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Doubling the context does not double the bill

A common budgeting mistake is assuming that sending twice as much context costs twice as much. It almost never does. LLMs bill input and output tokens separately, and only the input portion grows when you enlarge the context — your output stays the same length. This calculator shows the real cost change between your current and proposed context sizes for any model and volume.

Why the cost ratio is always less than the context ratio

The intuition that “2x context = 2x cost” misses a key structural fact: output tokens are a fixed cost per call that does not change when you send more input. Every LLM call costs:

call_cost = (input_tokens × input_price) + (output_tokens × output_price)

When you double the context, only input_tokens doubles. The output_tokens × output_price term stays constant. So your cost ratio is:

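cost_ratio = (2 × input_cost + output_cost) / (input_cost + output_cost)
           = 1 + input_cost / (input_cost + output_cost)

where input_cost = input_tokens × input_price and output_cost = output_tokens × output_price. The ratio is below 2 whenever output_cost is greater than zero, and it sits further below 2 the larger the output share of the bill.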

The more expensive your output relative to your input (whether by volume or by price), the further below 2x the real cost increase falls.
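
The formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming prices are quoted per million tokens; the function names `call_cost` and `cost_ratio` and the `multiplier` parameter are made up for this example and are not part of the calculator itself.

```python
def call_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost of one call in dollars; both prices are per million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


def cost_ratio(input_tokens, output_tokens, input_price_per_m, output_price_per_m,
               multiplier=2.0):
    """New cost divided by current cost when only the input grows by `multiplier`."""
    current = call_cost(input_tokens, output_tokens,
                        input_price_per_m, output_price_per_m)
    new = call_cost(input_tokens * multiplier, output_tokens,
                    input_price_per_m, output_price_per_m)
    return new / current


# Illustrative call: equal input and output volume at $3/M input, $15/M output.
ratio = cost_ratio(1_000, 1_000, 3.0, 15.0)
print(round(ratio, 3))  # 1.167
```

With output this heavy, doubling the context raises the bill by only about 17%. Setting `output_tokens` to 0 returns exactly the multiplier, which is the limiting case the text describes.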

Worked examples

Example 1 — Long context, short output:

Current: 8,000 input tokens, 200 output tokens
New: 16,000 input tokens (doubled), 200 output tokens (unchanged)
At input price $3/M and output price $15/M:
- Current call cost: (8,000/1M × $3) + (200/1M × $15) = $0.024 + $0.003 = $0.027
- New call cost: (16,000/1M × $3) + (200/1M × $15) = $0.048 + $0.003 = $0.051
- Cost ratio: $0.051 / $0.027 ≈ 1.89× (not 2×) despite doubling context

Example 2 — Context increase but large output:

Current: 4,000 input tokens, 2,000 output tokens
New: 8,000 input tokens (doubled), 2,000 output tokens (unchanged)
At same prices:
- Current: $0.012 + $0.030 = $0.042
- New: $0.024 + $0.030 = $0.054
- Cost ratio: $0.054 / $0.042 ≈ 1.29× — far below 2×, because output is a large share

Example 3 — Almost all input, tiny output:

Current: 50,000 input tokens, 50 output tokens
New: 100,000 input (doubled), 50 output tokens
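At same prices:
- Current: $0.150 + $0.00075 = $0.15075
- New: $0.300 + $0.00075 = $0.30075
- Cost ratio: $0.30075 / $0.15075 ≈ 1.995×, nearly 2× because output is negligible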
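
The three examples can be recomputed with a short Python script. This is a sketch that uses only the token counts and prices stated in the examples; the names `call_cost`, `examples`, and `ratios` are chosen here for illustration.

```python
INPUT_PRICE = 3.0    # dollars per million input tokens, from the examples
OUTPUT_PRICE = 15.0  # dollars per million output tokens, from the examples


def call_cost(input_tokens, output_tokens):
    """Cost of one call in dollars at the example prices."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


# (current input tokens, output tokens) for each worked example
examples = {
    "long context, short output": (8_000, 200),
    "large output": (4_000, 2_000),
    "almost all input": (50_000, 50),
}

ratios = {}
for name, (inp, out) in examples.items():
    # Double the input only; the output stays the same length.
    ratios[name] = call_cost(2 * inp, out) / call_cost(inp, out)
    print(f"{name}: {ratios[name]:.3f}x")

# long context, short output: 1.889x
# large output: 1.286x
# almost all input: 1.995x
```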

When to use this calculator

- Before increasing RAG chunk counts or retrieval depth, to forecast the cost impact
- When evaluating whether a larger context window model is worth the switch
- When planning a feature that will add N tokens to every request (e.g., adding a system-level document)
- When comparing “more context” versus “more retrieval passes” as architectural choices
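
For the "add N tokens to every request" case, the monthly impact can be estimated as below. This is a rough sketch: the workload figures (100,000 calls per month, 2,000 added tokens) are hypothetical, the prices are the ones used in the worked examples, and `monthly_cost` is an illustrative name rather than anything the calculator exposes.

```python
def monthly_cost(input_tokens, output_tokens, calls_per_month,
                 input_price_per_m, output_price_per_m):
    """Monthly spend in dollars; prices are per million tokens."""
    per_call = (input_tokens * input_price_per_m
                + output_tokens * output_price_per_m) / 1_000_000
    return per_call * calls_per_month


# Hypothetical workload: 8,000 input and 200 output tokens per call,
# 100,000 calls a month, and a feature that adds 2,000 input tokens per call.
CALLS = 100_000
ADDED_TOKENS = 2_000

current = monthly_cost(8_000, 200, CALLS, 3.0, 15.0)
proposed = monthly_cost(8_000 + ADDED_TOKENS, 200, CALLS, 3.0, 15.0)
delta = proposed - current

print(f"current ${current:,.2f}, proposed ${proposed:,.2f}, delta ${delta:,.2f}")
# current $2,700.00, proposed $3,300.00, delta $600.00
```

In this hypothetical, input grows by 25% but the bill grows by about 22%, because the output term is unchanged.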

Tips for sizing context economically

- Watch your output share. The bigger your output relative to input, the cheaper extra context is in percentage terms, and vice versa.
- Cache the stable part. If a large portion of your context is a fixed system prompt or reference document, prompt caching can cut the input cost of that repeated portion significantly. Some providers offer cached-token pricing at a steep discount.
- Trim before you grow. Remove redundant or low-value context first; it is always cheaper than paying for a larger window full of filler.
- Balance cost and quality. More context is not always better; see the context-window-vs-retrieval-quality tool for the diminishing-returns curve.
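
The effect of caching the stable part of a prompt can be approximated as follows. This is a simplified sketch under stated assumptions: the 90% discount and the 75% cached share are hypothetical figures, not any provider's actual rates, and the model ignores cache-write surcharges and cache expiry, which some providers apply. The function name `input_cost_with_cache` is illustrative.

```python
def input_cost_with_cache(input_tokens, cached_fraction,
                          input_price_per_m, cached_discount):
    """Input cost in dollars for one call when part of the prompt is cached.

    cached_fraction: share of input tokens served from cache (0.0 to 1.0).
    cached_discount: fractional discount on cached tokens (0.9 means they
                     cost 10% of the normal input price). Hypothetical value.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    if not 0.0 <= cached_discount <= 1.0:
        raise ValueError("cached_discount must be between 0 and 1")
    cached_tokens = input_tokens * cached_fraction
    fresh_tokens = input_tokens - cached_tokens
    cached_price = input_price_per_m * (1.0 - cached_discount)
    return (fresh_tokens * input_price_per_m
            + cached_tokens * cached_price) / 1_000_000


# 16,000 input tokens at $3/M, with 75% of them a stable, cacheable prefix.
uncached = input_cost_with_cache(16_000, 0.0, 3.0, 0.9)
cached = input_cost_with_cache(16_000, 0.75, 3.0, 0.9)
print(f"uncached ${uncached:.4f}, cached ${cached:.4f}")
# uncached $0.0480, cached $0.0156
```

Under these assumptions the doubled 16,000-token context costs less in input than the original uncached 8,000 tokens ($0.024), which is why caching is worth checking before ruling out a larger context.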