Context Window Doubling Cost Calculator

See how changing your context size affects API cost. It is not a clean 2x.

Doubling the context does not double the bill

A common budgeting mistake is assuming that sending twice as much context costs twice as much. It almost never does. LLM APIs bill input and output tokens separately, and only the input portion grows when you enlarge the context; your output stays the same length. This calculator shows the real cost change between your current and proposed context sizes for any model and volume.

How it works

Each request costs (context_tokens × input_price) + (output_tokens × output_price). When you raise the context from, say, 4,000 to 8,000 tokens, only the first term doubles; the output term is unchanged. As long as the output term is not zero, the cost ratio comes in below the context ratio, and the gap widens as output takes up a larger share of each call's cost. The tool reports both ratios explicitly, along with the per-call and monthly cost before and after, so you can see exactly where the money goes.
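The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function names, the prices ($3 and $15 per million input and output tokens), the 500-token output, and the 100,000 calls per month are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class CostComparison:
    context_ratio: float    # proposed context / current context
    cost_ratio: float       # per-call cost after / per-call cost before
    per_call_before: float  # dollars
    per_call_after: float   # dollars
    monthly_before: float   # dollars
    monthly_after: float    # dollars


def call_cost(context_tokens, output_tokens, input_price, output_price):
    """Cost of one request in dollars; prices are dollars per million tokens."""
    return (context_tokens * input_price + output_tokens * output_price) / 1_000_000


def compare_context_sizes(current_context, proposed_context, output_tokens,
                          input_price, output_price, calls_per_month):
    """Compare per-call and monthly cost before and after a context change."""
    before = call_cost(current_context, output_tokens, input_price, output_price)
    after = call_cost(proposed_context, output_tokens, input_price, output_price)
    return CostComparison(
        context_ratio=proposed_context / current_context,
        cost_ratio=after / before,
        per_call_before=before,
        per_call_after=after,
        monthly_before=before * calls_per_month,
        monthly_after=after * calls_per_month,
    )


# Hypothetical inputs: 4,000 -> 8,000 context tokens, 500 output tokens,
# $3 / $15 per million input / output tokens, 100,000 calls per month.
result = compare_context_sizes(4_000, 8_000, 500, 3.0, 15.0, 100_000)
print(f"context ratio: {result.context_ratio:.2f}x")   # 2.00x
print(f"cost ratio:    {result.cost_ratio:.2f}x")      # 1.62x
print(f"per call:      ${result.per_call_before:.4f} -> ${result.per_call_after:.4f}")
print(f"monthly:       ${result.monthly_before:.2f} -> ${result.monthly_after:.2f}")
```

With these assumed numbers the context doubles but the bill rises by about 62%, because the output term (500 tokens at the higher output price) stays fixed.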

Tips for sizing context economically

  • Watch your output share. The larger the output's share of each call's cost, the cheaper extra context is in percentage terms.
  • Cache the stable part. If most of your context is a fixed system prompt or document, prompt caching can cut the input cost of the repeated portion sharply.
  • Trim before you grow. Remove redundant or low-value context first; it is cheaper than paying for a larger window full of filler.
  • Balance cost and quality. More context is not always better; see the context-size-vs-retrieval-quality tool for the diminishing-returns curve.
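
The first two tips can be put into numbers with a short Python sketch. Everything here is illustrative: the function names are made up for this example, the prices are the same kind of placeholder figures a budget estimate might use, and the cached-read multiplier of 0.1 is a hypothetical discount. Real providers price cached tokens differently and may also charge for cache writes or expire caches, which this sketch ignores.

```python
def cost_ratio_from_output_share(context_ratio, output_share):
    """Cost ratio after scaling context by context_ratio.

    output_share is the fraction of the ORIGINAL per-call cost that comes
    from output tokens (0.0 to 1.0). Only the input share scales.
    """
    return 1 + (context_ratio - 1) * (1 - output_share)


def cached_call_cost(stable_tokens, fresh_tokens, output_tokens,
                     input_price, output_price, cached_read_multiplier):
    """Per-call cost in dollars when the stable prefix is read from cache.

    Prices are dollars per million tokens. cached_read_multiplier is an
    assumed discount factor applied to the input price for cached tokens.
    """
    cached_part = stable_tokens * input_price * cached_read_multiplier
    fresh_part = fresh_tokens * input_price
    output_part = output_tokens * output_price
    return (cached_part + fresh_part + output_part) / 1_000_000


# Tip 1: doubling context costs less, proportionally, as output share grows.
for share in (0.0, 0.25, 0.5, 0.75):
    ratio = cost_ratio_from_output_share(2.0, share)
    print(f"output share {share:.0%}: cost ratio {ratio:.2f}x")

# Tip 2: an 8,000-token context where 6,000 tokens are a fixed prefix.
# Hypothetical prices: $3 / $15 per million input / output tokens.
uncached = cached_call_cost(0, 8_000, 500, 3.0, 15.0, 0.1)
cached = cached_call_cost(6_000, 2_000, 500, 3.0, 15.0, 0.1)
print(f"uncached: ${uncached:.4f} per call, cached: ${cached:.4f} per call")
```

Under these assumptions, an output share of 50% turns a 2x context increase into a 1.5x cost increase, and caching the fixed 6,000-token prefix roughly halves the per-call cost of the larger context.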