Why is long context so expensive?

Pricing is per input token, and you are billed for every token you send on every call, whether or not the model needs it. Sending a 500K-token knowledge base with each request means paying for 500K input tokens every time, even if the model needs only a few thousand of them to answer.
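As a rough illustration of that arithmetic, here is a minimal Python sketch. The price of $3.00 per million input tokens and the 5,000-token "narrow" prompt are made-up figures for the example, not any vendor's actual rate, and the function name is hypothetical.

```python
def cost_per_call(input_tokens: int, price_per_million_usd: float) -> float:
    """Input-token cost of one call in USD (output tokens are ignored)."""
    return input_tokens / 1_000_000 * price_per_million_usd


# Assumed, illustrative price: $3.00 per million input tokens.
ASSUMED_PRICE = 3.00

full_context = cost_per_call(500_000, ASSUMED_PRICE)  # whole knowledge base
narrow_prompt = cost_per_call(5_000, ASSUMED_PRICE)   # only what is needed

print(f"full context: ${full_context:.3f} per call")
print(f"narrow prompt: ${narrow_prompt:.3f} per call")
```

Under these assumptions the full-context call costs about $1.50 and the narrow one about $0.015, a hundredfold gap that repeats on every request.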

When is sending everything actually worth it?

When you ask only a handful of questions against a document, or when the task genuinely needs the whole context (cross-document reasoning, full-codebase analysis). For high-volume, narrow queries against a large corpus, retrieval that sends only relevant chunks is far cheaper.

Does prompt caching change the math?

Yes, dramatically. If the large context is fixed across calls, providers that cache it bill the repeated portion at a steep discount. This calculator shows the uncached worst case; if your provider supports caching, your real cost can be much lower.
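To see how a caching discount might shift the numbers, the sketch below splits a prompt into a fixed, cacheable part and a fresh part. The 0.1 multiplier for cached reads and the $3.00 price are assumptions chosen for illustration. Real providers differ, and some also charge extra to write the cache or expire it after a period of inactivity, which this sketch ignores.

```python
def cached_cost_per_call(
    fixed_tokens: int,
    fresh_tokens: int,
    price_per_million_usd: float,
    cached_read_multiplier: float,
) -> float:
    """Per-call input cost in USD when the fixed context is served from cache.

    cached_read_multiplier is the fraction of list price billed for cached
    tokens (hypothetical parameter; check your provider's pricing page).
    """
    cached = fixed_tokens / 1_000_000 * price_per_million_usd * cached_read_multiplier
    fresh = fresh_tokens / 1_000_000 * price_per_million_usd
    return cached + fresh


# Assumed values: 500K fixed context, 200-token question, $3.00 per million,
# cached reads billed at 10% of list price.
with_cache = cached_cost_per_call(500_000, 200, 3.00, 0.1)
without_cache = cached_cost_per_call(500_000, 200, 3.00, 1.0)

print(f"with cache: ${with_cache:.4f} per call")
print(f"without cache: ${without_cache:.4f} per call")
```

With these invented numbers the per-call cost falls from about $1.50 to about $0.15. The first call that populates the cache is not modeled here.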

Where does pricing come from?

The calculator uses representative public per-million-token list prices for each model. Vendors change prices and tier long-context rates, so confirm against the provider's current pricing page before committing budget.

Long-Context Model Cost Calculator

Million-token context windows make it tempting to send everything every time. The trap is that you pay for every token you send on every call. This calculator turns that into hard numbers (daily, monthly, and annual) and compares “send everything” against a retrieval approach that sends only what the model needs.

How it works

You enter your context size in tokens, calls per day, and a model with its per-million-token input price. The tool computes the cost of sending that full context on every call and extends it to monthly and annual figures. Enter the smaller token count a retrieval pipeline would send instead, and it shows the side-by-side cost and the savings percentage. Everything runs in your browser.
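The computation described above can be sketched in a few lines of Python. This is an approximation of the idea, not the tool's actual source: the function and class names are hypothetical, the 30-day month and 365-day year are assumptions, and the $3.00 price and token counts are invented inputs.

```python
from dataclasses import dataclass

# Assumed conventions; the real tool may use different period lengths.
DAYS_PER_MONTH = 30
DAYS_PER_YEAR = 365


@dataclass
class CostBreakdown:
    daily: float
    monthly: float
    annual: float


def breakdown(tokens_per_call: int, calls_per_day: int,
              price_per_million_usd: float) -> CostBreakdown:
    """Input-token cost in USD for sending tokens_per_call on every call."""
    daily = tokens_per_call / 1_000_000 * price_per_million_usd * calls_per_day
    return CostBreakdown(daily, daily * DAYS_PER_MONTH, daily * DAYS_PER_YEAR)


def savings_percent(full: CostBreakdown, retrieval: CostBreakdown) -> float:
    """Percentage saved by retrieval relative to sending the full context."""
    if full.daily == 0:
        return 0.0
    return (1 - retrieval.daily / full.daily) * 100


# Illustrative inputs: 500K-token context vs. 5K retrieved tokens,
# 1,000 calls per day, assumed price of $3.00 per million input tokens.
full = breakdown(500_000, 1_000, 3.00)
retrieval = breakdown(5_000, 1_000, 3.00)

print(f"full context: ${full.daily:,.2f}/day, ${full.monthly:,.2f}/month, ${full.annual:,.2f}/year")
print(f"retrieval:    ${retrieval.daily:,.2f}/day, ${retrieval.monthly:,.2f}/month, ${retrieval.annual:,.2f}/year")
print(f"savings:      {savings_percent(full, retrieval):.1f}%")
```

With these made-up inputs, sending the full context comes to $1,500 per day, while the retrieval approach comes to roughly $15 per day, a saving of about 99 percent. Output tokens and any retrieval infrastructure cost are left out, as they are in the description above.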

Tips and notes

The decision usually comes down to query volume. A few questions against a big document? Sending it whole is simplest. Thousands of narrow queries against a large corpus? Retrieval that sends a few thousand relevant tokens instead of hundreds of thousands can cut cost by 90 percent or more. If your large context is identical across calls, prompt caching is the third option and can beat both. This calculator shows the uncached ceiling, so treat caching as upside. Always confirm current long-context pricing; some vendors charge a higher rate once a prompt exceeds a token threshold.