Long-context model cost calculator
Million-token context windows make it tempting to just send everything every time. The trap is that you pay for every input token you send, on every call. This calculator turns that into hard numbers (daily, monthly, and annual) and compares “send everything” against a retrieval approach that sends only what the model needs.
How it works
You enter your context size in tokens, calls per day, and a model with its per-million-token input price. The tool computes the cost of sending that full context on every call and extends it to monthly and annual figures. Enter the smaller token count a retrieval pipeline would send instead, and it shows the side-by-side cost and the savings percentage. Everything runs in your browser.
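The arithmetic described above can be sketched in a few lines of Python. This is an illustration, not the tool's actual source: the names `input_cost` and `savings_percent`, the 30-day month and 365-day year, and the example price of $3.00 per million input tokens are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumptions: the page does not say how the tool defines a month or a year.
DAYS_PER_MONTH = 30
DAYS_PER_YEAR = 365


@dataclass
class CostBreakdown:
    per_call: float
    daily: float
    monthly: float
    annual: float


def input_cost(tokens_per_call: int, calls_per_day: int,
               price_per_million: float) -> CostBreakdown:
    """Input-token cost of sending `tokens_per_call` tokens on every call."""
    per_call = tokens_per_call / 1_000_000 * price_per_million
    daily = per_call * calls_per_day
    return CostBreakdown(per_call, daily,
                         daily * DAYS_PER_MONTH, daily * DAYS_PER_YEAR)


def savings_percent(full: CostBreakdown, retrieval: CostBreakdown) -> float:
    """Percentage saved by the retrieval approach relative to full context."""
    if full.daily == 0:
        return 0.0
    return (1 - retrieval.daily / full.daily) * 100


# Hypothetical inputs: 500k-token context vs. 5k retrieved tokens,
# 1,000 calls per day, $3.00 per million input tokens.
full = input_cost(500_000, 1_000, 3.00)
retrieval = input_cost(5_000, 1_000, 3.00)

print(f"Full context: ${full.daily:,.2f}/day, ${full.annual:,.2f}/year")
print(f"Retrieval:    ${retrieval.daily:,.2f}/day, ${retrieval.annual:,.2f}/year")
print(f"Savings:      {savings_percent(full, retrieval):.1f}%")
```

With these made-up inputs the full-context approach costs about $1,500 per day against about $15 for retrieval. Output tokens are ignored here because the calculator as described works from the input price only.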
Tips and notes
The decision usually comes down to query volume. A few questions against a big document? Sending it whole is simplest. Thousands of narrow queries against a large corpus? Retrieval that sends a few thousand relevant tokens instead of hundreds of thousands can cut input cost by 90 percent or more, because at the same per-token price the savings are simply one minus the ratio of the two token counts. If your large context is identical across calls, prompt caching is a third option and can beat both; this calculator shows the uncached ceiling, so treat caching as upside. Always confirm live long-context pricing, which vendors sometimes tier above a token threshold.
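To see how tiered pricing and caching could move the numbers, here is a rough Python sketch. Everything in it is hypothetical: the function names, the 200,000-token threshold, the prices, and the 0.1 cached-read multiplier are placeholders, not any vendor's actual terms. It also assumes that the higher rate applies to the whole request once the threshold is crossed, and it ignores cache-write surcharges and cache expiry, both of which vary by vendor.

```python
def tiered_input_cost(tokens: int, base_price: float,
                      long_price: float, threshold: int) -> float:
    """Per-call input cost under a two-tier price.

    Assumption: once `tokens` exceeds `threshold`, the whole request is
    billed at `long_price` (per million tokens). Check your vendor's rules.
    """
    rate = long_price if tokens > threshold else base_price
    return tokens / 1_000_000 * rate


def cached_input_cost(shared_tokens: int, fresh_tokens: int,
                      price: float, cached_read_multiplier: float) -> float:
    """Per-call input cost when a shared prefix is served from a prompt cache.

    Assumption: cached tokens are billed at `price * cached_read_multiplier`.
    Cache-write surcharges and cache expiry are ignored in this sketch.
    """
    billable = shared_tokens * cached_read_multiplier + fresh_tokens
    return billable / 1_000_000 * price


# Hypothetical numbers only.
uncached = tiered_input_cost(300_000, base_price=3.00,
                             long_price=6.00, threshold=200_000)
cached = cached_input_cost(300_000, 2_000, price=3.00,
                           cached_read_multiplier=0.1)

print(f"Uncached, above threshold: ${uncached:.3f} per call")
print(f"Cached prefix:             ${cached:.3f} per call")
```

Under these placeholder terms the uncached call costs roughly $1.80 and the cached one roughly $0.10, which is why the uncached figure is best read as a ceiling.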