Why does re-sending context cost money?

Unless you use prompt caching, every request that includes a large context block pays for those input tokens again. A 20,000-token context at high volume can dominate your bill even though the content rarely changes.

How does refresh interval affect cost?

A shorter refresh interval means you rebuild and re-send fresh context more often, raising cost. A longer interval is cheaper but risks serving stale information. This tool quantifies that tradeoff.

Does prompt caching change the math?

Yes. Providers that support caching charge a reduced rate for cached input tokens after the first call within the cache window. Enabling the caching option here applies a discount to repeated context reads.

Is my data sent anywhere?

No. The calculator runs entirely in your browser. Nothing you enter is uploaded, stored or logged.

What is the Context Freshness vs Cost Calculator?

For apps that re-inject long-lived context, such as the latest docs or a knowledge base, this calculator models the monthly cost of different refresh intervals against staleness risk so you can pick the most cost-effective update cadence. It runs for free in your browser on Gera Tools, with nothing uploaded.

Context Freshness vs Cost Calculator

Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Balance context freshness against token cost

Retrieval and agent apps often carry a large, slowly-changing context block — documentation, a knowledge base, a long system prompt — that is re-sent on every request. The fresher you keep it, the more often you rebuild and resend it, and the more input tokens you pay for. This calculator models the cost of each refresh interval alongside the average staleness it implies, so you can choose a cadence that is cheap enough and fresh enough.

How it works

Every request that includes the context pays for its input tokens:

daily_context_cost = context_tokens × daily_requests / 1,000,000 × input_price
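
As a quick check of the formula above, here is a minimal Python sketch. The function names, the 30-day month, and the sample numbers (20,000 tokens, 2,000 requests per day, $3.00 per million input tokens) are illustrative assumptions, not values taken from the calculator itself.

```python
def daily_context_cost(context_tokens: int, daily_requests: int,
                       input_price_per_million: float) -> float:
    """Daily cost of re-sending the context block, with no caching."""
    return context_tokens * daily_requests / 1_000_000 * input_price_per_million


def monthly_context_cost(daily_cost: float, days: int = 30) -> float:
    """Scale a daily figure to a month (30 days is an assumption)."""
    return daily_cost * days


# Hypothetical inputs, chosen only to exercise the formula.
daily = daily_context_cost(context_tokens=20_000, daily_requests=2_000,
                           input_price_per_million=3.00)
monthly = monthly_context_cost(daily)
print(f"${daily:.2f}/day, ${monthly:.2f}/month")
```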

Refreshing more often does not change the per-request cost directly. With prompt caching on, however, each refresh changes the context and invalidates the cached copy, so a shorter interval means more cache misses (full-price reads). A longer interval increases the average staleness, which is roughly half the refresh interval: content can be anywhere from zero to one full interval old, and the average over that range is about half. The tool sweeps common intervals and reports cost and staleness side by side so the tradeoff is explicit.
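
The sweep described above might look roughly like the following sketch. It is not the calculator's actual code. It assumes that the cache stays warm between refreshes, that each refresh causes exactly one full-price read, and that cached reads cost 10% of the normal input price. The names `IntervalResult`, `sweep_intervals` and `cached_read_multiplier` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IntervalResult:
    interval_hours: float
    daily_cost: float
    avg_staleness_hours: float


def sweep_intervals(context_tokens: int, daily_requests: int,
                    input_price_per_million: float,
                    intervals_hours: list[float],
                    cached_read_multiplier: float = 0.10) -> list[IntervalResult]:
    """Estimate daily cost and average staleness for each refresh interval."""
    full_read = context_tokens / 1_000_000 * input_price_per_million
    results = []
    for hours in intervals_hours:
        refreshes_per_day = 24 / hours
        # Assumption: one cache miss (full-price read) per refresh.
        misses = min(daily_requests, refreshes_per_day)
        hits = daily_requests - misses
        cost = misses * full_read + hits * full_read * cached_read_multiplier
        # Average staleness is taken as half the refresh interval.
        results.append(IntervalResult(hours, cost, hours / 2))
    return results


# Illustrative inputs: 10,000-token context, 5,000 requests/day, $1.50 per million.
results = sweep_intervals(10_000, 5_000, 1.50, [1, 6, 24])
for r in results:
    print(f"{r.interval_hours:>4}h  ${r.daily_cost:.2f}/day  "
          f"avg staleness {r.avg_staleness_hours}h")
```

Under these assumptions the cost differences between intervals are small at high request volume, because almost every read is a cache hit either way. The staleness column is what changes most.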

Where the tradeoff actually bites

The cost model is straightforward: a larger context block costs more per request, and higher daily request volume multiplies that. But the staleness side is subtler. Average staleness is roughly half the refresh interval because content updated immediately after a refresh can be stale for the full interval, while content updated just before the next refresh is almost current — the average across all update moments is approximately half. For content that changes rarely and predictably (monthly product catalogues, weekly policy docs), a long interval produces acceptable staleness at much lower cost. For content that changes continuously (live pricing, real-time status), a long interval is unacceptable regardless of cost.
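
The "half the interval" rule can be checked numerically. This sketch assumes source updates arrive uniformly across the refresh interval; an update at offset u stays unseen until the next refresh, so it is stale for (interval - u). The function name and sample count are illustrative.

```python
def average_staleness_hours(interval_hours: float, samples: int = 10_000) -> float:
    """Mean time an update waits for the next refresh, assuming updates
    are spread evenly over the interval."""
    step = interval_hours / samples
    # Sample update offsets at the midpoint of each equal slice.
    waits = [interval_hours - (i + 0.5) * step for i in range(samples)]
    return sum(waits) / samples


for interval in (1, 6, 24):
    print(f"{interval}h interval -> ~{average_staleness_hours(interval):.2f}h average staleness")
```

If updates cluster at particular times (for example, a nightly batch publish), the uniform assumption no longer holds and the average can be far from half the interval.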

A concrete sizing example

Suppose you inject a 10,000-token documentation block into every request, serving 5,000 requests per day at an input price of $1.50 per million tokens, and you want to compare the bill with and without prompt caching.

Without caching (any refresh interval): 10,000 × 5,000 = 50M tokens/day, and 50M / 1,000,000 × $1.50 = $75/day.
With caching (90% discount on cache reads after the first call each window): the first call per cache window pays full price, and subsequent reads pay roughly 10% of that. If the cache window is 5 minutes and your traffic is spread evenly, there are 288 windows per day and about 17 requests per window, so most requests hit the cache. That works out to roughly $11.39/day, a reduction of about 85%.
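
The arithmetic for this example can be reproduced with a short sketch. It uses the simplified model from the text (one full-price read per 5-minute cache window, evenly spread traffic, 90% discount on the rest). Real providers may differ; some bill cache writes at a premium or extend the window on each hit, and this sketch ignores both. The constant names are illustrative.

```python
CONTEXT_TOKENS = 10_000
DAILY_REQUESTS = 5_000
INPUT_PRICE_PER_MILLION = 1.50
CACHE_READ_DISCOUNT = 0.90      # cached reads cost 10% of full price
CACHE_WINDOW_MINUTES = 5

full_read = CONTEXT_TOKENS / 1_000_000 * INPUT_PRICE_PER_MILLION
uncached_daily = DAILY_REQUESTS * full_read

# Simplified model: the first request in each window misses the cache.
windows_per_day = (24 * 60) // CACHE_WINDOW_MINUTES
misses = min(DAILY_REQUESTS, windows_per_day)
hits = DAILY_REQUESTS - misses

cached_daily = (misses * full_read
                + hits * full_read * (1 - CACHE_READ_DISCOUNT))
savings = 1 - cached_daily / uncached_daily

print(f"uncached: ${uncached_daily:.2f}/day")   # $75.00/day
print(f"cached:   ${cached_daily:.2f}/day")     # $11.39/day
print(f"savings:  {savings:.1%}")               # 84.8%
```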

The staleness question then becomes: how often does your 10,000-token doc actually change? If daily, a 24-hour refresh interval means stale content for up to 24 hours — which may be fine for historical docs but unacceptable for pricing or availability data.

Tips for cost-effective freshness

Turn on prompt caching. If your context is stable between refreshes, cached input tokens are billed at a fraction of the normal rate — often the single biggest lever here.
Refresh on change, not on a timer. If you can detect document updates, an event-driven refresh beats a fixed interval for both cost and freshness.
Split hot from cold context. Keep volatile facts small and refresh them often; keep the large stable corpus on a long interval.
Trim the context. The cheapest token is the one you never send — retrieve only the chunks a query actually needs instead of stuffing the whole corpus.
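
One way to approximate "refresh on change" when the source cannot send update events is to hash the content and rebuild only when the digest differs. This is a minimal sketch under that assumption. `ContextCache` and `refresh_if_changed` are hypothetical names, and a real application would fetch `source_text` from its own document store.

```python
import hashlib


class ContextCache:
    """Rebuilds the context block only when the source content changes."""

    def __init__(self) -> None:
        self._digest: str | None = None
        self.context: str = ""
        self.rebuilds: int = 0

    def refresh_if_changed(self, source_text: str) -> bool:
        """Return True if the context was rebuilt, False if it was unchanged."""
        digest = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
        if digest == self._digest:
            return False  # same content: keep the existing (cacheable) context
        self._digest = digest
        self.context = source_text
        self.rebuilds += 1
        return True


cache = ContextCache()
first = cache.refresh_if_changed("Policy v1")    # first load always rebuilds
second = cache.refresh_if_changed("Policy v1")   # unchanged, no rebuild
third = cache.refresh_if_changed("Policy v2")    # changed, rebuild
print(first, second, third, cache.rebuilds)
```

Because the context stays byte-identical between real changes, this approach also keeps the provider-side prompt cache valid for longer than a fixed timer would.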