Prompt Caching Savings Calculator

See how much you save with Anthropic or OpenAI prompt caching

If your requests share a long, stable prefix — a big system prompt, a document, a tool schema — prompt caching can slash your input bill. Providers store that prefix and charge a deeply discounted rate when it is reused. This calculator models your cache-hit rate to show what caching actually saves on your workload.

How it works

Each request has two parts: the reusable prefix (the cached system prompt or context) and the new tokens unique to that request. Without caching, every request pays the full input rate on both parts.

With caching, a fraction of requests (your cache-hit rate) pay only the cheap cache-read rate on the prefix, about 10 percent of the input price. The remaining requests miss the cache and pay the cache-write rate on the prefix, which is the normal input rate plus a small premium, to populate it. New tokens are always billed at the normal input rate. The calculator nets these out and reports daily and monthly savings.
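
The arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the names `Workload`, `daily_costs`, and `savings` are hypothetical, the 0.10 read multiplier follows the "about 10 percent" figure above, and the 1.25 write multiplier, the 30-day month, and the $3 per million tokens input price in the example are assumptions to be replaced with your provider's current rates.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical inputs for a caching estimate (all names are illustrative)."""
    requests_per_day: float
    prefix_tokens: float          # reusable prefix tokens per request
    new_tokens: float             # tokens unique to each request
    hit_rate: float               # fraction of requests that hit the cache, 0..1
    input_price_per_mtok: float   # USD per million input tokens
    read_multiplier: float = 0.10   # assumed: cache read ~10% of input price
    write_multiplier: float = 1.25  # assumed: cache write = input price + premium


def daily_costs(w: Workload) -> tuple[float, float]:
    """Return (cost without caching, cost with caching) in USD per day."""
    price = w.input_price_per_mtok / 1_000_000  # USD per token

    # Without caching, prefix and new tokens both pay the full input rate.
    without = w.requests_per_day * (w.prefix_tokens + w.new_tokens) * price

    # With caching, hits pay the read rate on the prefix and misses pay the
    # write rate; new tokens always pay the normal input rate.
    prefix_rate = (w.hit_rate * w.read_multiplier
                   + (1 - w.hit_rate) * w.write_multiplier)
    with_cache = w.requests_per_day * (
        w.prefix_tokens * prefix_rate + w.new_tokens) * price
    return without, with_cache


def savings(w: Workload, days_per_month: int = 30) -> dict[str, float]:
    """Net daily and monthly savings; negative values mean caching costs more."""
    without, with_cache = daily_costs(w)
    daily = without - with_cache
    return {"daily": daily, "monthly": daily * days_per_month}


if __name__ == "__main__":
    example = Workload(requests_per_day=10_000, prefix_tokens=8_000,
                       new_tokens=500, hit_rate=0.9, input_price_per_mtok=3.0)
    result = savings(example)
    print(f"daily: ${result['daily']:.2f}, monthly: ${result['monthly']:.2f}")
```

Under these assumed rates, the example workload drops from about $255 to about $67 per day. A hit rate of zero makes the result negative, because every request pays the write premium and none benefit from a cheap read.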

Tips and notes

  • Caching rewards big, stable prefixes. The longer and more reused the prefix, the larger the win. A tiny system prompt barely moves the needle.
  • Watch the cache lifetime. Cached prefixes expire after a short window (minutes by default), so steady traffic keeps the cache warm; sparse traffic lets it lapse and lowers your effective hit rate.
  • Rates are editable estimates. Cache-read and cache-write multipliers differ by provider and change over time — confirm current pricing before budgeting.
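
To make the cache-lifetime tip concrete, here is a rough way to guess a hit rate from traffic volume. It is a sketch under strong assumptions rather than a description of any provider's behavior: requests arrive at random (a Poisson process), all share the same prefix, every request refreshes the cache lifetime, and the 5-minute window is an assumed default. The function name `estimated_hit_rate` is hypothetical.

```python
import math


def estimated_hit_rate(requests_per_minute: float,
                       ttl_minutes: float = 5.0) -> float:
    """Estimate the fraction of requests that find the cache still warm.

    Assumes Poisson arrivals and a lifetime that resets on every request,
    so a request hits when the gap since the previous one is shorter than
    the lifetime: P(gap < ttl) = 1 - exp(-rate * ttl).
    """
    if requests_per_minute <= 0 or ttl_minutes <= 0:
        return 0.0
    return 1.0 - math.exp(-requests_per_minute * ttl_minutes)


if __name__ == "__main__":
    # Compare sparse and steady traffic under the assumed 5-minute window.
    for rate in (0.05, 0.2, 1.0, 10.0):
        print(f"{rate:>5} req/min -> hit rate {estimated_hit_rate(rate):.1%}")
```

Real traffic is usually burstier than this model, so treat the output as a starting value for the hit-rate input and adjust it against the cache-read counts your provider reports.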