How does prompt caching save money?

When a long prefix such as a system prompt or document is reused, the provider stores it and charges a much lower cache-read rate on repeat requests instead of the full input rate. Anthropic charges roughly 10 percent of input price on cache hits; OpenAI also discounts cached input tokens, though the depth of the discount depends on the model.

What is the cache-write premium?

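Writing a prefix into the cache can cost more than a normal input token, since the provider must store it. Anthropic, for example, bills cache writes at about 25 percent above the input rate for its default cache lifetime, while OpenAI does not charge a separate write fee. The first request pays this premium; subsequent hits within the cache lifetime pay the cheap read rate, so caching pays off once the prefix is reused often enough for the read discounts to outweigh the write premium.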
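
As a rough illustration of "reused enough", the break-even hit rate can be derived from the two multipliers. This is a minimal sketch, not the calculator's own code: the function name is hypothetical, and the default multipliers (0.10 for reads, 1.25 for writes) are assumptions based on Anthropic-style pricing that you should replace with current rates.

```python
def break_even_hit_rate(read_mult: float = 0.10, write_mult: float = 1.25) -> float:
    """Hit rate above which caching the prefix is cheaper than not caching it.

    Multipliers are relative to the normal input price and are assumed values.
    """
    # Per prefix token, relative to the full input price:
    #   cached cost = h * read_mult + (1 - h) * write_mult
    # Caching wins when that is below 1.0, which rearranges to
    #   h > (write_mult - 1) / (write_mult - read_mult)
    return (write_mult - 1.0) / (write_mult - read_mult)


print(f"break-even hit rate: {break_even_hit_rate():.1%}")
# prints "break-even hit rate: 21.7%"
```

With no write premium (a write multiplier of 1.0) the break-even is zero, so any reuse at all saves money.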

What cache-hit rate should I expect?

It depends on traffic shape. A chatbot with a fixed long system prompt and steady traffic can hit 80 to 95 percent; bursty or highly varied prompts hit far less. Try a few values to see how sensitive your savings are.

Is anything sent to a server?

No. The calculator runs entirely in your browser. You enter only counts and settings, and nothing is uploaded, stored, or logged.

What is the Prompt Caching Savings Calculator?

The Prompt Caching Savings Calculator models your workload's cache-hit rate to estimate savings from Anthropic's prompt caching or OpenAI's cached prompts feature. Enter daily volume, system prompt size, and new tokens per request to see monthly savings. It runs free in your browser on Gera Tools, with nothing uploaded.

Prompt Caching Savings Calculator

Name: Prompt Caching Savings Calculator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Prompt caching savings calculator

If your requests share a long, stable prefix — a big system prompt, a document, a tool schema — prompt caching can slash your input bill. Providers store that prefix and charge a deeply discounted rate when it is reused. This calculator models your cache-hit rate to show what caching actually saves on your workload.

How it works

Each request has two parts: the reusable prefix (the cached system prompt or context) and the new tokens unique to that request. Without caching, every request pays the full input rate on both parts.

With caching, a fraction of requests (your cache-hit rate) pay only the cheap cache-read rate (about 10 percent of input price) on the prefix. The remaining requests miss the cache and pay the input rate plus a small cache-write premium on the prefix to repopulate it. New tokens are always billed at the normal input rate. The calculator nets these out and reports daily and monthly savings.

Worked example

Imagine you run a legal-document assistant. Your system prompt — rules, document schema, and jurisdiction notes — is 4,000 tokens. Each request adds about 200 new tokens. You make 5,000 requests per day. At a 90% cache-hit rate:

Without caching: 5,000 × 4,200 tokens × full input price each day.
With caching: 90% of 5,000 requests pay the cheap cache-read rate on the 4,000-token prefix; 10% pay the higher cache-write premium to refresh it; all 5,000 pay the normal input rate on just the 200 new tokens.

The difference compounds quickly at this volume. The calculator works through exactly this arithmetic for your specific numbers and price inputs.
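
To make the arithmetic concrete, here is a minimal sketch of the worked example. It is not the calculator's source: the function name is hypothetical, and the price of $3.00 per million input tokens, the 0.10 read multiplier, the 1.25 write multiplier, and the 30-day month are all assumptions chosen for illustration.

```python
def daily_costs(requests_per_day, prefix_tokens, new_tokens, hit_rate,
                input_price_per_mtok, read_mult=0.10, write_mult=1.25):
    """Return (cost without caching, cost with caching) in dollars per day."""
    price = input_price_per_mtok / 1_000_000  # dollars per token

    # Without caching, every request pays full price on prefix + new tokens.
    without = requests_per_day * (prefix_tokens + new_tokens) * price

    # With caching, hits pay the read rate on the prefix and misses pay
    # the write rate; new tokens are always billed at the normal rate.
    hits = requests_per_day * hit_rate
    misses = requests_per_day - hits
    prefix_cost = (hits * read_mult + misses * write_mult) * prefix_tokens * price
    new_cost = requests_per_day * new_tokens * price
    return without, prefix_cost + new_cost


# Assumed price: $3.00 per million input tokens.
without, with_cache = daily_costs(5_000, 4_000, 200, 0.90, input_price_per_mtok=3.00)
saved = without - with_cache
print(f"without: ${without:.2f}/day, with: ${with_cache:.2f}/day")
print(f"saved: ${saved:.2f}/day, about ${saved * 30:.2f}/month")
# without: $63.00/day, with: $15.90/day
# saved: $47.10/day, about $1413.00/month
```

Under these assumed rates, caching removes roughly three quarters of the daily input bill, and almost all of the remaining prefix cost comes from the 10% of requests that miss.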

What affects your savings most

Factor	Effect on savings
Prefix size	Larger prefix = more tokens cached = bigger saving per hit
Cache-hit rate	Savings rise linearly with hit rate; with a cache-write premium, going from 60% to 90% can nearly double them
Daily volume	More calls means more opportunities to reuse the cached prefix
Provider rates	Cache-read discount depth varies — check current pricing
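
The hit-rate row is the easiest one to explore numerically. The sketch below sweeps a few hit rates and reports how much the prefix portion of the bill falls; the function name is hypothetical and the multipliers (0.10 read, 1.25 write) are assumed Anthropic-style values, not figures taken from the calculator.

```python
def prefix_saving_fraction(hit_rate, read_mult=0.10, write_mult=1.25):
    """Fraction of the prefix cost saved versus paying full input price."""
    # Blended cost per prefix token, relative to the full input price.
    blended = hit_rate * read_mult + (1.0 - hit_rate) * write_mult
    return 1.0 - blended


for h in (0.3, 0.6, 0.9):
    print(f"hit rate {h:.0%}: prefix cost falls by {prefix_saving_fraction(h):.1%}")
# hit rate 30%: prefix cost falls by 9.5%
# hit rate 60%: prefix cost falls by 44.0%
# hit rate 90%: prefix cost falls by 78.5%
```

Because misses carry the write premium, low hit rates save very little, while each additional point of hit rate is worth the same fixed amount.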

Tips and notes

Caching rewards big, stable prefixes. The longer and more reused the prefix, the larger the win. A tiny system prompt barely moves the needle.
Watch the cache lifetime. Cached prefixes expire after a short window (minutes by default), so steady traffic keeps the cache warm; sparse traffic lets it lapse and lowers your effective hit rate.
Combine with prompt compression. Shrinking the prefix before caching it lowers both the write cost and the cost of every later read, and slightly improves latency, so the two techniques compound well.
Rates are editable estimates. Cache-read and cache-write multipliers differ by provider and change over time — confirm current pricing before budgeting.
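
The cache-lifetime tip above can be turned into a rough hit-rate estimate. This is a sketch under strong assumptions, not a provider formula: it assumes requests arrive at random at a steady average rate (a Poisson process), that every request leaves a fresh cache entry whether it hit or missed, and that the lifetime is 5 minutes. The function name is hypothetical.

```python
import math


def expected_hit_rate(requests_per_minute: float, ttl_minutes: float = 5.0) -> float:
    """Estimated share of requests that find the prefix still cached.

    Assumes Poisson arrivals and that each request resets the cache lifetime.
    """
    # A request hits when the previous request arrived less than
    # ttl_minutes ago; for Poisson arrivals that probability is
    # 1 - exp(-rate * ttl).
    return 1.0 - math.exp(-requests_per_minute * ttl_minutes)


for rpm in (0.05, 0.2, 1.0):
    print(f"{rpm} requests/min -> estimated hit rate {expected_hit_rate(rpm):.0%}")
```

Bursty traffic breaks the steady-rate assumption, so treat the result as a starting value to enter into the calculator rather than a prediction.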