Context caching strategy planner
Prompt caching can cut input costs by 50–90% on repetitive workloads, but only if your prompt is structured so the cache can match a large, unchanging prefix. This planner models your prompt as three parts: static context, dynamic context, and a per-request user message. It then compares three layouts (no caching, caching only the static block, and a fully front-loaded layout) so you can see how much prompt order is worth.
How it works
You enter token counts for the static portion (system prompt, reference docs), the dynamic portion (retrieved chunks that change per request), and the user message, plus daily volume and your provider. The tool applies the provider’s cache-read discount — roughly a 90% reduction for Anthropic cache reads, about 50% for OpenAI cached input — to whatever sits in the cacheable prefix. It checks the minimum cacheable size, computes cost under each layout, and reports the monthly savings and recommended structure.
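The arithmetic behind that comparison can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the `Workload` fields, the `monthly_cost` helper, and the example price are hypothetical, the discounts are the rough figures quoted above, and the sketch assumes every request is a cache hit and ignores any cache-write surcharge a provider may bill.

```python
from dataclasses import dataclass

# Assumed cache-read discounts, taken from the rough figures in the text.
# Real pricing varies by model; cache-write surcharges are ignored here.
CACHE_READ_DISCOUNT = {"anthropic": 0.90, "openai": 0.50}

# Approximate minimum cacheable prefix from the notes; varies by provider and model.
MIN_CACHEABLE_TOKENS = 1024


@dataclass
class Workload:
    static_tokens: int      # system prompt, reference docs
    dynamic_tokens: int     # retrieved chunks that change per request
    user_tokens: int        # per-request user message
    requests_per_day: int
    price_per_mtok: float   # hypothetical base input price, USD per million tokens
    provider: str           # key into CACHE_READ_DISCOUNT


def monthly_cost(w: Workload, cached_tokens: int, days: int = 30) -> float:
    """Input cost per month when `cached_tokens` of each prompt are cache reads."""
    total = w.static_tokens + w.dynamic_tokens + w.user_tokens
    if cached_tokens < MIN_CACHEABLE_TOKENS:
        cached_tokens = 0  # prefix too short: caching does not engage
    discount = CACHE_READ_DISCOUNT[w.provider]
    # Uncached tokens pay full price; cached tokens pay the discounted rate.
    billable = (total - cached_tokens) + cached_tokens * (1 - discount)
    return billable * w.price_per_mtok / 1_000_000 * w.requests_per_day * days


w = Workload(static_tokens=8000, dynamic_tokens=1500, user_tokens=200,
             requests_per_day=10_000, price_per_mtok=3.0, provider="anthropic")
no_cache = monthly_cost(w, cached_tokens=0)
static_cached = monthly_cost(w, cached_tokens=w.static_tokens)
print(f"no caching:    ${no_cache:,.2f}/month")
print(f"static cached: ${static_cached:,.2f}/month")
print(f"savings:       ${no_cache - static_cached:,.2f}/month")
```

Because the discount applies only to the cached prefix, the savings scale with the share of the prompt that is static, which is why layout matters as much as the discount rate.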
Tips and notes
- Front-load everything stable. System prompt, tools, and reference material go first; volatile data goes last.
- Never put a timestamp up top. Caches match on an exact prefix, so a single changing token near the start invalidates everything after it.
- Mind the minimum. Below roughly 1,024 prefix tokens caching does not engage, and some models set a higher threshold; the tool warns you when your prefix is too short.
- Caching expires fast. Cached prefixes typically live for only a few minutes without reuse, so caching helps only when the same prefix recurs within the cache window. It favors steady, high-volume traffic.
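
To see why ordering matters, the sketch below measures how much of two consecutive prompts is shared from the start, which is the only part a prefix cache can reuse. It is an illustration under simplifying assumptions: whitespace-separated words stand in for real tokens, and `build_prompt` and `common_prefix_len` are hypothetical helpers, not part of any provider SDK.

```python
def common_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading tokens that are identical in both sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_prompt(static: str, dynamic: str, user: str, stable_first: bool) -> list[str]:
    """Assemble a prompt and split it into crude word-level 'tokens'."""
    parts = [static, dynamic, user] if stable_first else [dynamic, static, user]
    return " ".join(parts).split()


STATIC = "You are a support assistant. Follow the policy manual below."

# Stable content first: the static block is shared across requests.
good_1 = build_prompt(STATIC, "time=10:01 chunk=A", "Where is my order?", stable_first=True)
good_2 = build_prompt(STATIC, "time=10:02 chunk=B", "Cancel my plan.", stable_first=True)

# Volatile content first: the timestamp changes the very first token.
bad_1 = build_prompt(STATIC, "time=10:01 chunk=A", "Where is my order?", stable_first=False)
bad_2 = build_prompt(STATIC, "time=10:02 chunk=B", "Cancel my plan.", stable_first=False)

print("shared prefix, stable first:  ", common_prefix_len(good_1, good_2))
print("shared prefix, volatile first:", common_prefix_len(bad_1, bad_2))
```

With the stable block first, the shared prefix covers the entire static section; with the timestamp first, the two prompts diverge at the first token and nothing is reusable.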