What does your hidden system prompt really cost?
Most production LLM apps prepend a large system prompt the user never sees: instructions, tone, guardrails, tool definitions. Because it is re-sent as input on every call, its cost scales linearly with your traffic and can quietly become one of your biggest line items. This tool isolates that cost.
How it works
The tool estimates the token size of your system prompt, then multiplies by your monthly call volume and your model’s input price:
monthly_cost = system_prompt_tokens × calls_per_month / 1,000,000 × input_price_per_million
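It also shows the cached figure: what the same system prompt would cost if it were billed at your provider’s discounted rate for cached input. Providers discount repeated, unchanged prefix content heavily, and a stable system prompt is the ideal candidate.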
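As a rough illustration, the formula above can be sketched in a few lines of Python. This is a minimal sketch, not the tool's actual code: the function name, the `PromptCost` type, and the `cached_price_ratio` parameter are assumptions made for the example, and the default ratio of 0.1 is a placeholder rather than any provider's real rate. The sample numbers are hypothetical too.

```python
from dataclasses import dataclass


@dataclass
class PromptCost:
    uncached: float  # monthly cost when the prompt is billed at the full input price
    cached: float    # monthly cost if every call were billed at the cached rate


def monthly_prompt_cost(
    system_prompt_tokens: int,
    calls_per_month: int,
    input_price_per_million: float,
    cached_price_ratio: float = 0.1,  # ASSUMPTION: placeholder, check your provider's pricing
) -> PromptCost:
    """Apply the formula: tokens x calls / 1,000,000 x price per million tokens."""
    uncached = (
        system_prompt_tokens * calls_per_month / 1_000_000 * input_price_per_million
    )
    # The cached figure is a best case: it assumes every call hits the cache.
    return PromptCost(uncached=uncached, cached=uncached * cached_price_ratio)


# Hypothetical inputs: a 2,000-token prompt, 1M calls a month, $3.00 per million input tokens.
cost = monthly_prompt_cost(2_000, 1_000_000, 3.00)
print(f"uncached: ${cost.uncached:,.2f}  cached: ${cost.cached:,.2f}")
```

The cached number here treats every call as a cache hit, so it is best read as a lower bound on cost. Real bills usually land somewhere between the two figures, depending on hit rate and any charge the provider applies for writing to the cache.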
Tips to cut the cost
- Trim ruthlessly. Every token here is paid on every call. Remove filler, redundant examples, and decorative formatting.
- Enable prompt caching. A static system prompt is the textbook caching win: the discount applies to the repeated prefix, which is exactly the part you pay for on every call.
- Move rarely-needed instructions out. Conditional rules that apply to 5% of requests can be injected only when relevant rather than living in the always-on prompt.
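The last tip can be sketched as follows. This is a minimal illustration under stated assumptions: `BASE_PROMPT`, `CONDITIONAL_RULES`, and `build_system_prompt` are hypothetical names, and the keyword match stands in for whatever routing logic your app already has (an intent classifier, a feature flag, and so on).

```python
# Always-on instructions: kept short and stable.
BASE_PROMPT = "You are a support assistant. Be concise and accurate."

# Hypothetical conditional rules, keyed by a trigger keyword.
CONDITIONAL_RULES = {
    "refund": "For refund requests, state the refund policy before anything else.",
    "invoice": "For invoice questions, ask for the invoice number first.",
}


def build_system_prompt(user_message: str) -> str:
    """Return the base prompt plus only the rules this request triggers."""
    text = user_message.lower()
    extras = [rule for trigger, rule in CONDITIONAL_RULES.items() if trigger in text]
    # Extras go AFTER the base prompt so the shared prefix stays unchanged.
    return "\n".join([BASE_PROMPT, *extras])


print(build_system_prompt("How do I get a refund?"))
```

Appending the conditional rules after the base prompt, rather than in front of it, leaves the leading text identical across calls, so the two tips (caching and conditional injection) do not work against each other.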