Persistent AI assistants do not get memory for free. To stay coherent across days or weeks, they periodically summarize recent conversation and re-inject that summary as context. This maintenance carries a recurring token cost that scales with the number of active sessions. This calculator makes that recurring spend visible.
How it works
Each refresh cycle has two cost components:
- Generation cost — output tokens spent producing the summary of recent conversation.
- Injection cost — input tokens spent re-inserting that summary into the prompt so the assistant carries it forward.
The tool computes:
cost_per_refresh = (gen_tokens × output_price + inject_tokens × input_price) / 1e6
monthly_fleet_cost = cost_per_refresh × refreshes_per_month × active_sessions
Prices are in dollars per 1M tokens, which is why the per-refresh sum is divided by 1e6.
Refreshes per month derive from your chosen frequency. The result is broken into per-refresh, per-session-per-month, and whole-fleet totals.
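The formulas above can be sketched in Python as follows. This is a minimal illustration, not the calculator's actual implementation: the function name `refresh_costs` and the `RefreshCosts` container are hypothetical, and prices are assumed to be quoted in dollars per 1M tokens.

```python
from dataclasses import dataclass


@dataclass
class RefreshCosts:
    """Hypothetical container for the three totals the tool reports."""
    per_refresh: float
    per_session_month: float
    fleet_month: float


def refresh_costs(gen_tokens: int, inject_tokens: int,
                  input_price: float, output_price: float,
                  refreshes_per_month: float,
                  active_sessions: int) -> RefreshCosts:
    """Compute context-refresh costs in dollars.

    Assumes input_price and output_price are dollars per 1M tokens.
    """
    # Generation is billed as output tokens, injection as input tokens.
    per_refresh = (gen_tokens * output_price
                   + inject_tokens * input_price) / 1e6
    per_session_month = per_refresh * refreshes_per_month
    fleet_month = per_session_month * active_sessions
    return RefreshCosts(per_refresh, per_session_month, fleet_month)
```

Returning all three totals from one function keeps the per-refresh, per-session, and fleet figures consistent with each other, since each is derived from the previous one.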
Worked example
A support assistant refreshes every 6 hours (about 4 refreshes/day, or 120 per 30-day month), with 2,000 re-injection tokens and 600 generation tokens per refresh, at $1 per 1M input tokens and $5 per 1M output tokens, across 5,000 active sessions:
- Cost per refresh: (600 × $5 + 2,000 × $1) / 1e6 = $0.005
- Per session/month: $0.005 × 120 = $0.60
- Fleet/month: $0.60 × 5,000 = $3,000
At that rate, context maintenance alone is a $36,000/year line item ($3,000 × 12), which is worth optimizing.
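The arithmetic in this example can be checked with a short Python script. This is a sketch under stated assumptions: a 30-day month, a session that is active around the clock (so a 6-hour interval gives 4 refreshes per day), and illustrative variable names.

```python
# Inputs from the worked example; variable names are illustrative.
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30           # assumption: 30-day month
refresh_interval_hours = 6
gen_tokens = 600              # output tokens per refresh
inject_tokens = 2_000         # input tokens per refresh
input_price = 1.0             # dollars per 1M input tokens
output_price = 5.0            # dollars per 1M output tokens
active_sessions = 5_000

# 24 / 6 = 4 refreshes per day, times 30 days.
refreshes_per_month = HOURS_PER_DAY // refresh_interval_hours * DAYS_PER_MONTH

# Split the per-refresh cost into its two components.
generation_cost = gen_tokens * output_price / 1e6
injection_cost = inject_tokens * input_price / 1e6
cost_per_refresh = generation_cost + injection_cost

per_session_month = cost_per_refresh * refreshes_per_month
fleet_month = per_session_month * active_sessions
fleet_year = fleet_month * 12

print(f"refreshes/month:   {refreshes_per_month}")
print(f"generation cost:   ${generation_cost:.4f}")
print(f"injection cost:    ${injection_cost:.4f}")
print(f"cost per refresh:  ${cost_per_refresh:.4f}")
print(f"per session/month: ${per_session_month:.2f}")
print(f"fleet/month:       ${fleet_month:,.2f}")
print(f"fleet/year:        ${fleet_year:,.2f}")
```

Splitting the per-refresh cost shows that generation ($0.003) outweighs injection ($0.002) in this example, even though far fewer output tokens are involved, because output tokens are priced at five times the input rate.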
Tips
- Prompt caching is often a large lever: model the cached input rate to see re-injection cost drop sharply.
- Summarize harder. Halving the re-injected size halves the injection cost; in the worked example that lowers the cost per refresh from $0.005 to $0.004.
- Tier dormant sessions so you only refresh ones in active use.
- Pair with the LLM API Cost Calculator to fold this into total spend.
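To compare these levers side by side, the worked-example inputs can be rerun with one parameter changed at a time. The Python sketch below is illustrative only: the cached input price of $0.10 per 1M tokens and the 40% active share are hypothetical placeholders, not any provider's pricing or a measured usage figure, and `fleet_month_cost` is a made-up helper name.

```python
def fleet_month_cost(gen_tokens, inject_tokens, input_price, output_price,
                     refreshes_per_month, sessions):
    """Monthly fleet cost in dollars; prices are dollars per 1M tokens."""
    per_refresh = (gen_tokens * output_price
                   + inject_tokens * input_price) / 1e6
    return per_refresh * refreshes_per_month * sessions


# Baseline taken from the worked example.
baseline = dict(gen_tokens=600, inject_tokens=2_000,
                input_price=1.0, output_price=5.0,
                refreshes_per_month=120, sessions=5_000)

scenarios = {
    "baseline": baseline,
    # Hypothetical cached input rate; substitute your provider's real rate.
    "cached input": {**baseline, "input_price": 0.10},
    # Summarize harder: re-inject half as many tokens.
    "half-size injection": {**baseline, "inject_tokens": 1_000},
    # Tiering: assume only 40% of sessions are refreshed (hypothetical).
    "refresh active only": {**baseline, "sessions": 2_000},
}

results = {name: fleet_month_cost(**params)
           for name, params in scenarios.items()}

for name, cost in results.items():
    saving = results["baseline"] - cost
    print(f"{name:22s} ${cost:>9,.2f}/month  (saves ${saving:,.2f})")
```

Changing one input per scenario keeps the effects separable; which lever saves the most depends on your own token counts, prices, and share of dormant sessions.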