Conversation cost tracker
The expensive surprise in chat applications is not any single message; it is the way an LLM conversation resends its entire history on every turn. Early turns are cheap, but by the twentieth turn each new request re-bills all nineteen prior turns as input. This tracker simulates that growth so you can watch the running total climb and decide when to summarize or truncate.
How it works
Most chat APIs are stateless: to keep context, your app resends every prior message on each request. So the billed input for turn N is the system prompt, plus every user and assistant message from turns 1..N-1, plus the new user message, while the billed output covers only the new assistant reply. The tracker counts tokens for each turn (with a small per-message overhead for role tokens), prices input and output at your chosen model’s rates, and updates the running total after every turn you add.
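The accumulation described above can be sketched in a few lines of Python. This is an illustrative model, not the tracker's actual source: the function name `track_conversation`, the rates, and the 4-token per-message overhead are all assumptions, and token counts are passed in directly rather than produced by a real tokenizer.

```python
# Hypothetical prices in USD per million tokens; substitute your model's real rates.
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00
OVERHEAD = 4  # assumed tokens per message for role/formatting


def track_conversation(system_tokens, turns, input_rate=INPUT_RATE,
                       output_rate=OUTPUT_RATE, overhead=OVERHEAD):
    """Return one row per turn with billed tokens and the running cost in USD.

    turns is a list of (user_tokens, assistant_tokens) pairs.
    """
    # Everything that must be resent on the next request; starts as the system prompt.
    history = system_tokens + overhead
    running_cost = 0.0
    rows = []
    for number, (user_tokens, assistant_tokens) in enumerate(turns, start=1):
        # Input = system prompt + all prior messages + the new user message.
        input_tokens = history + user_tokens + overhead
        turn_cost = (input_tokens * input_rate
                     + assistant_tokens * output_rate) / 1_000_000
        running_cost += turn_cost
        rows.append({
            "turn": number,
            "input_tokens": input_tokens,
            "output_tokens": assistant_tokens,
            "running_cost": running_cost,
        })
        # The reply joins the history and is billed as input from the next turn on.
        history = input_tokens + assistant_tokens + overhead
    return rows


if __name__ == "__main__":
    for row in track_conversation(100, [(50, 200)] * 5):
        print(row)
```

Each row's `input_tokens` is larger than the last even though every turn has the same size, which is the growth the meter makes visible.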
Tips and notes
- The system prompt is paid on every turn — keep it tight, and prefer provider-side prompt caching for any long, stable prefix.
- Once a conversation gets long, summarizing the oldest turns into a short recap can cut input cost dramatically without losing the thread.
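- Output tokens are billed once each at the output rate, but every reply then joins the history and is re-billed as input on each later turn. With similar-sized turns, cumulative input grows roughly with the square of the turn count while output grows only linearly, which is why long chats drift toward being input-dominated.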
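To get a feel for how much summarizing can save, here is a rough sketch that compares cumulative input tokens with and without collapsing old turns into a recap. The function `total_input_tokens`, the `max_history` threshold, and the fixed `recap_tokens` size are hypothetical; the sketch also ignores per-message overhead and the cost of the extra call that would produce the summary, so treat the result as an upper bound on the saving.

```python
def total_input_tokens(system_tokens, turns, max_history=None, recap_tokens=150):
    """Sum billed input tokens over a whole conversation.

    turns is a list of (user_tokens, assistant_tokens) pairs.
    If max_history is set, the stored turns are replaced by a recap of
    recap_tokens whenever they grow past that many tokens (an assumed,
    fixed-size summary; a real recap would vary in length).
    """
    history = 0  # tokens of prior turns, not counting the system prompt
    total = 0
    for user_tokens, assistant_tokens in turns:
        # The system prompt and the full history are paid again on every turn.
        total += system_tokens + history + user_tokens
        history += user_tokens + assistant_tokens
        if max_history is not None and history > max_history:
            history = recap_tokens  # oldest turns collapsed into a short recap
    return total


if __name__ == "__main__":
    chat = [(50, 200)] * 20  # twenty identical turns
    full = total_input_tokens(100, chat)
    trimmed = total_input_tokens(100, chat, max_history=1000)
    output = sum(assistant for _, assistant in chat)
    print(f"input without recap: {full}")
    print(f"input with recap:    {trimmed}")
    print(f"output (either way): {output}")
```

Output tokens are the same in both runs; only the resent history changes, which is where the saving comes from.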