Why does a long chat cost more per turn over time?

Each API call resends the entire conversation so far as input. By turn ten you are paying to reprocess turns one through nine again, so the input cost of each new turn grows roughly linearly with conversation length.

Is the system prompt billed every turn?

Yes. The system prompt is part of the input on every request, so a long system prompt multiplies across every turn. The tracker adds it to each turn's input count.

How are tokens estimated?

Token counts use a character-and-word heuristic: roughly four characters per token for English, plus a small per-message overhead for role and structure tokens. The result is an estimate, not the count your model's actual tokenizer would produce.
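As a rough illustration of that kind of heuristic, here is a minimal Python sketch. It is not the tracker's actual code: it uses only the character count (the word-based part is left out), and the function names and the overhead value of 4 tokens per message are assumptions chosen for the example.

```python
import math

CHARS_PER_TOKEN = 4        # rough English average, as described in the text
PER_MESSAGE_OVERHEAD = 4   # ASSUMED value for role and structure tokens


def estimate_tokens(text: str) -> int:
    """Estimate tokens from character count (a heuristic, not a tokenizer)."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_message_tokens(text: str) -> int:
    """Content estimate plus the assumed fixed per-message overhead."""
    return estimate_tokens(text) + PER_MESSAGE_OVERHEAD
```

For exact counts you would need the tokenizer that matches your model; a sketch like this is only useful for seeing the shape of the cost curve.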

How can I keep a long conversation cheap?

Summarize or truncate old turns, use prompt caching for the stable prefix, or move static context into a system prompt that the provider can cache. The tracker shows exactly where the cost is accumulating.

Is anything uploaded?

No. The whole simulation runs in your browser. Nothing you type is sent anywhere.

What is the Conversation Cost Tracker?

The Conversation Cost Tracker simulates a multi-turn chat and shows how costs compound as the conversation grows, with a real-time cost meter that updates after each user and assistant turn. It runs free and fully client-side in your browser on Gera Tools, with nothing uploaded.

Conversation Cost Tracker

Name: Conversation Cost Tracker
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

The expensive surprise in chat applications is not any single message — it is the way an LLM conversation resends its entire history on every turn. Early turns are cheap; by the twentieth turn each new reply re-bills all nineteen prior turns as input. This tracker simulates that growth so you can see the running total climb and decide when to summarize or truncate.

How it works

A chat API is stateless: to keep context, your app resends every prior message on each request. So turn N’s input cost equals the system prompt plus all turns 1..N-1 plus the new user turn, while the output cost covers only the new assistant reply. The tracker counts tokens for each turn (with a small per-message overhead for role tokens) and accumulates the input and output costs at your chosen model’s rates, updating the meter after every turn you add.
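The accounting described above can be sketched in a few lines of Python. This is a minimal illustration, not the tracker's actual code: the function name simulate, the TurnCost record, and the overhead parameter are hypothetical, and rates are assumed to be quoted in dollars per million tokens.

```python
from dataclasses import dataclass


@dataclass
class TurnCost:
    turn: int
    input_tokens: int
    output_tokens: int
    cost: float        # cost of this turn, in dollars
    cumulative: float  # running total, in dollars


def simulate(system_tokens, turns, input_rate, output_rate, overhead=0):
    """Simulate a stateless chat.

    turns: list of (user_tokens, assistant_tokens) pairs.
    input_rate / output_rate: dollars per million tokens (assumed unit).
    overhead: assumed fixed token cost added to every message.
    """
    history = system_tokens + overhead  # tokens resent on every request
    total = 0.0
    results = []
    for n, (user, assistant) in enumerate(turns, start=1):
        # Input = system prompt + all prior messages + the new user message.
        input_tokens = history + user + overhead
        cost = (input_tokens * input_rate + assistant * output_rate) / 1_000_000
        total += cost
        results.append(TurnCost(n, input_tokens, assistant, cost, total))
        # The new user message and the reply both join the resent history.
        history = input_tokens + assistant + overhead
    return results
```

Calling simulate(500, [(50, 200)] * 20, 1.0, 3.0) returns one TurnCost per turn, so you can print the input_tokens and cumulative fields side by side and watch the input side grow while the output side stays flat.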

Why costs grow faster than you expect

The total input token count for a conversation with N turns is roughly triangular:

Turn 1 input: system + user_1
Turn 2 input: system + user_1 + assistant_1 + user_2
Turn 3 input: system + user_1 + assistant_1 + user_2 + assistant_2 + user_3
...

By turn 10 you are paying to re-read the first nine turns on every single request. Output tokens (only the new assistant reply each time) add up linearly, but each turn's input grows linearly with the turn number, so cumulative input tokens grow roughly as O(n²) with conversation length. That quadratic growth is what the tracker makes visible; most teams only notice it when the invoice arrives at the end of the month.
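If every user turn and every assistant reply has the same size, the triangular sum has a simple closed form. The Python sketch below checks that form against a plain loop; the uniform turn sizes and the function names are assumptions made for illustration, and per-message overhead is ignored.

```python
def cumulative_input_loop(system, user, assistant, n_turns):
    """Sum the input tokens billed on turns 1..n_turns, one turn at a time."""
    total = 0
    for n in range(1, n_turns + 1):
        # Turn n resends the system prompt, n user messages (including the
        # new one), and the n - 1 assistant replies produced so far.
        total += system + n * user + (n - 1) * assistant
    return total


def cumulative_input_closed_form(system, user, assistant, n_turns):
    """Same total via the triangular-number formula."""
    n = n_turns
    # n * (n + 1) and n * (n - 1) are always even, so // 2 is exact.
    return n * system + user * n * (n + 1) // 2 + assistant * n * (n - 1) // 2
```

The n² terms come entirely from the resent history. The system prompt contributes only a term linear in n, which is why trimming history matters more than trimming the system prompt once a conversation gets long.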

Worked example

Suppose each user turn averages 50 tokens and each assistant reply averages 200 tokens, with a 500-token system prompt, at a hypothetical $1.00 per million input tokens and $3.00 per million output tokens. Per-message overhead is ignored for simplicity, so the input on turn N is 500 + 50N + 200(N-1) tokens.

Turn	Input tokens	Output tokens	Turn cost	Cumulative cost
1	550	200	~$0.0012	~$0.0012
5	1,550	200	~$0.0022	~$0.0083
10	2,800	200	~$0.0034	~$0.023
20	5,300	200	~$0.0059	~$0.070

By turn 20 each new message costs about five times what the first turn cost, even though the user wrote roughly the same amount. The cumulative bill at turn 20 is about 60× the cost of turn 1 alone. Real numbers vary by model pricing, but the shape of the curve is universal, and the tracker lets you see exactly where yours bends.

Strategies to control multi-turn cost

Summarize aggressively. When the conversation history passes a token threshold, replace the oldest turns with a compact summary. Users rarely need the verbatim transcript; they need the context.
Use provider prompt caching. The system prompt and any stable prefix (tool descriptions, persona, context documents) can often be cached so you pay a fraction of the input rate for those tokens on subsequent turns.
Truncate the window. For many support or task-completion bots, only the last N turns matter. Drop older turns once they are no longer needed.
Watch output length. Long replies cost twice: a verbose assistant reply that is twice as long as it needs to be is billed once as output and then again as input on every subsequent turn.
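To get a feel for how much truncation and caching can save, the following Python sketch compares input cost under a few simplified assumptions: uniform turn sizes, a window measured in whole user/assistant exchanges, and a cached system prompt billed at a fixed multiplier of the input rate on every turn after the first. The function names, the window parameter, and the cached_multiplier model are hypothetical; real providers price cache writes and reads differently, so check your provider's terms.

```python
def input_tokens_per_turn(system, user, assistant, n_turns, window=None):
    """Input tokens billed on each turn, assuming uniform turn sizes.

    window: maximum number of prior exchanges to resend (None keeps all).
    """
    per_turn = []
    for n in range(1, n_turns + 1):
        prior = n - 1 if window is None else min(n - 1, window)
        per_turn.append(system + prior * (user + assistant) + user)
    return per_turn


def input_cost(per_turn, system, input_rate, cached_multiplier=1.0):
    """Dollar cost of input at input_rate dollars per million tokens.

    cached_multiplier is the ASSUMED price factor applied to the system
    prompt on turns after the first (1.0 means no caching discount).
    """
    billable = 0.0
    for i, tokens in enumerate(per_turn):
        if i == 0:
            billable += tokens  # first request pays full price
        else:
            billable += (tokens - system) + system * cached_multiplier
    return billable * input_rate / 1_000_000
```

For example, sum(input_tokens_per_turn(500, 50, 200, 20)) gives the full-history total, while passing window=4 gives the total when only the last four exchanges are resent. A summarization strategy would sit between the two: the dropped turns are replaced by a short recap rather than removed outright.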

Tips and notes

The system prompt is paid on every turn — keep it tight, and prefer provider-side prompt caching for any long, stable prefix.
Once a conversation gets long, summarizing the oldest turns into a short recap can cut input cost dramatically without losing the thread.
Output tokens are billed only once each (the new reply), but input tokens compound — that asymmetry is why long chats drift toward being input-dominated.