Why does each turn cost more in a long chat?

Most chat APIs resend the entire conversation history as input every turn. As history grows, input tokens — and therefore cost — grow with it, even if your new message is short. Cost per turn rises roughly linearly with conversation length.

What happens when I hit the context limit?

The model can no longer fit the full history. You must truncate, summarize, or use a sliding window — otherwise the oldest messages get dropped or the request is rejected. The timeline shows exactly which turn that happens on.

How is cumulative context computed?

Each turn adds your per-message and per-reply tokens to a running total. The tool plots that total against the window limit and marks the threshold crossings.

Is anything sent to a server?

No. The visualization is computed entirely in your browser. Nothing you enter is uploaded or stored.

What is the Context Window Timeline Viewer?

See a turn-by-turn bar chart of context consumption in a long chat, showing when you cross 50%, 75% and 100% of the model's window and where token costs start to spike. It runs free in your browser on Gera Tools, with nothing uploaded.

Context Window Timeline Viewer

Name: Context Window Timeline Viewer
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Get one useful tool a week

Like this tool? Enter your email and we'll send you one genuinely useful Gera tool a week — plus a link to come back to this one. No spam, one-click unsubscribe any time.

Context window timeline viewer

In a long multi-turn chat, context accumulates: most APIs resend the whole history every turn. This viewer plots a bar per turn so you can see exactly when the conversation crosses 50%, 75% and 100% of the model’s context window — and where the per-turn cost starts climbing because you are resending an ever-larger history.

Who needs this

Any developer or builder running multi-turn chat in production: chatbots, coding assistants, customer-support agents, document Q&A loops, and agentic task runners where the model calls tools over many rounds. Without visualising context growth it is easy to hit the limit mid-conversation and have the API reject the request, or to accumulate surprise costs because early turns now weigh heavily on every subsequent call.

How it works

Each turn adds the tokens from your new message plus the assistant’s reply to a running total. The tool charts that cumulative total against the window limit:

context_after_turn(n) = Σ (user_tokens + reply_tokens) for turns 1..n
input_cost_per_turn(n) ≈ context_before_turn(n) × input_price

Because the full history is re-sent as input each turn, input cost grows linearly with conversation length — the tenth turn can cost several times the first even if your messages stay the same size. The timeline marks the turn where you first cross each threshold so you know when to act.

Worked example

Suppose a model has a 128,000-token context window. A user sends roughly 150 tokens per message and the model replies with roughly 300 tokens per turn — so each turn adds 450 tokens.

Turn	Cumulative tokens	% of window
1	450	0.35%
50	22,500	17.6%
100	45,000	35.2%
200	90,000	70.3%
284	127,800	~100%

The cost of turn 284 is over 100 times the cost of turn 1 for input tokens alone — even though the user’s message is the same length both times. That cost spike, and the impending context overflow, is exactly what the timeline makes visible.

Strategies once you see the curve

Summarize at 75%. When the bar chart crosses the 75% marker, roll old turns into a compact system-level summary and restart history from there. Your next turn’s input cost drops sharply.
Sliding window. Keep only the last N turns plus a running summary for predictable per-turn cost ceilings.
Retrieval augmentation. For document Q&A or code search, store the material outside the context and inject only the relevant retrieved chunks per turn, so the window never fills with static content.
Bigger window ≠ cheaper. A model with a 200K window still charges per input token. The window just postpones the overflow — the cost curve is the same shape.