Context window timeline viewer
In a long multi-turn chat, context accumulates: most APIs resend the whole history every turn. This viewer plots a bar per turn so you can see exactly when the conversation crosses 50%, 75% and 100% of the model’s context window — and where the per-turn cost starts climbing because you are resending an ever-larger history.
How it works
Each turn adds the tokens from your new message plus the assistant’s reply to a running total. The tool charts that cumulative total against the window limit:
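context_after_turn(n) = Σ (user_tokens + reply_tokens) for turns 1..n
context_before_turn(n) = context_after_turn(n − 1), with context_before_turn(1) = 0
input_cost_per_turn(n) ≈ context_before_turn(n) × input_price (price per input token)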
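Because the full history is re-sent as input each turn, the input cost of each turn grows roughly linearly with the number of turns so far, and the cumulative input cost of the whole conversation grows roughly quadratically when turns are similar in size. The tenth turn can cost several times the first even if your messages stay the same size. The timeline marks the turn where you first cross each threshold so you know when to act.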
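The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the viewer's actual code: the function name `context_timeline`, the `(user_tokens, reply_tokens)` tuple layout, and the token counts are all assumptions made for the example. It also counts the new user message as part of each turn's input, which the approximate cost formula leaves out.

```python
from typing import Dict, List, Tuple

def context_timeline(
    turns: List[Tuple[int, int]],
    window: int,
    thresholds: Tuple[float, ...] = (0.5, 0.75, 1.0),
) -> Tuple[List[Tuple[int, int, int]], Dict[float, int]]:
    """Return (turn, input_tokens, context_after_turn) rows and the
    first turn at which each threshold fraction of the window is reached."""
    rows = []
    crossings: Dict[float, int] = {}
    total = 0  # running context size, i.e. context_after_turn(n - 1)
    for n, (user_tokens, reply_tokens) in enumerate(turns, start=1):
        # Assumption: input for this turn is the re-sent history plus the new message.
        input_tokens = total + user_tokens
        total += user_tokens + reply_tokens
        rows.append((n, input_tokens, total))
        for t in thresholds:
            if t not in crossings and total >= t * window:
                crossings[t] = n
    return rows, crossings

# Hypothetical chat: ten turns of 100 user tokens and 300 reply tokens,
# measured against a 2000-token window.
rows, crossings = context_timeline([(100, 300)] * 10, window=2000)
print(crossings)   # {0.5: 3, 0.75: 4, 1.0: 5}
print(rows[0][1])  # 100 input tokens on turn 1
print(rows[9][1])  # 3700 input tokens on turn 10
```

With equal-sized turns, the input for turn 10 is 37 times the input for turn 1, which is the growth the timeline is meant to make visible. Multiply `input_tokens` by your model's per-token input price to get a cost estimate.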
Tips
- When you approach 75%, summarize older turns into a compact recap and drop their raw history. This cuts the running total to the size of the recap plus the turns you keep, and growth resumes from that lower baseline.
- Use a sliding window that keeps only the last N turns plus a running summary for predictable cost.
- For retrieval/agent loops, store long material outside the context and pull only what each turn needs.
- Larger context windows raise the ceiling but do not flatten the cost curve: a long chat in a big window is still expensive per turn.
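To see why the sliding-window tip gives predictable cost, here is a rough sketch of the context size under that strategy. The function name `sliding_window_sizes` is hypothetical, and the sketch assumes the running summary has a fixed token size and only appears once older turns have been dropped. Real summaries vary in length.

```python
from typing import List, Tuple

def sliding_window_sizes(
    turns: List[Tuple[int, int]],
    keep_last: int,
    summary_tokens: int,
) -> List[int]:
    """Context size after each turn when only the last `keep_last` turns
    are kept verbatim and everything older is replaced by a summary."""
    sizes = []
    for n in range(1, len(turns) + 1):
        recent = turns[max(0, n - keep_last):n]
        size = sum(user + reply for user, reply in recent)
        if n > keep_last:
            # Assumption: the summary costs a fixed number of tokens.
            size += summary_tokens
        sizes.append(size)
    return sizes

# Six turns of 100 user tokens and 300 reply tokens, keeping the last 3 turns
# plus a 200-token summary.
print(sliding_window_sizes([(100, 300)] * 6, keep_last=3, summary_tokens=200))
# [400, 800, 1200, 1400, 1400, 1400]
```

Once the window is full, the context size levels off instead of climbing, so per-turn input cost stays roughly constant for similar-sized turns.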