Keep long conversations inside the context window
Long-running chats eventually overflow the model’s context window or simply waste tokens on stale history. This tool builds a rolling summary: it keeps your most recent turns verbatim and asks the model to compress the older portion into a dense factual block, so the conversation still fits your token budget without losing the facts, names, and decisions that matter.
How it works
You paste your conversation as a JSON array of { "role", "content" } objects and
set a target token budget. The tool reserves roughly 40% of that budget for the
newest turns, which it passes through unchanged, then sends everything older to
your chosen model with a prompt that asks for a compact summary preserving every
fact, decision, name, and open question. The output is a single context block:
the summary plus the verbatim recent turns, ready to drop back into your next
request. Token estimates use the common heuristic of roughly 4 characters per
token, so treat them as approximations rather than exact counts.
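The split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's actual code: the names `estimate_tokens`, `split_conversation`, and `build_summary_prompt` are hypothetical, the prompt wording is a stand-in for the real summary prompt, and token counts use the same rough 4-characters-per-token estimate.

```python
import math

RECENT_SHARE = 0.40  # share of the budget kept verbatim (the "roughly 40%" above)


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token, rounded up."""
    return math.ceil(len(text) / 4)


def split_conversation(turns, budget):
    """Return (older, recent); recent holds the newest turns that fit the reserved share."""
    recent_budget = int(budget * RECENT_SHARE)
    used = 0
    split_at = len(turns)
    # Walk backwards from the newest turn until the reserved share is used up.
    for i in range(len(turns) - 1, -1, -1):
        cost = estimate_tokens(turns[i]["content"])
        if used + cost > recent_budget:
            break
        used += cost
        split_at = i
    return turns[:split_at], turns[split_at:]


def build_summary_prompt(older):
    """Build a summarization request for the older turns (wording is an assumption)."""
    transcript = "\n".join(f'{t["role"]}: {t["content"]}' for t in older)
    return (
        "Summarize the conversation below as a compact factual block. "
        "Preserve every fact, decision, name, and open question.\n\n" + transcript
    )
```

Under these assumptions, the final context block would be the model's reply to `build_summary_prompt(older)` followed by the `recent` turns unchanged. Walking backwards from the newest turn means the most recent messages are always the ones kept verbatim.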
Tips and notes
Set the budget a little below your real limit to leave room for the new user message and the model’s reply. If the summary drops a detail you needed, raise the budget or move that turn into the recent window. The summary prompt is tuned to retain unresolved questions, which is where naive compression usually fails. Your API key is sent only from your browser directly to the provider, and it is never stored.
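To make the budget tip concrete, here is a small sketch of the arithmetic. The function name `summary_budget` and the numbers in the usage note are hypothetical; substitute your model's real context limit and your own estimates.

```python
def summary_budget(context_limit, next_message_tokens, reply_tokens):
    """Tokens left for the context block after reserving the next message and the reply."""
    budget = context_limit - next_message_tokens - reply_tokens
    if budget <= 0:
        raise ValueError("reservations exceed the context limit")
    return budget
```

For example, with an assumed 8,000-token limit, 500 tokens set aside for the next user message, and 1,000 for the reply, `summary_budget(8000, 500, 1000)` leaves 6,500 tokens as the target budget.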