How does the compression work?

The tool reserves part of your token budget for the most recent turns, which it keeps verbatim, and sends the older turns to the model for a dense factual summary. This rolling-summary pattern preserves recency while collapsing stale history into a fraction of the tokens.

Why keep recent turns verbatim?

The latest messages usually carry the immediate intent and pronoun references the model needs to respond correctly. Summarizing them too aggressively loses nuance, so only the older history is compressed while recent context stays intact.

Are the token counts exact?

No. Counts use the common ~4-characters-per-token heuristic, which is close for English prose but not identical to a real tokenizer. Use them as a guide and leave headroom in your actual context window.
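As a rough illustration of that heuristic, the sketch below estimates tokens from character counts. The function names and the choice to round up are assumptions made for this example, not details confirmed for the tool itself.

```python
import math

CHARS_PER_TOKEN = 4  # the ~4-characters-per-token heuristic; not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by four, rounded up (assumed rounding)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_conversation_tokens(turns: list[dict]) -> int:
    """Sum the per-turn estimates over each turn's content; role labels are ignored."""
    return sum(estimate_tokens(turn["content"]) for turn in turns)
```

Because the estimate ignores role labels and any per-message overhead a provider may add, it will tend to undercount slightly, which is one more reason to leave headroom.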

Is my API key safe?

Yes. Your key is sent only in the direct request from your browser to OpenAI or Anthropic; it is never sent to any Gera server and never stored. You can revoke it at any time from your provider dashboard.

What is the Conversation History Compressor (BYO-key)?

It summarizes the older turns of a conversation using your own OpenAI or Anthropic API key, building a rolling summary that replaces verbose history with a compact context block that fits your token budget. It runs free in your browser on Gera Tools: nothing is uploaded to Gera, and the older turns go only to the provider you choose.

Conversation History Compressor (BYO-key)

Name: Conversation History Compressor (BYO-key)
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Keep long conversations inside the context window

Long-running chats eventually overflow the model’s context window — or waste tokens replaying stale history that no longer matters. This tool builds a rolling summary: it keeps your most recent turns verbatim and asks the model to compress the older portion into a dense factual block, so the whole conversation fits your token budget without losing the facts, names, decisions, and open questions that matter.

How it works

You paste your conversation as a JSON array of { "role", "content" } objects and set a target token budget. The tool divides that budget into two zones:

Recency window (~40% of the budget): the newest turns are passed through unchanged, because recent context carries pronoun references and immediate intent that should not be paraphrased away.
Compressed history (up to ~60% of the budget): all older turns are sent to your chosen model (OpenAI or Anthropic) with a prompt that asks for a compact summary preserving every named fact, decision, commitment, and unresolved question.

The output is a single context block (the summary first, then the verbatim recent turns), ready to paste into the next call as the new conversation history. Token estimates use the ~4-characters-per-token heuristic for speed, so treat them as approximate.
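The split and the final assembly can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the function names, the backwards walk from the newest turn, and the "Summary of earlier conversation:" label are all hypothetical, and the summary string stands in for whatever the model returns.

```python
import math


def estimate_tokens(text: str) -> int:
    """~4 characters per token, rounded up (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / 4)


def split_history(turns: list[dict], budget: int, recency_share: float = 0.4):
    """Walk backwards from the newest turn, keeping turns verbatim until the
    recency allowance is used up. Returns (older_turns, recent_turns).
    If the newest turn alone exceeds the allowance, recent_turns is empty."""
    allowance = budget * recency_share
    used = 0
    cut = len(turns)
    for i in range(len(turns) - 1, -1, -1):
        cost = estimate_tokens(turns[i]["content"])
        if used + cost > allowance:
            break
        used += cost
        cut = i
    return turns[:cut], turns[cut:]


def build_context_block(summary: str, recent: list[dict]) -> str:
    """Summary first, then the recent turns verbatim (label text is assumed)."""
    lines = ["Summary of earlier conversation:", summary, ""]
    lines += [f'{t["role"]}: {t["content"]}' for t in recent]
    return "\n".join(lines)
```

In this sketch the older turns returned by split_history would be sent to the model for summarizing, and the result passed to build_context_block along with the recent turns.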

When to use this

This pattern is most useful when:

A multi-day support or research chat has grown past 10,000 tokens and is starting to hit context limits.
You want to archive a conversation but carry its key facts into a fresh session.
You are building a persistent AI agent that must carry state across many API calls without re-sending the full transcript every time.

Practical tips

Leave headroom. Set the budget a little below your actual context window limit so there is room for the next user message and the model’s reply. A good rule of thumb is 80–85% of the window.
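The rule of thumb is simple arithmetic. A small sketch, with a hypothetical helper name and an 8,000-token window chosen only as an example:

```python
def budget_with_headroom(context_window: int, headroom_percent: int = 80) -> int:
    """Return the share of the context window to use as the compression budget.

    headroom_percent is the portion of the window to fill (80 to 85 per the
    rule of thumb); the remainder is left for the next message and the reply.
    """
    if not 0 < headroom_percent <= 100:
        raise ValueError("headroom_percent must be between 1 and 100")
    return context_window * headroom_percent // 100


budget = budget_with_headroom(8000)  # 6400 tokens for history, 1600 left over
```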

Check for dropped facts. If the summary misses a detail you need, either raise the budget so the recency window is wider, or move that specific turn out of the compressible portion before running.

Iterative compression. For very long histories, compress in stages: run compression once, treat the output as the new history, then compress again if needed.
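Staged compression can be pictured as a loop that re-summarizes until the estimate fits. The sketch below is an assumption-laden simplification: it treats the history as a single string, takes the summarizer as a caller-supplied function (standing in for the model call), and adds a round limit and a no-progress check that the tool itself may not have.

```python
import math
from typing import Callable


def estimate_tokens(text: str) -> int:
    """~4 characters per token, rounded up (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / 4)


def compress_in_stages(
    history: str,
    budget: int,
    summarize: Callable[[str], str],
    max_rounds: int = 5,
) -> str:
    """Re-run the summarizer on its own output until the estimate fits the
    budget, the round limit is reached, or a round fails to shrink the text."""
    current = history
    for _ in range(max_rounds):
        if estimate_tokens(current) <= budget:
            break
        shorter = summarize(current)
        if len(shorter) >= len(current):
            break  # no progress, so stop rather than loop uselessly
        current = shorter
    return current
```

Each extra round costs another model call and risks dropping more detail, so it is worth checking the result for missing facts after every stage.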

API key safety. Your key is sent only in direct browser-to-provider requests; it is never stored and never routed through a Gera server. You can revoke it at any time from your provider’s dashboard.