Why do I need a memory block at all?

Most LLM API calls are stateless — the model only knows what is in the current request. To maintain continuity across turns you must re-send relevant prior context yourself. This tool formats that context cleanly and keeps it inside a token budget.

How does recency weighting work?

When the facts exceed the budget, the tool must drop some. With recency weighting on, it keeps the most recent facts and trims the oldest, which suits ongoing conversations. With it off, it keeps facts from the top of the list (the oldest, since facts are listed oldest first) and stops once the budget is reached.
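The trimming rule above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the names `estimate_tokens` and `select_facts` are hypothetical, rounding the estimate up is an assumption, and so is stopping at the first fact that does not fit rather than skipping it and trying smaller ones.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough cost: about four characters per token (rounding up is an assumption)."""
    return math.ceil(len(text) / 4)


def select_facts(facts: list[str], budget: int, recency: bool) -> list[str]:
    """Return the facts that fit the budget, in their original (oldest-first) order."""
    # With recency on, consider the newest facts first; otherwise use list order.
    candidates = list(reversed(facts)) if recency else list(facts)
    kept: list[str] = []
    used = 0
    for fact in candidates:
        cost = estimate_tokens(fact)
        if used + cost > budget:
            break  # assumption: stop at the first fact that does not fit
        kept.append(fact)
        used += cost
    # Restore oldest-first order so the block still reads chronologically.
    return list(reversed(kept)) if recency else kept


facts = [
    "User prefers metric units.",      # oldest, about 7 tokens
    "Project deadline is March 14.",   # about 8 tokens
    "User switched to the Pro plan.",  # newest, about 8 tokens
]

print(select_facts(facts, budget=16, recency=True))
# ['Project deadline is March 14.', 'User switched to the Pro plan.']
print(select_facts(facts, budget=16, recency=False))
# ['User prefers metric units.', 'Project deadline is March 14.']
```

With the same budget, the two modes drop opposite ends of the list: recency weighting sacrifices the oldest fact, while list order sacrifices the newest.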

How is the budget enforced?

Each fact's token cost is estimated at roughly four characters per token, so a 40-character fact counts as about 10 tokens. Facts are added until the budget is reached. Because the estimate is only approximate and real tokenizers vary, leave some headroom for your actual prompt and the model's reply.
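As a rough illustration of the estimate and of leaving headroom, the sketch below computes a fact's approximate cost and derives a memory budget from a context limit. The function names, the reserve sizes, and the 10 percent safety margin are all assumptions chosen for the example, not values the tool uses.

```python
import math

CHARS_PER_TOKEN = 4  # the rule of thumb described above, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token cost of a fact (rounding up is an assumption)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def memory_budget(context_limit: int, prompt_reserve: int,
                  reply_reserve: int, safety_percent: int = 10) -> int:
    """Tokens left for the memory block after reserving room for the
    user prompt, the model's reply, and a margin for estimation error."""
    available = context_limit - prompt_reserve - reply_reserve
    if available <= 0:
        return 0
    # Integer arithmetic keeps the result exact.
    return available * (100 - safety_percent) // 100


print(estimate_tokens("Project deadline is March 14."))  # 29 characters -> 8
print(memory_budget(context_limit=8000, prompt_reserve=1000, reply_reserve=1000))  # 5400
```

The safety margin exists because a character count can undershoot the true token count, especially for code, numbers, or non-English text.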

Is anything sent to a server?

No. The memory block is assembled entirely in your browser. Nothing you type is uploaded or stored.

Contextual Memory Prompt Builder

LLM API calls are stateless: unless you re-send it, the model forgets everything from previous turns. The naive fix — pasting the whole history — quickly overflows the context window and wastes tokens. This builder turns your list of prior facts and events into a compact, clearly delimited memory block, trimmed to a token budget you set, ready to prepend to your next request.

How it works

You list the facts the model should remember, oldest first. You set a token budget for the memory block, and choose whether trimming should favour recency. The tool estimates each fact’s token cost (about four characters per token) and fills the block up to the budget — keeping the most recent facts when recency weighting is on, or list order when it is off. The result is wrapped in a labelled section with an instruction telling the model to treat it as established context.
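The final wrapping step might look something like the sketch below. The delimiter strings, the wording of the instruction, and the function names `format_memory_block` and `prepend_memory` are hypothetical; the tool's real labels are not shown here, so treat this only as one plausible shape for a clearly delimited block.

```python
def format_memory_block(facts: list[str]) -> str:
    """Wrap already-trimmed facts in a labelled, delimited section."""
    if not facts:
        return ""  # nothing to remember, so emit no block at all
    lines = [
        "### MEMORY ###",  # hypothetical opening label
        "Treat the following facts as established context from earlier turns.",
    ]
    lines += [f"- {fact}" for fact in facts]
    lines.append("### END MEMORY ###")  # hypothetical closing label
    return "\n".join(lines)


def prepend_memory(block: str, user_message: str) -> str:
    """Place the memory block ahead of the next user message."""
    return f"{block}\n\n{user_message}" if block else user_message


block = format_memory_block([
    "User prefers metric units.",
    "Project deadline is March 14.",
])
print(prepend_memory(block, "How many days are left before the deadline?"))
```

Explicit opening and closing labels make it easy for the model to tell remembered context apart from the new request, and for you to strip or replace the block on the next turn.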

Everything is computed in your browser; there is no API call and nothing is stored. As you adjust the budget or the list, the block and its token estimate update live.

Tips and notes

Write each fact as a single, self-contained statement — “User prefers metric units,” “Project deadline is March 14” — rather than transcript snippets; atomic facts compress better and survive trimming gracefully. Keep the budget well under your model’s context limit so there is room for the actual user message and the reply. For long-running chats, summarise older turns into a few durable facts rather than carrying raw history, and turn recency weighting on so recent developments always make the cut. Re-generate the block each turn from your maintained fact list, and the conversation will feel continuous even though every call is independent.