How are tokens estimated?

The tool uses a fast heuristic of roughly four characters per token, which approximates GPT and Claude tokenizers for English prose. Code, non-English text and symbol-heavy content can deviate noticeably. Treat the result as an estimate for planning, not a billing-exact count, and use a real tokenizer for the final figure.
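The four-characters-per-token heuristic can be sketched in a few lines of Python. This is an illustration, not the tool's actual source: the function name `estimate_tokens` and the choice to round up are assumptions.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate a token count from character length.

    Heuristic only: a real tokenizer will give a different number,
    especially for code or non-English text. Rounding up is an assumption.
    """
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


prompt = "You must respond with valid JSON."
print(len(prompt), "characters ->", estimate_tokens(prompt), "estimated tokens")
```

The estimate is good enough to rank blocks by size; for an exact figure, run the same text through the tokenizer of the model you are calling.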

How does it decide which category a word belongs to?

It scans line by line for lexical signals. Imperative verbs and directives map to instruction; fenced or quoted blocks and few-shot markers map to examples; markdown and XML symbols map to formatting; hedging and padding phrases map to filler. Everything else is treated as context.

Which category should I cut first?

Filler almost always goes first — it is pure overhead. After that, look at examples and context: redundant few-shot examples and pasted boilerplate are usually the largest compressible blocks in a real prompt.

Is my prompt sent anywhere?

No. The analyzer runs entirely in your browser. Your prompt is never uploaded, stored or logged.

What is the Token Waste Analyzer?

The Token Waste Analyzer is a free tool that sorts every token in a pasted prompt into instruction, context, examples, formatting or filler. It estimates the token count of each category and flags the ones with the most compression potential, so you can cut your API bill. It runs in your browser on Gera Tools, with nothing uploaded.

Name: Token Waste Analyzer
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Token waste analyzer

Long prompts are billed on every single call, so a bloated system prompt quietly multiplies your bill at volume. This analyzer breaks your prompt into five categories — instruction, context, examples, formatting and filler — estimates the token share of each, and tells you which blocks have the most room to shrink.
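To see how prompt length multiplies at volume, here is a rough cost projection. The price, call volume and prompt sizes below are hypothetical placeholders, not real rates for any provider; substitute your own numbers.

```python
def monthly_prompt_cost(
    prompt_tokens: int,
    calls_per_day: int,
    usd_per_million_input_tokens: float,
    days: int = 30,
) -> float:
    """Input-token cost of sending the same prompt on every call."""
    total_tokens = prompt_tokens * calls_per_day * days
    return total_tokens * usd_per_million_input_tokens / 1_000_000


# Hypothetical figures: a 2,000-token system prompt, 50,000 calls per day,
# and an assumed price of $3 per million input tokens.
before = monthly_prompt_cost(2000, 50_000, 3.0)
after = monthly_prompt_cost(1500, 50_000, 3.0)  # same prompt after trimming 500 tokens
print(f"before: ${before:,.0f}/month, after: ${after:,.0f}/month")
```

Under these assumed numbers, trimming a quarter of the prompt removes a quarter of the input bill, which is why the per-call saving matters more as volume grows.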

How it works

The tool reads your prompt line by line and assigns tokens to a category using lexical signals:

Instruction — imperative verbs and directives (“you must”, “respond with”, “do not”).
Examples — fenced code blocks, quoted samples and few-shot markers (“Input:”, “Output:”, “Example”).
Formatting — markdown headers, bullets, tables and XML tags.
Filler — hedging and padding phrases (“please”, “as an AI”, “kindly”, “in order to”, “it is important to note”).
Context — everything else: the background, data and reference material.

Token counts use the common heuristic of about 4 characters per token, which tracks real tokenizers closely enough for prioritization.
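The line-by-line approach described above might look roughly like the following sketch. The patterns, their priority order and the function names are assumptions made for illustration; the tool's real rules are not published here and are likely more extensive.

```python
import math
import re
from collections import Counter

# Hypothetical signal patterns, checked in order; the first match wins.
SIGNALS = [
    ("examples", re.compile(r'^\s*(input:|output:|example\b|["“])', re.I)),
    ("formatting", re.compile(r"^\s*(#{1,6}\s.*|</?[A-Za-z][\w-]*>|\|.*\||-{3,}|\*{3,})\s*$")),
    ("filler", re.compile(r"\b(please|kindly|as an ai|in order to|it is important to note)\b", re.I)),
    ("instruction", re.compile(r"\b(you must|respond with|do not|always|never)\b", re.I)),
]


def classify_line(line: str) -> str:
    """Return the first category whose pattern matches, else 'context'."""
    for category, pattern in SIGNALS:
        if pattern.search(line):
            return category
    return "context"


def analyze(prompt: str) -> dict[str, int]:
    """Sum estimated tokens (about 4 characters each) per category."""
    totals: Counter = Counter()
    in_fence = False
    for line in prompt.splitlines():
        if not line.strip():
            continue  # blank lines carry no signal
        if line.strip().startswith("```"):
            in_fence = not in_fence  # fence markers toggle example mode
            category = "examples"
        elif in_fence:
            category = "examples"  # everything inside a fence is an example
        else:
            category = classify_line(line)
        totals[category] += math.ceil(len(line) / 4)
    return dict(totals)


sample = "\n".join([
    "# Role",
    "You must respond with valid JSON.",
    "Please kindly keep answers short.",
    "Input: 2 + 2",
    "Output: 4",
    "The product catalog covers three regions.",
])
print(analyze(sample))
```

Assigning a whole line to one category keeps the pass fast, at the cost of misfiling lines that mix signals (for example, a polite instruction counts as filler in this sketch).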

Category-by-category compression guide

Filler (easiest wins): Filler tokens are almost always pure overhead. Models do not respond more helpfully to politeness: “please”, “kindly”, and “it is important to note that” add tokens on every call without measurably changing the output. Delete them first.
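A first pass at removing the filler phrases named above can be automated with a few regular expressions. The phrase list and the helper name `strip_filler` are illustrative assumptions; review the output by hand, since a blunt pattern can remove a word that was doing real work.

```python
import re

# Hypothetical replacement table: (pattern, replacement), applied in order.
REPLACEMENTS = [
    (re.compile(r"\bit is important to note that\s+", re.I), ""),
    (re.compile(r"\bin order to\b", re.I), "to"),
    (re.compile(r"\b(please|kindly)\s+", re.I), ""),
]


def strip_filler(text: str) -> str:
    """Remove or shorten common filler phrases.

    Capitalization is not repaired, so a sentence that began with
    'Please' will start with a lowercase letter afterwards.
    """
    for pattern, replacement in REPLACEMENTS:
        text = pattern.sub(replacement, text)
    return text


print(strip_filler("Please summarize the text in order to save time."))
```

Comparing the estimated token count before and after gives a quick read on how much of the prompt was padding.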

Examples (biggest potential savings): Few-shot examples are often the largest single block in a system prompt, and they tend to accumulate over time as engineers add “just one more” to cover edge cases. The rule of thumb: two to three sharp, diverse examples beat six near-duplicates. Identify which examples actually improve output quality by running an ablation (remove one at a time and test) rather than assuming all examples contribute equally.
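The remove-one-at-a-time ablation can be sketched as a small loop. Everything here is a stand-in: `score` is a hypothetical callback that would run your own evaluation set against a prompt built from the given examples, and the toy scorer below only counts how many distinct labels the examples cover.

```python
from typing import Callable, Sequence


def ablate_examples(
    examples: Sequence[str],
    score: Callable[[Sequence[str]], float],
    tolerance: float = 0.0,
) -> list[str]:
    """Return examples whose individual removal costs at most `tolerance`."""
    baseline = score(examples)
    removable = []
    for i, example in enumerate(examples):
        without = [e for j, e in enumerate(examples) if j != i]
        if baseline - score(without) <= tolerance:
            removable.append(example)
    return removable


def toy_score(examples: Sequence[str]) -> float:
    """Toy stand-in for a real evaluation: number of distinct labels covered."""
    return float(len({e.split("->")[1].strip() for e in examples}))


few_shot = [
    "refund not received -> billing",
    "charged twice -> billing",
    "login page crashes -> technical",
]
print(ablate_examples(few_shot, toy_score))
```

Each result means "safe to remove on its own". Two near-duplicates can both be flagged, so remove one, re-run the ablation, and only then decide about the other.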

Context (compressible but requires care): Pasted background documents, product descriptions, and reference material often contain redundant sentences. Summarize static context into a denser form rather than pasting raw content. If the context never changes between calls, it is an ideal candidate for prompt caching, which bills repeated cache-hit tokens at a heavily discounted rate.

Formatting (often over-engineered): Nested XML tags, multi-level markdown headers, and decorative dividers are a habit borrowed from human-readable documentation. Models parse structure effectively from simpler signals. A flat numbered list often works as well as a deeply nested header hierarchy at a fraction of the token cost.

Instruction (usually cannot shrink much): The instruction section tends to be lean by nature, since it is the part engineers write most deliberately. Check it last. If the instruction share is large, look for redundancy, such as the same constraint restated in three different ways.

Interpreting the category breakdown

If any single category accounts for more than 40 percent of your total tokens, investigate it first. A prompt where 50 percent of tokens are categorized as filler or formatting has obvious low-hanging fruit. A prompt where context dominates (over 60 percent) is often a sign that static reference material should be moved to caching or summarized offline.
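These thresholds are simple to apply to a per-category token count. The sketch below assumes the counts arrive as a plain dictionary; the function name and message wording are illustrative, while the 40 and 60 percent cut-offs come from the guidance above.

```python
def flag_categories(
    token_counts: dict[str, int],
    share_threshold: float = 0.40,
    context_threshold: float = 0.60,
) -> list[str]:
    """List categories whose share of total tokens crosses a threshold."""
    total = sum(token_counts.values())
    if total == 0:
        return []
    notes = []
    # Largest categories first, so the biggest target is listed on top.
    for category, count in sorted(token_counts.items(), key=lambda kv: kv[1], reverse=True):
        share = count / total
        if category == "context" and share > context_threshold:
            notes.append(f"{category}: {share:.0%} of tokens, consider caching or summarizing")
        elif share > share_threshold:
            notes.append(f"{category}: {share:.0%} of tokens, investigate first")
    return notes


counts = {"instruction": 100, "context": 700, "examples": 150, "formatting": 30, "filler": 20}
for note in flag_categories(counts):
    print(note)
```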

Tips to recover wasted tokens

Delete filler outright. “In order to” → “to”; drop “please” and “kindly”. Removing them rarely changes model behavior.
Cap your examples. Two or three sharp few-shot examples usually beat ten near-duplicates that cost tokens on every call.
Move stable context to caching. If a large block never changes, prompt caching bills it at a fraction of the input rate.
Flatten formatting. Decorative markdown and nested tags add tokens without improving answers — keep only the structure the model actually needs.
Re-run after changes. The category breakdown is fast enough to use as a quick audit after each round of editing — paste, check, trim, repeat.