How is the verbosity score calculated?

The score blends average sentence length, the share of low-information filler phrases, and the amount of near-duplicate content. A higher score means more tokens are being spent per unit of instruction, signalling room to trim.

What kinds of sentences get flagged?

Three kinds of sentence are flagged: filler and politeness padding ("please note that", "it is important to"), near-duplicate constraints that repeat an earlier instruction, and very long run-on sentences that usually compress well.

How accurate is the token estimate?

It uses the common heuristic of roughly four characters per token, which is close for English prose. For an exact count use a model-specific tokenizer, but the relative savings shown here are reliable for comparing versions.

Is my prompt sent anywhere?

No. All analysis runs in your browser with no network requests, so confidential or proprietary system prompts stay on your device.

What is the System Prompt Length Optimizer?

The System Prompt Length Optimizer analyzes a system prompt for redundant instructions, repeated constraints, filler phrases, and low-information sentences, then flags removal candidates with an estimated token saving. It runs free and entirely in your browser on Gera Tools: no API key, no upload.

System Prompt Length Optimizer

Name: System Prompt Length Optimizer
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

System prompt length optimizer

Every token in your system prompt is paid for on every single request, so a bloated prompt is a recurring tax. This tool reads a system prompt and flags the sentences most likely to be removable — filler phrases, repeated constraints, and run-on instructions — and estimates the tokens you would save by cutting them.

The cost of a long system prompt

Unlike user messages, which vary per request, the system prompt is identical on every call. A 400-token system prompt sent 10,000 times a day adds up to 4 million input tokens daily, billed regardless of how useful those tokens are. Filler phrases that a developer added “just to be safe” are billed at the same rate as a precise instruction the model actually follows, even though they give it nothing to act on.

The problem compounds as prompts are edited collaboratively over time. Rules get restated in slightly different words by different authors. “Do not discuss competitor products” and “never mention other companies’ software” end up in the same prompt, so one constraint is paid for twice.

How it works

The prompt is split into sentences and each is scored on three signals:

Filler detection: A dictionary of low-information phrases marks common padding patterns — “please note that”, “it is important to”, “keep in mind that”, “feel free to”, and similar constructions. These phrases add words but carry no additional instruction.
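
A minimal sketch of dictionary-based filler detection follows. It is an illustration, not the tool's actual code: the names FILLER_PHRASES, find_fillers, and filler_share are hypothetical, and the list holds only the four phrases quoted above.

```python
import re

# Hypothetical phrase list: only the four phrases quoted in the text.
FILLER_PHRASES = [
    "please note that",
    "it is important to",
    "keep in mind that",
    "feel free to",
]

# One case-insensitive pattern; \b keeps matches on whole-word boundaries.
FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in FILLER_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_fillers(sentence: str) -> list[str]:
    """Return the filler phrases found in a sentence, lowercased, in order."""
    return [m.group(0).lower() for m in FILLER_RE.finditer(sentence)]


def filler_share(sentence: str) -> float:
    """Fraction of the sentence's characters taken up by filler phrases."""
    if not sentence:
        return 0.0
    filler_chars = sum(len(m.group(0)) for m in FILLER_RE.finditer(sentence))
    return filler_chars / len(sentence)
```

Calling find_fillers("Please note that it is important to cite sources.") returns both phrases, while a sentence with no padding returns an empty list.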

Redundancy detection: Each sentence is reduced to a set of word bigrams and compared against all earlier sentences using the Jaccard similarity index. If a sentence shares most of its content with a prior sentence (same constraint, different wording), it is flagged as a duplicate.
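
The bigram comparison can be sketched as below. The function names and the 0.5 threshold are assumptions made for illustration; the text only says a sentence is flagged when it shares "most of" its content with an earlier one.

```python
import re


def bigrams(sentence: str) -> set[tuple[str, str]]:
    """Lowercase the sentence and return its set of adjacent word pairs."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return set(zip(words, words[1:]))


def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicates(sentences: list[str], threshold: float = 0.5):
    """Return (index, earlier_index, similarity) for each sentence whose
    best match among the earlier sentences meets the threshold."""
    seen = []      # bigram sets of the sentences processed so far
    flagged = []
    for i, sentence in enumerate(sentences):
        grams = bigrams(sentence)
        best_j, best_sim = -1, 0.0
        for j, earlier in enumerate(seen):
            sim = jaccard(grams, earlier)
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_sim >= threshold:
            flagged.append((i, best_j, best_sim))
        seen.append(grams)
    return flagged
```

Under this sketch, a restatement that reuses most of the earlier wording scores high, but a paraphrase with no shared word pairs scores 0, so a bigram measure of this kind catches near-duplicates rather than every rewording of a rule.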

Length scoring: Unusually long sentences are flagged as compression candidates. A sentence that says in 40 words what could be said in 12 is often a split-and-shorten opportunity.
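
A length check along these lines might look like the sketch below. The 25-word cutoff, the naive sentence splitter, and the function names are all assumptions; the text only says that "unusually long" sentences are flagged.

```python
import re

# Hypothetical cutoff; the text does not state the real one.
MAX_WORDS = 25


def split_sentences(prompt: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", prompt.strip())
    return [p for p in parts if p]


def long_sentences(prompt: str, max_words: int = MAX_WORDS):
    """Return (sentence, word_count) for each sentence above the cutoff."""
    flagged = []
    for sentence in split_sentences(prompt):
        count = len(sentence.split())
        if count > max_words:
            flagged.append((sentence, count))
    return flagged
```

A splitter this simple will mis-split on abbreviations such as "e.g."; it is enough to show the idea, not to handle every prompt.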

The overall verbosity score blends these three signals with average sentence length. A rough four-characters-per-token heuristic translates the flagged sentences into an estimated token saving.
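
The four-characters-per-token estimate can be sketched as follows. The function names are hypothetical and rounding up is an assumption; the weights used to blend the verbosity score are not given in the text, so they are left out.

```python
import math

# Rough heuristic from the text; a model-specific tokenizer will differ.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count as character count / 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimated_saving(prompt: str, flagged: list[str]) -> tuple[int, int]:
    """Return (tokens_before, tokens_saved) if the flagged sentences were cut."""
    before = estimate_tokens(prompt)
    saved = sum(estimate_tokens(s) for s in flagged)
    # Never report saving more than the prompt contains.
    return before, min(saved, before)
```

Because the same heuristic is applied before and after an edit, the estimate is most useful for comparing two versions of a prompt rather than for predicting an exact bill.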

What to cut first

Filler and politeness padding: Always cut these. “Please make sure to always be helpful” adds zero behavioral signal over “be helpful.” The model does not respond to politeness in its instructions; it responds to clarity.

Duplicate constraints: Keep the most precise version and delete its restatements. The model does not follow a rule more reliably because it appears twice.

Prose to bullets: Long instruction paragraphs are often compressible. “When a user asks about pricing, you should first ask which product they are interested in, then explain the pricing tiers, and finally offer to connect them with a sales representative” becomes three bullets totalling a fraction of the token count.

Meta-instructions: Phrases like “the following instructions tell you how to respond” are wasted tokens. Just give the instructions.

Tips

Re-measure after each edit. Paste the revised prompt back to confirm the token count actually dropped before you ship it.
Test behavior, not just token count. A shorter prompt that changes model behavior has a different kind of cost. Run your evaluation set after trimming.
Track the saving by volume. The tool shows tokens saved per call; multiply by your daily call count, then by the number of days in the month, to see the real monthly impact.
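
The volume arithmetic can be sketched as below. The function names, the 30-day month, and the price argument are assumptions for illustration; substitute your provider's actual input-token rate.

```python
def monthly_token_saving(tokens_saved_per_call: int,
                         calls_per_day: int,
                         days_per_month: int = 30) -> int:
    """Tokens saved per month: per-call saving x daily calls x days."""
    return tokens_saved_per_call * calls_per_day * days_per_month


def monthly_cost_saving(tokens_saved_per_call: int,
                        calls_per_day: int,
                        usd_per_million_input_tokens: float,
                        days_per_month: int = 30) -> float:
    """Monthly saving in USD at a caller-supplied (placeholder) price."""
    tokens = monthly_token_saving(tokens_saved_per_call, calls_per_day, days_per_month)
    return tokens / 1_000_000 * usd_per_million_input_tokens
```

For example, trimming 120 tokens from a prompt sent 10,000 times a day saves 36 million input tokens over a 30-day month.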