System prompt size vs model quality tradeoff
Your system prompt is sent on every request, so each token in it is billed again and again and permanently eats into your context budget. Yet system prompts tend to accumulate hedging, politeness, and redundant restatements that add cost without changing behavior. This tool trims that fat at a level you control, shows the token and cost savings, and flags the quality risk so you can find the smallest prompt that still works.
How it works
You paste your system prompt and choose an aggressiveness level. The tool applies rule-based reductions locally: collapsing whitespace, removing filler and hedging phrases, cutting politeness, and compressing verbose constructions, with more patterns active at higher aggressiveness. It estimates tokens before and after, prices the per-request saving at a representative rate, projects monthly savings against your volume, and attaches a quality-risk note for the chosen level. Nothing leaves your browser.
Tips and notes
- Trim, then evaluate. Run your real test cases on the lean prompt before shipping it — savings are worthless if behavior regresses.
- Aggressive is for experiments. Start conservative; only push the slider once you have a way to measure quality.
- It compounds. A small per-request saving times high volume per month is the whole point — check the monthly figure.
- Keep the constraints. The tool targets filler, not rules; re-read the output to confirm no hard requirement was lost.