How is the complexity score calculated?

The tool runs rule-based heuristics over your prompt text — counting reasoning cue words, multi-step indicators, conditional logic, format demands, and knowledge breadth — and blends those signals into a 1-10 score. It is deterministic and runs entirely in your browser.

Does a high score mean my prompt is bad?

No. A high score just means the task is cognitively demanding, which is fine if it is genuinely hard. It is a signal to pick a capable model and consider step-by-step prompting, not a quality judgment.

Why does this matter for model selection?

Simple, low-complexity prompts run reliably on small, cheap models. High-complexity prompts with many reasoning steps benefit from frontier models. Grading first helps you avoid overpaying or under-provisioning.

Does it send my prompt anywhere?

No. All scoring happens locally in JavaScript in your browser. Nothing is uploaded, logged, or stored.

Can I lower a prompt's complexity?

Often yes — split a multi-step task into separate calls, remove conditional branches, fix ambiguous wording, and simplify the required output format. Re-grade after editing to see the effect.

What is the Prompt Complexity Grader?

The Prompt Complexity Grader scores your prompt on a 1-10 cognitive load scale based on required reasoning steps, world knowledge, instruction ambiguity, and output complexity, so you can match it to the right model and prompting strategy. It runs free in your browser on Gera Tools, with nothing uploaded.

Prompt Complexity Grader

Name: Prompt Complexity Grader
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Prompt complexity grader

Not every prompt needs your most expensive model. Some tasks are one-step lookups; others demand multi-hop reasoning, broad world knowledge, and a strict structured output all at once. The prompt complexity grader estimates how much cognitive load your prompt places on the model and returns a 1-10 score with a per-dimension breakdown — entirely in your browser, no API key, nothing uploaded.

How it works

The grader scans your prompt for measurable signals and combines them into four sub-scores:

Reasoning steps — looks for chains, multi-part questions, “then”, “after”, numbered steps, and conditional logic (“if… else”) that imply sequential thinking.
World knowledge — flags references to specialized domains, named entities, and breadth of topics the model must already know.
Instruction ambiguity — penalizes vague qualifiers (“good”, “appropriate”, “etc.”) and rewards explicit, concrete constraints.
Output complexity — detects demands for structured formats (JSON, tables, schemas), length targets, and multiple required sections.

Each dimension is normalized and the four are blended into a single 1-10 score. The logic is deterministic: the same prompt always grades the same way.
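To make the idea of normalizing and blending concrete, here is a minimal Python sketch of a rule-based grader. It is an illustration only, not the tool's actual code (the tool itself runs as JavaScript in the browser): the cue word lists, the cap of 3 hits per dimension, and the equal weighting are all assumptions, so it will not reproduce the grader's real scores.

```python
import re

# Hypothetical cue lists; the real tool's word lists are not documented here.
STEP_CUES = r"\b(then|after|next|finally|step by step)\b"
CONDITIONAL_CUES = r"\b(if|else|unless|otherwise)\b"
DOMAIN_CUES = r"\b(legal|medical|financial|statistical|regulatory)\b"
VAGUE_CUES = r"\b(good|appropriate|etc|nice|somehow)\b"
FORMAT_CUES = r"\b(json|table|schema|csv|yaml|sections?)\b"


def _count(pattern, text):
    """Count case-insensitive matches of a cue pattern."""
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def _normalize(hits, cap=3):
    """Clamp a raw hit count into the range 0.0 to 1.0 (cap is an assumption)."""
    return min(hits, cap) / cap


def grade(prompt):
    """Return (score, sub_scores) where score is an integer from 1 to 10."""
    # Lines that start with "1." or "2)" count as explicit numbered steps.
    numbered = len(re.findall(r"^\s*\d+[.)]\s", prompt, flags=re.MULTILINE))
    subs = {
        "reasoning_steps": _normalize(
            _count(STEP_CUES, prompt) + _count(CONDITIONAL_CUES, prompt) + numbered
        ),
        "world_knowledge": _normalize(_count(DOMAIN_CUES, prompt)),
        "instruction_ambiguity": _normalize(_count(VAGUE_CUES, prompt)),
        "output_complexity": _normalize(_count(FORMAT_CUES, prompt)),
    }
    blended = sum(subs.values()) / len(subs)  # equal weights (assumption)
    score = 1 + round(blended * 9)            # map 0.0..1.0 onto 1..10
    return score, subs


simple = "Summarize this paragraph in one sentence"
hard = (
    "Read these three reports, reconcile the conflicting figures, explain your "
    "reasoning step by step, then output a JSON object with a confidence score per claim"
)
print(grade(simple))
print(grade(hard))
```

Because the sketch uses only pattern counts and fixed arithmetic, the same prompt always produces the same result, which is the property the grader relies on.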

Reading the score in context

The score alone tells you little until you know what you intend to do with it. Here is a practical interpretation guide:

Score range	What it signals	Suggested approach
1–3	Simple, narrow, low-ambiguity	Small or fast model with a plain instruction works fine
4–6	Moderate reasoning or some structured output	Mid-tier model; chain-of-thought optional but helpful
7–8	Multi-step reasoning, specialized knowledge, or complex output format	Frontier model; consider explicit step-by-step instructions
9–10	Multiple hard dimensions simultaneously	Frontier model with chain-of-thought; strongly consider splitting into a chain
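If you grade prompts in bulk, the table above can be encoded as a small lookup. The following Python sketch assumes you already have an integer score from the grader; the function name `suggest_approach` and the returned strings are hypothetical, chosen only to mirror the table rows.

```python
def suggest_approach(score: int) -> str:
    """Map a 1-10 complexity score to the suggested approach from the table."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score <= 3:
        return "small or fast model, plain instruction"
    if score <= 6:
        return "mid-tier model, chain-of-thought optional"
    if score <= 8:
        return "frontier model, explicit step-by-step instructions"
    return "frontier model with chain-of-thought, consider splitting into a chain"


for s in (2, 5, 7, 10):
    print(s, "->", suggest_approach(s))
```

Treat the bands as starting points rather than hard rules; a prompt near a boundary is worth testing on both tiers.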

Tips and examples

A prompt like “Summarize this paragraph in one sentence” grades low — one step, no special knowledge, simple output. A prompt like “Read these three reports, reconcile the conflicting figures, explain your reasoning step by step, then output a JSON object with a confidence score per claim” grades high across every dimension.

When a prompt grades 8+, consider a frontier model with explicit chain-of-thought, or split it into smaller calls. When it grades 1-3, a small, cheap model with a plain instruction will usually do. Re-grade after editing to confirm your changes actually reduced the load.

Using the grade to cut costs

One of the most practical uses for this tool is finding over-provisioned prompts: tasks you have been sending to a powerful, expensive model out of habit when a lighter model would handle them just as well. Grade your existing prompts systematically and you may find that a significant share of your API calls score 1–4. Routing those to a smaller model can cut per-token costs substantially, often with little or no loss in quality. Grade first, then test with the lighter model to confirm before switching.
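As a rough way to size the opportunity, the Python sketch below takes a list of (score, tokens) pairs for past calls and compares the cost of sending everything to a large model against routing calls that score 4 or lower to a small one. The function name, the threshold default, and the per-million-token prices are placeholders, not real vendor pricing; substitute your own figures.

```python
def routing_estimate(calls, price_large, price_small, threshold=4):
    """Estimate savings from routing low-scoring calls to a cheaper model.

    calls: list of (score, tokens) pairs for past API calls.
    price_large, price_small: cost per million tokens (hypothetical values).
    threshold: calls scoring at or below this go to the small model.
    """
    baseline = 0.0  # cost if every call goes to the large model
    routed = 0.0    # cost after routing low-scoring calls to the small model
    light_calls = 0
    for score, tokens in calls:
        millions = tokens / 1_000_000
        baseline += millions * price_large
        if score <= threshold:
            routed += millions * price_small
            light_calls += 1
        else:
            routed += millions * price_large
    share = light_calls / len(calls) if calls else 0.0
    return {
        "light_share": share,
        "baseline_cost": baseline,
        "routed_cost": routed,
        "saving": baseline - routed,
    }


# Example with made-up numbers: two light calls and one heavy call.
history = [(2, 1_000_000), (3, 1_000_000), (8, 2_000_000)]
print(routing_estimate(history, price_large=10.0, price_small=1.0))
```

The estimate says nothing about output quality, so it complements rather than replaces a side-by-side test on the lighter model.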