How are tokens estimated without a tokenizer library?

The counter uses character-and-word heuristics calibrated per model family: roughly four characters per token for English, adjusted for punctuation and whitespace. It is an estimate, not the exact count a provider's tokenizer would return.

For ordinary English prose it is typically within about 10 to 15 percent of the real count. Code, non-Latin scripts and heavy punctuation tokenize less predictably, so treat estimates for those as rougher. Confirm with the provider's tokenizer for billing-critical work.

Why do the models show different counts?

Each provider uses a different tokenizer, so the same text splits into a slightly different number of tokens. The per-model figures reflect those known differences in average token density.

Is my prompt sent anywhere?

No. The estimate is computed entirely in your browser. Your prompt is never uploaded, logged or stored.

What is the Prompt Token Counter?

Paste any prompt and get an estimated token count per model family (GPT-4o, Claude and Gemini), plus a cost estimate at typical input pricing. It is a fast browser-side token estimator for budgeting prompts and staying inside context limits. It runs free in your browser on Gera Tools, with nothing uploaded.

Prompt Token Counter

Name: Prompt Token Counter
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Estimate prompt tokens before you spend a call

Tokens drive both your cost and whether a prompt fits the model’s context window. This counter gives you a fast, per-model token estimate for GPT-4o, Claude and Gemini as you type, plus a rough input-cost figure and how much of a typical context window the prompt uses, so you can trim before you send rather than after an HTTP 400 error or a surprise bill.

How it works

Counting exact tokens requires each provider’s byte-pair tokenizer, which is too heavy to ship into a browser page. Instead this tool uses a calibrated heuristic: it measures character count, word count and whitespace, then applies a per-model density factor reflecting how each tokenizer tends to split English text (around four characters per token, with adjustments). The result is a close estimate for ordinary prose. It then multiplies by published input prices to show an approximate cost and compares against a representative context-window size for each family.
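The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the density factors, the words-per-token ratio and the prices are all hypothetical placeholders, and whitespace handling is reduced to splitting on it.

```python
import math
import re

# Hypothetical per-family density factors (multipliers on the base estimate).
# Illustrative assumptions only, not the tool's real calibration.
DENSITY = {"gpt-4o": 1.00, "claude": 1.08, "gemini": 0.97}

# Hypothetical input prices in USD per million tokens.
# Always check the provider's current price list.
PRICE_PER_MTOK = {"gpt-4o": 2.50, "claude": 3.00, "gemini": 1.25}


def estimate_tokens(text: str, family: str) -> int:
    """Blend a character-based and a word-based guess, then scale per family."""
    if not text:
        return 0
    chars = len(text)
    words = len(re.findall(r"\S+", text))  # whitespace-delimited words
    by_chars = chars / 4.0                 # around four characters per token
    by_words = words * 4.0 / 3.0           # assumed ~0.75 words per token
    base = (by_chars + by_words) / 2.0
    return max(1, math.ceil(base * DENSITY[family]))


def estimate_cost_usd(tokens: int, family: str) -> float:
    """Approximate input cost from a token estimate and a per-million price."""
    return tokens * PRICE_PER_MTOK[family] / 1_000_000


if __name__ == "__main__":
    prompt = "The quick brown fox jumps over the lazy dog."
    for family in DENSITY:
        tokens = estimate_tokens(prompt, family)
        print(family, tokens, f"${estimate_cost_usd(tokens, family):.6f}")
```

Averaging the two guesses is one simple way to combine character and word counts; a real calibration would fit the weights against actual tokenizer output.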

How tokenisation works across models

Understanding why the three models show different token counts for the same text requires a little background on how tokenisation works.

All three major providers use subword tokenisation, typically a form of byte-pair encoding (BPE), which merges frequent character sequences into single tokens. Common English words such as “the,” “and” and “is” are typically single tokens, while less common words split into several. The specific vocabulary (which sequences get merged into single tokens) varies by provider because each provider trained its own tokeniser on different data.

The result is that the same word can be:

One token in GPT-4o’s tokeniser
Two tokens in Claude’s tokeniser
One token in Gemini’s tokeniser

For typical English prose, the difference between models is small — usually within 10–15% of each other. For code, non-Latin scripts, or text heavy with punctuation and symbols, the divergence can be larger because the tokenisers handle low-frequency characters very differently.
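To make the vocabulary effect concrete, here is a toy BPE splitter in Python. The two merge tables are invented for illustration and do not correspond to any provider's real vocabulary; the point is only that the same word yields a different token count once the learned merges differ.

```python
def bpe_split(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split a word into characters, then apply merge rules in priority order."""
    tokens = list(word)
    for left, right in merges:
        merged = []
        i = 0
        while i < len(tokens):
            # Merge every adjacent (left, right) pair found in this pass.
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens


# Hypothetical merge tables standing in for two differently trained tokenisers.
MERGES_A = [("t", "o"), ("k", "e"), ("to", "ke"), ("toke", "n")]
MERGES_B = [("t", "o"), ("e", "n"), ("to", "k")]

if __name__ == "__main__":
    print(bpe_split("token", MERGES_A))  # one token
    print(bpe_split("token", MERGES_B))  # two tokens
```

With table A the word collapses into a single token; with table B it stops at two pieces because no rule joins them.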

Text types and token density

The heuristic performs differently depending on what you paste in. These are the most important patterns to know:

Ordinary English prose: the heuristic is well-calibrated here, typically within about 10 to 15 percent of the real count. Use the estimate confidently for prompt budgeting.

Source code: code produces more tokens per character than prose because of brackets, operators, indentation, and variable names. Estimates may undercount real tokens by 15–25%.

JSON and structured data: similar to code. Heavily punctuated structured content has more tokens per character than prose.

Non-Latin scripts (Arabic, Chinese, Japanese, Korean, etc.): these scripts tokenise very differently across providers. Chinese characters, for example, are a single token each in some vocabularies but split into several in others. Do not rely on this estimate for non-Latin text; use the provider’s native token counter.

Mixed technical text: a prompt that combines natural language instructions with embedded code blocks or JSON examples will fall somewhere between the prose and code accuracy ranges.
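One way to act on these patterns is to turn a single estimate into a likely range based on what the text looks like. The sketch below is an assumption-heavy illustration: the 0.15 symbol threshold is invented, the prose band uses plus or minus 15 percent, and "undercount by 15–25%" is read as the estimate being 15 to 25 percent below the real count.

```python
import unicodedata

SYMBOL_THRESHOLD = 0.15  # hypothetical cut-off between prose and code-like text


def symbol_ratio(text: str) -> float:
    """Share of non-whitespace characters that are neither letters nor digits."""
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return 0.0
    symbols = sum(1 for c in visible if not c.isalnum())
    return symbols / len(visible)


def has_non_latin_letters(text: str) -> bool:
    """True if any letter's Unicode name does not mention LATIN (crude check)."""
    return any(c.isalpha() and "LATIN" not in unicodedata.name(c, "") for c in text)


def likely_range(estimate: int, text: str):
    """Return (kind, low, high); low and high are None when the estimate is unreliable."""
    if has_non_latin_letters(text):
        return ("non_latin", None, None)
    if symbol_ratio(text) >= SYMBOL_THRESHOLD:
        # Estimate assumed to be 15-25% below the real count; integer ceil division.
        low = -(-estimate * 100 // 85)
        high = -(-estimate * 100 // 75)
        return ("code_like", low, high)
    # Prose: assume the real count lies within +/-15% of the estimate.
    return ("prose", estimate * 85 // 100, -(-estimate * 115 // 100))
```

For non-Latin text the function deliberately returns no range, matching the advice to use the provider's own counter instead.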

Why context-window utilisation matters

Beyond cost, context-window fit is a hard constraint. A prompt that exceeds the model’s context window fails with an error, but a prompt that is too close to the limit also creates problems:

The model has less room for its own response (completion tokens come from the same budget)
Very long contexts can cause the model to lose track of instructions or context given early in the prompt
Caching benefits (where available) work better when the cached portion is stable — extremely long dynamic prompts are harder to cache effectively

The context-window percentage shown by the counter helps you assess whether you have adequate headroom, not just whether you technically fit.
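The headroom check can be sketched as simple arithmetic. The window sizes and the default reply reserve below are representative assumptions for illustration; real limits depend on the exact model version.

```python
# Hypothetical representative context-window sizes, in tokens.
CONTEXT_WINDOW = {"gpt-4o": 128_000, "claude": 200_000, "gemini": 1_000_000}


def window_report(prompt_tokens: int, family: str, reserve_for_reply: int = 4_000) -> dict:
    """Report how much of the window a prompt uses and whether a reply still fits."""
    window = CONTEXT_WINDOW[family]
    remaining = window - prompt_tokens
    return {
        "used_pct": round(100.0 * prompt_tokens / window, 1),
        "remaining": remaining,
        "fits": remaining >= 0,                        # technically inside the window
        "has_headroom": remaining >= reserve_for_reply,  # room left for the response
    }


if __name__ == "__main__":
    print(window_report(64_000, "gpt-4o"))
    print(window_report(127_000, "gpt-4o"))
```

The second call fits the window but leaves too little room for the assumed reply reserve, which is the situation the percentage is meant to flag.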

Tips and notes

Use this for budgeting and context-fit checks, not for exact billing. The real number comes from the provider’s own tokeniser (tiktoken for OpenAI, the Anthropic token-count endpoint, Gemini’s count-tokens API).

Estimates are least accurate on source code, JSON, non-Latin scripts and text with unusual punctuation, where token density diverges from prose.

If a prompt sits right at a context limit, leave headroom for the model’s response, which also consumes tokens from the same budget.

Everything runs locally, so pasting sensitive prompt text is safe.