Which tokenizer does GPT-4o use?

GPT-4o uses the o200k_base byte-pair encoding (BPE), a successor to the cl100k_base vocabulary used by GPT-4 and GPT-3.5-turbo. o200k_base has a larger vocabulary and is more efficient on code and non-English text.

How accurate is this estimate?

This is a heuristic based on the measured average characters-per-token ratio for the o200k/cl100k families. For English prose it is typically within 5-10% of the exact tiktoken count; for code or non-Latin scripts the gap can be larger.

Is my text sent to OpenAI or anywhere else?

No. The estimate runs entirely in your browser. Nothing you paste is uploaded, stored, or logged.

Do both the prompt and the reply count toward the limit?

Yes. Your input tokens and the model's generated output tokens share the 128K context window, and both are billed. Leave headroom for the response.

What is the GPT-4o Token Counter?

Paste any text and instantly estimate the token count using the o200k_base / cl100k_base byte-pair tokenizer that GPT-4o and GPT-4 use. The tool also shows character and word counts and cost projections per 1M tokens. It runs free in your browser on Gera Tools, entirely client-side, with nothing uploaded.

GPT-4o Token Counter

Name: GPT-4o Token Counter
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

GPT-4o token counter

Paste any text and instantly estimate how many tokens it uses under GPT-4o’s tokenizer. Tokens — not words or characters — are the unit OpenAI bills and limits on, so knowing the count helps you stay inside the 128K context window and predict cost before you send a request.

How the estimate works

GPT-4o tokenizes with o200k_base, a byte-pair encoding (BPE) vocabulary that averages roughly 4 characters per token on typical English text. This tool applies that measured ratio, blended with a word-boundary heuristic, so the result closely tracks the real tiktoken count without bundling the full vocabulary in the browser. A system prompt, if you add one, is counted separately and added to the total request size.
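The exact blend the tool uses is not spelled out here, so the following is only a minimal Python sketch under assumed constants: it averages a 4-characters-per-token estimate with a word-based estimate (the "750 words is roughly 1,000 tokens" rule of thumb) and sums a separately counted system prompt into the request total. The names `estimate_tokens`, `estimate_request`, `CHARS_PER_TOKEN`, and `TOKENS_PER_WORD` are hypothetical.

```python
# Hypothetical tuning constants, not the tool's published parameters.
CHARS_PER_TOKEN = 4.0          # typical English ratio under o200k_base
TOKENS_PER_WORD = 1000 / 750   # "750 words is roughly 1,000 tokens"


def estimate_tokens(text: str) -> int:
    """Approximate a token count by averaging a character-based and a word-based estimate."""
    if not text:
        return 0
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) * TOKENS_PER_WORD  # whitespace word boundaries
    # Any non-empty input is counted as at least one token.
    return max(1, round((by_chars + by_words) / 2))


def estimate_request(user_text: str, system_prompt: str = "") -> int:
    """Count the system prompt separately, then sum it into the request total."""
    return estimate_tokens(user_text) + estimate_tokens(system_prompt)


print(estimate_request("Summarize the attached report in three bullet points.",
                       system_prompt="You are a concise assistant."))
```

Averaging two independent heuristics dampens the worst case of each: long words inflate the character estimate less than they would a pure word count, and dense punctuation is partly caught by the character ratio.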

How byte-pair encoding tokenization works

Byte-pair encoding (BPE) is a compression-derived algorithm that starts with individual bytes (or characters) as the vocabulary and repeatedly merges the most frequent adjacent pair into a new token. Each merge adds one vocabulary entry, so over the roughly 200,000 merges behind o200k_base, learned from a large text corpus, common English words become single tokens (“the”, “and”, “is”), common word fragments become tokens (“ing”, “tion”, “pre”), and rare or unusual strings fall back to short character sequences.
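The merge loop described above can be sketched in a few lines of Python. This is a toy, character-level trainer for illustration only: the real o200k_base operates on bytes, pre-splits text with a regular expression before merging, and was trained on a far larger corpus, so `train_bpe` is a hypothetical helper, not OpenAI's implementation.

```python
from collections import Counter


def train_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to num_merges merge rules from a list of words (toy, character-level)."""
    # Each distinct word starts as a tuple of single characters, weighted by frequency.
    words = Counter(tuple(w) for w in corpus)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by how often its word occurs.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair; ties go to the first seen
        merges.append(best)
        # Rewrite every word, replacing each occurrence of the best pair with one symbol.
        new_words = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return merges


corpus = ["the"] * 10 + ["then"] * 3
print(train_bpe(corpus, 3))  # [('t', 'h'), ('th', 'e'), ('the', 'n')]
```

On a corpus dominated by "the", the first two merges build `th` and then `the`, so the frequent word ends up as a single symbol while the rarer "then" needs a third merge.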

The result is a vocabulary where common English words are 1 token, inflected forms are often 1–2 tokens, and completely novel strings are 3–6+ tokens. In practice, this means:

A 750-word English essay is roughly 1,000 tokens (the 4 characters/token ratio works well here).
The same length in German or French runs longer because compound words and inflected endings are tokenized less compactly.
Python or JavaScript keywords and common identifiers are frequent BPE merges and often cost a single token, but brackets, operators, and uncommon names break into short pieces, so code usually averages fewer characters per token than English prose.
JSON with many short keys and values can be surprisingly token-expensive because punctuation characters often tokenize separately.
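Encoding new text replays the learned merges in priority order. The sketch below assumes a tiny hypothetical merge table, `MERGES`, in which earlier entries take priority, and shows why a frequent word collapses to one token, an inflected form lands at two, and an unseen string stays in pieces. `bpe_encode` is an illustrative helper, not tiktoken's API.

```python
def bpe_encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split a word into tokens by applying learned merges, earliest-learned first."""
    ranks = {pair: i for i, pair in enumerate(merges)}  # lower rank = higher priority
    symbols = list(word)
    while len(symbols) > 1:
        # Collect every adjacent pair that has a learned merge, with its rank and position.
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in ranks
        ]
        if not candidates:
            break  # no learned merge applies; the remaining pieces are the tokens
        _, i = min(candidates)  # apply the highest-priority merge
        symbols[i : i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Hypothetical merge table, ordered by rank (learned first = applied first).
MERGES = [("t", "h"), ("th", "e"), ("i", "n"), ("in", "g")]

print(bpe_encode("the", MERGES))    # ['the']          one token
print(bpe_encode("thing", MERGES))  # ['th', 'ing']    two tokens
print(bpe_encode("xqz", MERGES))    # ['x', 'q', 'z']  no merges apply
```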

What o200k_base changed from cl100k_base

The o200k_base vocabulary used by GPT-4o is a successor to the cl100k_base vocabulary used by GPT-4 and GPT-3.5-turbo. The key improvements:

Larger vocabulary (200K vs ~100K tokens): more distinct tokens mean more common sequences are covered by a single token instead of being split across several, which lowers token counts.
Better multilingual coverage: non-English text (especially East Asian scripts, Arabic, and emoji sequences) tokenizes more efficiently on o200k_base, meaning the same text costs fewer tokens on GPT-4o than it did on GPT-4.
Better code coverage: code and markup patterns common in developer use cases have more efficient representations.

For most English text, the vocabulary change saves relatively few tokens, so GPT-4o's per-token price drives cost far more than tokenizer efficiency does. For multilingual or code-heavy workloads, the tokenizer improvement meaningfully reduces token count and therefore cost.

Practical use cases for this tool

Prompt budgeting before an API call: if you are building a chain that prepends a long system prompt and several examples before each user message, check the total token count here before deploying to avoid hitting context limits or unexpected billing.
Comparing prompt variants: paste two versions of a prompt to see which one is more token-efficient before running it at scale.
Estimating monthly costs: multiply the per-request token count by your expected request volume and the per-1M token price to project your monthly spend.
Checking whether a document fits a context window: paste a document to confirm it fits inside the 128K token limit, leaving headroom for the response.
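The monthly-cost arithmetic from the list above is easy to script. Prices differ by model and change over time, and input and output tokens are priced separately, so the per-1M rates below are placeholders rather than OpenAI's actual pricing; `monthly_cost` is a hypothetical helper.

```python
def monthly_cost(input_tokens: int, output_tokens: int, requests_per_month: int,
                 input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Project monthly spend from per-request token counts and per-1M-token prices."""
    # Price-weighted tokens for one request, scaled by volume, then converted from per-1M units.
    weighted = input_tokens * input_price_per_1m + output_tokens * output_price_per_1m
    return weighted * requests_per_month / 1_000_000


# Placeholder prices in dollars per 1M tokens; substitute the current published rates.
cost = monthly_cost(input_tokens=1_200, output_tokens=300, requests_per_month=50_000,
                    input_price_per_1m=2.0, output_price_per_1m=8.0)
print(f"${cost:,.2f} per month")  # $240.00 per month
```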

Tips and notes

Code, JSON, and non-Latin scripts pack fewer characters per token, so expect a higher count than the English-tuned estimate.
o200k_base is more token-efficient than the older cl100k_base, so the same text often costs slightly fewer tokens on GPT-4o than on GPT-4.
For an exact count before a high-volume production call, run OpenAI’s own tiktoken library with the o200k_base encoding.
Input and output tokens are billed separately. Leave room in your context budget for the response: a 128K-token input fills the entire window and leaves nothing for output.
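The headroom rule comes down to one comparison. A minimal sketch, assuming the 128,000-token window quoted on this page and a reply budget you choose yourself; `fits_in_context` and `output_headroom` are hypothetical helpers, and the token counts would come from an estimate like this tool's or from tiktoken.

```python
CONTEXT_WINDOW = 128_000  # GPT-4o context window in tokens (the "128K" figure)


def fits_in_context(input_tokens: int, reserved_output_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the space reserved for the reply fits in the window."""
    return input_tokens + reserved_output_tokens <= context_window


def output_headroom(input_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the response once the input is in place (never negative)."""
    return max(0, context_window - input_tokens)


print(fits_in_context(120_000, 4_000))  # True: 124,000 tokens in total
print(output_headroom(128_000))         # 0: a full-window input leaves no room for output
```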