Is this the exact same as OpenAI's tiktoken?

Not exactly. It implements the same regex pre-tokenization and byte-level pre-processing that cl100k_base and o200k_base use, which reproduces token boundaries and counts very closely. The full merge tables are megabytes of binary, so the tool applies a merge heuristic in their place, and the final count is a high-accuracy estimate rather than an exact match to the real tokenizer.

What is the difference between cl100k_base and o200k_base?

cl100k_base is the encoding used by GPT-3.5-turbo and GPT-4, with a vocabulary of about 100k tokens. o200k_base is the larger ~200k-token vocabulary used by GPT-4o and the o-series, which packs more text per token — especially for code and non-English languages — so the same text usually costs slightly fewer tokens.

Why do whitespace and capitalization change the token count?

GPT tokenizers attach leading spaces to words, so " token" and "token" are different tokens. Capitalization and punctuation also split differently. This is why reformatting a prompt can quietly change its cost.

Are special tokens counted?

The tool flags special tokens if it detects them in your text and notes that they are reserved control tokens. In real API calls the chat template adds its own role and delimiter tokens on top of your visible text.

Is my text sent anywhere?

No. All tokenization runs locally in your browser using JavaScript. Nothing you type is uploaded, logged, or stored.

What is the Tiktoken Browser Tool?

Tokenize any text in the browser using the regex pre-tokenization and byte-level rules behind OpenAI's cl100k_base and o200k_base encodings. The tool shows token count, per-token segments, byte-pair fragments, and special-token detection. It runs free in your browser on Gera Tools, with nothing uploaded.

Tiktoken Browser Tool

Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Tiktoken in the browser

Token count is the unit you pay in. This tool tokenizes text the way GPT models do — applying the regex pre-tokenization and byte-level encoding that cl100k_base and o200k_base use — so you can see how a prompt splits and roughly how many tokens it will cost before you spend a single API credit.

How it works

OpenAI’s BPE tokenizers run in two stages. First a regex splits the text into candidate pieces — words, leading-space-plus-word, numbers in groups, punctuation runs, and whitespace. Each piece is then encoded to UTF-8 bytes and merged with byte-pair encoding against the vocabulary. This tool reproduces stage one exactly (the official cl100k and o200k split patterns) and applies a faithful byte-pair merge heuristic to stage two, giving you token boundaries, the byte fragments inside each token, and a count that tracks the real tokenizer closely.
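Stage one can be approximated with the Python standard library alone. The sketch below is a simplified stand-in for the cl100k_base split pattern, not the official one: the real pattern relies on Unicode property classes (\p{L}, \p{N}) that the built-in re module does not support, so letters and digits are approximated here. The names SPLIT and pre_tokenize are illustrative and do not come from the tool.

```python
import re

# Approximation of the cl100k_base split pattern using only stdlib `re`.
# Assumption: [^\W\d_] stands in for \p{L} (letters) and \d for \p{N} (numbers).
SPLIT = re.compile(
    r"'(?:[sdmt]|ll|ve|re)"          # contractions such as 's, 're, 'll
    r"|(?:[^\r\n\w]|_)?[^\W\d_]+"    # one optional leading space or symbol, then letters
    r"|\d{1,3}"                      # digits, at most three per piece
    r"| ?(?:[^\s\w]|_)+[\r\n]*"      # punctuation run, optionally led by one space
    r"|\s*[\r\n]"                    # whitespace that ends in a line break
    r"|\s+(?!\S)"                    # whitespace not followed by visible text
    r"|\s+",                         # any remaining whitespace
    re.IGNORECASE,
)


def pre_tokenize(text: str) -> list[str]:
    """Split text into candidate pieces; joining the pieces restores the input."""
    return SPLIT.findall(text)


def utf8_bytes(piece: str) -> list[int]:
    """Return the UTF-8 byte values that stage two would start merging from."""
    return list(piece.encode("utf-8"))
```

With this approximation, pre_tokenize("Hello world, it's 1000000!") gives 'Hello', ' world', ',', ' it', "'s", ' ', '100', '000', '0', '!'. Note how the leading space stays attached to the word that follows it, and utf8_bytes(" world") starts with 32, the byte for that space.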

The two encodings differ mainly in vocabulary size. o200k_base has roughly twice the vocabulary, which lets it represent common multi-character sequences (and a lot of non-English text and code) as single tokens, so the same input typically yields fewer tokens than under cl100k_base.
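The effect of a larger vocabulary can be seen with a generic byte-pair merge loop. This is a textbook sketch, not the heuristic the tool uses, and the two merge tables (SMALL and LARGE) are invented toy data rather than entries from either real vocabulary.

```python
def bpe_merge(piece: str, ranks: dict[tuple[bytes, bytes], int]) -> list[bytes]:
    """Greedily merge adjacent byte sequences, lowest rank first."""
    parts = [bytes([b]) for b in piece.encode("utf-8")]
    while len(parts) > 1:
        best = None  # (rank, index) of the best mergeable pair found so far
        for i in range(len(parts) - 1):
            rank = ranks.get((parts[i], parts[i + 1]))
            if rank is not None and (best is None or rank < best[0]):
                best = (rank, i)
        if best is None:
            break  # no known merge applies to any adjacent pair
        i = best[1]
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return parts


# Toy merge tables (hypothetical): LARGE extends SMALL with two extra merges.
SMALL = {(b"t", b"o"): 0, (b"k", b"e"): 1, (b"to", b"ke"): 2}
LARGE = {**SMALL, (b"toke", b"n"): 3, (b" ", b"token"): 4}
```

Under SMALL, bpe_merge(" token", SMALL) stops at three fragments (b" ", b"toke", b"n"). Under LARGE, the same piece collapses to the single fragment b" token", which is the mechanism behind the lower counts of a bigger vocabulary.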

Which encoding to choose

Encoding	Models it applies to
cl100k_base	GPT-3.5-turbo, GPT-4, GPT-4-turbo, text-embedding-ada-002
o200k_base	GPT-4o, GPT-4o-mini, o1, o3, o-series models

If you are unsure which model your application calls, pick cl100k_base as the default — it covers the widest deployed base. For GPT-4o and newer models use o200k_base, which will give a slightly lower count for the same text.
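The table and the default rule above can be captured in a small lookup. This is a sketch under the assumption that model names are matched by prefix; ENCODING_BY_PREFIX and encoding_for are hypothetical names, and the mapping contains only the models listed in the table.

```python
# Prefix table built from the models named above (not an exhaustive list).
ENCODING_BY_PREFIX = {
    "gpt-4o": "o200k_base",
    "o1": "o200k_base",
    "o3": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-embedding-ada-002": "cl100k_base",
}


def encoding_for(model: str, default: str = "cl100k_base") -> str:
    """Pick an encoding name by the longest matching model-name prefix."""
    # Longest prefixes first, so "gpt-4o-mini" matches "gpt-4o" before "gpt-4".
    for prefix in sorted(ENCODING_BY_PREFIX, key=len, reverse=True):
        if model.startswith(prefix):
            return ENCODING_BY_PREFIX[prefix]
    return default
```

Checking the longest prefix first matters because "gpt-4o" itself begins with "gpt-4"; without that ordering, GPT-4o models would be assigned the older encoding.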

What affects token count more than you might expect

Whitespace and line breaks are tokenised differently depending on context. A blank line between paragraphs typically costs 1–2 tokens. Long prompts with heavy Markdown formatting (headers, code fences, bullet lists) can add 10–15% overhead compared to the same content in plain prose.

Numbers are chunked left to right in groups of up to three digits under both encodings, so “1000000” becomes multiple tokens, not one. Dates in ISO format (“2025-06-21”) split further: each hyphen is its own piece, and each run of digits is chunked separately.
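The digit grouping is easy to check in isolation. This sketch models only the three-digit rule and lumps everything else into non-digit runs, so it is an illustration rather than the full split pattern; chunk_digits is a hypothetical helper name.

```python
import re

# Digits in groups of at most three, scanned left to right; anything else as one run.
DIGIT_GROUPS = re.compile(r"\d{1,3}|\D+")


def chunk_digits(text: str) -> list[str]:
    """Split text the way the digit rule does, ignoring all other split rules."""
    return DIGIT_GROUPS.findall(text)
```

Because the grouping runs from the left, chunk_digits("1000000") gives '100', '000', '0' rather than the thousands-style '1', '000', '000', and chunk_digits("2025-06-21") gives '202', '5', '-', '06', '-', '21'.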

Code is often tokenised more efficiently than English prose, especially with o200k_base, because common programming patterns are directly in the vocabulary. Python keywords like def, return, and import are often single tokens.

Non-English text may cost significantly more tokens per word than English, depending on the language and script. Languages using non-Latin scripts (Chinese, Arabic, Devanagari) can use 2–4 tokens per character in cl100k. The o200k vocabulary improved coverage, so the same text may cost fewer tokens under the newer encoding.
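One reason for the gap is visible before any merging happens: byte-level BPE starts from UTF-8 bytes, and a character that no merge covers costs one token per byte. The sketch below only measures byte lengths, which gives the worst case per character; actual token counts depend on the vocabulary. The sample characters and the name utf8_profile are illustrative choices.

```python
def utf8_profile(text: str) -> list[tuple[str, int]]:
    """Pair each character with the number of UTF-8 bytes it occupies."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


# One sample character per script, written as escapes to stay ASCII-safe.
SAMPLES = {
    "Latin": "a",            # U+0061
    "Arabic": "\u0639",      # Arabic letter ain
    "Devanagari": "\u0928",  # Devanagari letter na
    "Chinese": "\u4e2d",     # CJK ideograph "middle"
}

BYTES_PER_CHAR = {script: utf8_profile(ch)[0][1] for script, ch in SAMPLES.items()}
```

Here BYTES_PER_CHAR comes out as 1 for the Latin letter, 2 for the Arabic letter, and 3 for the Devanagari and Chinese characters, so those scripts begin with two to three times as many units to merge.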

Practical prompt engineering uses

Cost estimation before a long batch run: Paste a representative prompt and multiply the token count by your per-token rate to estimate job cost.
Context window planning: A model with a 128k context window can hold roughly 128,000 tokens of input and output combined. Tokenize your system prompt, few-shot examples, and typical user message to see how much headroom remains for the response.
Trimming expensive prompts: Paste your prompt, see which sections consume the most tokens, then edit to reduce. Often removing redundant instructions or collapsing verbose examples saves 20–30% of tokens with no quality loss.
Debugging unexpected behaviour: Tokenize the input your code actually sends (not what you think it sends). Subtle string concatenation bugs — like a missing space between two joined strings — can silently merge tokens in unexpected ways.
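The first two uses reduce to simple arithmetic once you have token counts. The sketch below assumes a single flat per-million-token rate and made-up token counts; the function names and every number in the usage note are hypothetical, so substitute your provider's current pricing and your own measurements.

```python
def estimate_cost(tokens_per_prompt: int, prompts: int, usd_per_million_tokens: float) -> float:
    """Estimate input cost for a batch, given a representative prompt size."""
    return tokens_per_prompt * prompts * usd_per_million_tokens / 1_000_000


def headroom(context_window: int, *component_tokens: int) -> int:
    """Tokens left for the response after the fixed parts of the prompt."""
    return context_window - sum(component_tokens)
```

For example, estimate_cost(1200, 10_000, 2.50) returns 30.0 for ten thousand 1,200-token prompts at an assumed 2.50 USD per million tokens, and headroom(128_000, 900, 2400, 350) returns 124350 for a system prompt, few-shot examples, and user message of those sizes.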

For exact billing always trust the usage field returned by the API; use this tool for fast local estimates while you iterate on a prompt.