How accurate is this estimate?

The estimate uses per-family characters-per-token ratios calibrated on English text and typically lands within 5-10% of the real tokenizer's count on ordinary English prose. For exact counts, run the official tokenizer locally.

Why do Llama 3 and Llama 2 differ?

Llama 3 uses a much larger 128K-token vocabulary versus Llama 2's 32K, so it packs more characters per token and produces lower counts for the same text. Mistral's vocabulary is also about 32K tokens, so its counts track Llama 2's more closely.

Does this send my text anywhere?

No. All counting runs entirely in your browser with JavaScript. Nothing you paste is uploaded, stored, or logged.

Can I use this for code?

Yes, but code tokenizes more densely than prose. Symbols, indentation, and identifiers produce more tokens per character, so treat the estimate as a lower bound for source files.

What is the Llama Token Counter?

The Llama Token Counter estimates token counts for Meta's Llama 2 and Llama 3 models and for Mistral models, client-side, using calibrated characters-per-token ratios. Use it to plan prompts before self-hosting or calling Groq, Together, or Fireworks. It runs free in your browser on Gera Tools, with nothing uploaded.

Llama Token Counter — Gera Tools

Name: Llama Token Counter
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Llama and Mistral token counter

Self-hosting an open model or calling it through Groq, Together, or Fireworks means you pay and plan around tokens, not words. This tool estimates how many tokens a piece of text resolves to for Llama 2, Llama 3, and Mistral, so you can size prompts against a context window or project inference cost before you ship.

How it works

Llama and Mistral models use BPE tokenizers with different vocabulary sizes. Llama 2 and Mistral use SentencePiece-based tokenizers with vocabularies of roughly 32,000 tokens each, while Llama 3 moved to a tiktoken-style BPE tokenizer with a roughly 128,000-token vocabulary that packs more text into each token. Because a faithful in-browser tokenizer would require shipping the full vocabulary and merge tables, this tool uses calibrated characters-per-token ratios per family (about 3.6 chars/token for Llama 2 and Mistral and about 4.0 for Llama 3), blended with a word-based estimate. That blend tracks the real tokenizer within roughly 5-10% on ordinary English.
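The blend described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the characters-per-token ratios come from the text, but the tokens-per-word factors (taken from the midpoints of the 1,000-word figures quoted elsewhere on this page) and the 50/50 blend weight are assumptions made for the example.

```python
import re

# Characters-per-token ratios quoted in the text above.
CHARS_PER_TOKEN = {"llama2": 3.6, "mistral": 3.6, "llama3": 4.0}

# ASSUMED values for illustration only; the tool's real factors are not published here.
TOKENS_PER_WORD = {"llama2": 1.35, "mistral": 1.35, "llama3": 1.2}
CHAR_WEIGHT = 0.5  # assumed 50/50 blend between the two estimates


def estimate_tokens(text: str, family: str) -> int:
    """Blend a character-based and a word-based token estimate."""
    if not text.strip():
        return 0
    # Estimate 1: total characters divided by the family's chars-per-token ratio.
    char_estimate = len(text) / CHARS_PER_TOKEN[family]
    # Estimate 2: whitespace-delimited words times an assumed tokens-per-word factor.
    word_count = len(re.findall(r"\S+", text))
    word_estimate = word_count * TOKENS_PER_WORD[family]
    blended = CHAR_WEIGHT * char_estimate + (1 - CHAR_WEIGHT) * word_estimate
    return max(1, round(blended))
```

Calling estimate_tokens(text, "llama3") on English prose should return a somewhat lower number than estimate_tokens(text, "llama2"), reflecting the larger vocabulary.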

Why vocabulary size matters for token counts

The vocabulary is the set of text chunks (tokens) the model learned to recognise as atomic units. A larger vocabulary means common multi-character sequences — like suffixes, common words, or frequent technical terms — are stored as single tokens rather than split into smaller pieces. The practical effect: the same sentence tokenizes to fewer tokens on Llama 3 than on Llama 2, even though both are reading identical text.

For example, a 1,000-word document of typical English prose might tokenize to roughly:

Llama 2 / Mistral (32K vocab): approximately 1,300–1,400 tokens
Llama 3 (128K vocab): approximately 1,150–1,250 tokens

The difference scales with length: at roughly 3.6 versus 4.0 characters per token, a 4,096-token context window on Llama 2 holds about 10% less text than a window of the same size on Llama 3.
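As a rough worked example of that window comparison, the snippet below converts a token window into an approximate character capacity using the per-family ratios quoted on this page. The function name is hypothetical, and the result is only as good as the ratios, which apply to ordinary English prose.

```python
# Characters-per-token ratios quoted on this page (ordinary English prose).
CHARS_PER_TOKEN = {"llama2": 3.6, "llama3": 4.0}


def chars_that_fit(window_tokens: int, family: str) -> int:
    """Approximate number of characters that fit in a window of the given size."""
    return int(window_tokens * CHARS_PER_TOKEN[family])


llama2_chars = chars_that_fit(4096, "llama2")
llama3_chars = chars_that_fit(4096, "llama3")
# Fraction of the Llama 3 capacity that the same-sized Llama 2 window holds.
ratio = llama2_chars / llama3_chars
print(llama2_chars, llama3_chars, round(ratio, 2))
```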

Context windows for common Llama and Mistral models

Model	Context window
Llama 2 7B / 13B / 70B	4,096 tokens
Llama 3 8B / 70B	8,192 tokens (extended to 128K via RoPE scaling in some variants)
Llama 3.1 8B / 70B / 405B	131,072 tokens (128K)
Mistral 7B v0.1	8,192 tokens
Mixtral 8x7B	32,768 tokens

When planning prompts, subtract the expected output length from the context window to find how much input space remains. Leave headroom for special tokens — BOS (beginning of sequence), EOS, and instruction-tuning wrapper tokens add a small overhead that is invisible in the raw text count.
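The subtraction above can be written as a small helper. This is a sketch under stated assumptions: the class name, the default overhead of 32 tokens for BOS, EOS, and chat-template wrappers, and the 10% safety margin are all illustrative choices, not values taken from any model card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextBudget:
    context_window: int        # total window, e.g. 4096 for Llama 2
    max_output_tokens: int     # space reserved for the model's response
    overhead_tokens: int = 32  # ASSUMED allowance for BOS/EOS and wrapper tokens

    def input_budget(self) -> int:
        """Tokens left for the prompt after reserving output and overhead."""
        remaining = self.context_window - self.max_output_tokens - self.overhead_tokens
        return max(0, remaining)

    def fits(self, estimated_prompt_tokens: int, margin: float = 0.10) -> bool:
        """Check an estimate against the budget, padded by an assumed error margin."""
        padded = estimated_prompt_tokens * (1 + margin)
        return padded <= self.input_budget()
```

For a Llama 2 deployment that reserves 512 tokens for the response, ContextBudget(4096, 512).input_budget() leaves 3,552 tokens for input. Padding the estimate by a margin is one way to absorb the estimator's 5-10% error.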

Token cost variation by content type

Tokens per character are not constant. The estimate here is calibrated for typical English prose. Be aware of these differences:

Code: Tokenizers handle reserved words (def, class, return) efficiently but split identifiers, string literals, and symbols into many small pieces. Indentation whitespace in Python also costs tokens. Code therefore tends to use more tokens per character than English prose, with more variance from file to file.

Non-Latin scripts: Arabic, Chinese, Korean, Japanese, and Thai scripts cost significantly more tokens per character because the vocabulary has fewer native entries for these languages. A sentence that takes 20 tokens in English might take 40–80 tokens in a non-Latin script on Llama 2.

Emoji and special characters: A common emoji may map to a single token in a large vocabulary with broad Unicode coverage, but tokenizers that fall back to raw bytes, such as Llama 2's, often split an emoji into several byte tokens.

Repetitive text: Boilerplate, repeated headers, and very common phrases tokenize more efficiently because they match vocabulary entries exactly.
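One way to see why non-Latin scripts and emoji can cost more is to look at their UTF-8 size. When a tokenizer has no vocabulary entry for a character and falls back to bytes, the worst case is roughly one token per UTF-8 byte. The sketch below only measures bytes per character; it does not run a real tokenizer, so treat the numbers as an upper-bound intuition rather than actual token counts.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per Unicode code point in the text."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


# Sample strings chosen for illustration.
samples = {"English": "hello", "Chinese": "你好", "Emoji": "😀"}
for label, sample in samples.items():
    print(label, utf8_bytes_per_char(sample))
```

ASCII letters take one byte each, most CJK characters take three, and most emoji take four, which is why text outside the vocabulary's coverage can expand so sharply.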

Tips and notes

Token counts vary by language and content type: non-Latin scripts and emoji cost far more tokens per character, while repetitive or boilerplate text costs fewer. Reserve headroom in your context budget — a “4K” model can rarely fit a literal 4,096-token prompt once the response and special tokens are accounted for. For billing-critical decisions, confirm with the model’s own tokenizer. On Groq and Together AI, the provider shows actual token counts in the API response — use those for billing reconciliation rather than estimates.
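For reconciliation, a rough sketch of comparing an estimate against provider-reported usage is shown below. It assumes the response follows the OpenAI-compatible chat completions shape, with a usage object containing prompt_tokens and completion_tokens; check your provider's documentation for the exact fields. The function name and the sample response are hypothetical.

```python
def estimate_error(estimated_prompt_tokens: int, response: dict) -> float:
    """Relative error of an estimate versus the provider-reported prompt tokens.

    Negative values mean the estimate was too low.
    """
    # ASSUMES an OpenAI-compatible response body with a "usage" object.
    actual = response["usage"]["prompt_tokens"]
    if actual <= 0:
        raise ValueError("response reports no prompt tokens")
    return (estimated_prompt_tokens - actual) / actual


# Hypothetical response body, trimmed to the fields used here.
sample_response = {
    "usage": {"prompt_tokens": 1000, "completion_tokens": 200, "total_tokens": 1200}
}
print(f"{estimate_error(950, sample_response):+.1%}")
```

Reported prompt tokens include any chat-template and special tokens the provider adds, so an estimate made from the raw text alone will usually come out slightly low.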