Is a token the same as a word?

No. A token is a chunk of text that may be a whole word, part of a word, a single character, or punctuation. Common words are usually one token, while rare or long words split into several. As a rough rule, 100 English tokens are about 75 words.

What is byte-pair encoding?

Byte-pair encoding (BPE) is the algorithm most LLMs use to build their token vocabulary. It starts from individual characters and repeatedly merges the most frequent adjacent pairs into new tokens. The result gives common words their own token while still letting the tokenizer spell out any rare word from smaller pieces.

Why does token count matter for cost?

API providers charge per token, counting both your input (the prompt) and the output the model generates. Fewer tokens mean a cheaper request. Token count also determines whether your text fits inside the model's context window, which is itself measured in tokens.

Does the same text have the same token count in every model?

Not exactly. Different models use different tokenizers and vocabularies, so the same sentence can produce different counts. The estimates here approximate GPT-style tokenizers; for exact billing, always use the provider's own token counter.

What Are Tokens in AI? How LLMs Read and Write Text

Tokens, not words

When a language model reads or writes text, it does not see words or letters directly; it sees tokens. A token is a unit of text drawn from a fixed vocabulary that is set before the model is trained. A token might be a whole common word like “the” (often stored together with its leading space), a fragment like “ization,” a single character, or a piece of punctuation. Everything an LLM does is measured in tokens, so understanding them explains both cost and context limits. Paste text into the estimator below to see roughly how many tokens it produces.

How tokenization works

Most modern models use byte-pair encoding (BPE) or a close relative. The algorithm builds its vocabulary by starting with individual characters and repeatedly merging the most frequent adjacent pairs. After many merges, frequent words like “the” and “and” each become a single token, while rare words such as “antidisestablishmentarianism” get broken into several recognisable sub-pieces. This balance lets a vocabulary of around 50,000–100,000 tokens represent any text, including words it has never seen, by spelling them out from smaller parts.
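To make the merge loop concrete, here is a toy sketch in Python. The function names `merge_pair`, `learn_bpe`, and `apply_bpe` are hypothetical, and the sketch simplifies real tokenizers in several ways: it works on characters rather than bytes, treats each word separately with no pre-tokenization or special tokens, and breaks ties by whichever pair it counted first.

```python
from collections import Counter


def merge_pair(symbols: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe(words: list[str], num_merges: int) -> tuple[list[tuple[str, str]], list[list[str]]]:
    """Learn up to `num_merges` merge rules from a toy corpus of words."""
    corpus = [list(word) for word in words]  # start from single characters
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the whole corpus.
        counts: Counter = Counter()
        for symbols in corpus:
            counts.update(zip(symbols, symbols[1:]))
        if not counts:
            break  # every word is already a single token
        best = max(counts, key=counts.__getitem__)  # ties: first pair counted wins
        merges.append(best)
        corpus = [merge_pair(symbols, best) for symbols in corpus]
    return merges, corpus


def apply_bpe(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Encode a word, possibly unseen, by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


corpus = ["the", "the", "the", "then", "there"]
merges, segmented = learn_bpe(corpus, num_merges=2)
print(merges)                       # [('t', 'h'), ('th', 'e')]
print(apply_bpe("theory", merges))  # ['the', 'o', 'r', 'y']
```

On this tiny corpus the first two merges build “the” as a single token, and the unseen word “theory” is still encoded, as the learned piece “the” followed by single characters.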

Why token count matters

Two practical things depend entirely on tokens. First, cost: API providers bill per token for both the prompt you send and the response you receive, so a more verbose prompt is literally more expensive. Second, the context window — the maximum amount of text a model can consider at once — is measured in tokens, not words or pages. A 128,000-token window holds roughly 96,000 English words. Estimating tokens up front lets you predict cost and check that a long document will fit.
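The arithmetic behind both points is simple enough to sketch. In the Python below, the per-million-token prices are made-up placeholders rather than any provider's real rates, the helper names are hypothetical, and `fits_in_context` assumes the prompt and the reply share a single window, which is common but worth confirming for your model.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical US-dollar prices per million tokens; real rates differ by provider and model."""
    input_per_million: float
    output_per_million: float


def request_cost(prompt_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Both the prompt you send and the response you receive are billed."""
    return (prompt_tokens * pricing.input_per_million
            + output_tokens * pricing.output_per_million) / 1_000_000


def fits_in_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Assumes the prompt and the space reserved for the reply share one window."""
    return prompt_tokens + max_output_tokens <= context_window


def approx_words(tokens: int) -> int:
    """English rule of thumb: 100 tokens are about 75 words."""
    return tokens * 3 // 4


pricing = Pricing(input_per_million=2.0, output_per_million=8.0)  # placeholder rates
print(request_cost(1_500, 500, pricing))         # 0.007
print(fits_in_context(120_000, 4_000, 128_000))  # True
print(approx_words(128_000))                     # 96000
```

The last line reproduces the 128,000-token to roughly 96,000-word conversion from the paragraph above.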

The rough conversion

For ordinary English prose, a handy rule is that 100 tokens are about 75 words, or roughly 4 characters per token. The ratio shifts with content: code, numbers, non-English scripts, and unusual words all use more tokens per word because they fragment more. The estimator below uses this character-based heuristic, which is close enough for budgeting; for exact billing, run your text through the provider’s official tokenizer.
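A character-based estimator along these lines takes only a few lines of Python. This is a sketch of the heuristic, not the page's own estimator code, which is not shown here; the function names are hypothetical, and both rules of thumb will undercount code, numbers, and non-English text.

```python
import math

CHARS_PER_TOKEN = 4.0   # rough average for ordinary English prose
WORDS_PER_TOKEN = 0.75  # "100 tokens are about 75 words"


def estimate_tokens_by_chars(text: str) -> int:
    """Estimate tokens from length alone, rounding up so budgets err on the high side."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_by_words(text: str) -> int:
    """Estimate tokens from a whitespace word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


sample = "Tokens are the units of text that a language model reads and writes."
print(estimate_tokens_by_chars(sample))  # 17
print(estimate_tokens_by_words(sample))  # 18
```

The two heuristics disagree slightly even on one sentence, which is a reminder that both are budgeting aids rather than exact counts.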

Practical takeaways

Keep prompts tight to save tokens and money, and remember that whitespace and formatting count too. If you work across languages, expect non-English text to cost more tokens for the same meaning. And when a request fails for being “too long,” the limit it hit was almost certainly a token limit — trim the input or split it into chunks that each fit the window.
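Splitting into chunks can be sketched the same way. The Python below is a minimal illustration, assuming the rough 4-characters-per-token estimate rather than a real tokenizer; `split_into_chunks` is a hypothetical name. It packs whole words greedily, cuts any single word too long for one chunk into fixed-size pieces, and collapses runs of whitespace, so formatting is not preserved.

```python
import math

CHARS_PER_TOKEN = 4  # rough English heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def split_into_chunks(text: str, max_tokens: int) -> list[str]:
    """Greedily pack words into chunks whose estimated size stays within max_tokens."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    max_chars = max_tokens * CHARS_PER_TOKEN
    chunks: list[str] = []
    current = ""
    for word in text.split():  # collapses all whitespace to single spaces
        # Cut any word too long for one chunk into pieces that each fit.
        for start in range(0, len(word), max_chars):
            piece = word[start:start + max_chars]
            candidate = f"{current} {piece}" if current else piece
            if estimate_tokens(candidate) <= max_tokens:
                current = candidate
            else:
                # current is non-empty here, because a lone piece always fits.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("one two three four five six", max_tokens=2))
# ['one two', 'three', 'four', 'five six']
```

In practice you would set `max_tokens` below the window to leave room for the reply, and prefer splitting at paragraph or sentence boundaries so each chunk still reads sensibly on its own.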