Tokens, not words
When a language model reads or writes text, it does not see words or letters directly: it sees tokens. A token is a unit of text drawn from a fixed vocabulary the model learned during training. A token might be a whole common word like “the,” a fragment like “ization,” a single character, or a piece of punctuation. Everything an LLM does is measured in tokens, so understanding them explains both cost and context limits. Paste text into the estimator below to see roughly how many tokens it becomes.
How tokenization works
Most modern models use byte-pair encoding (BPE) or a close relative. The algorithm builds its vocabulary by starting with individual characters (or bytes) and repeatedly merging the most frequent adjacent pair into a new, longer token. After many merges, frequent words like “the” and “and” each become a single token, while rare words such as “antidisestablishmentarianism” get broken into several recognisable sub-pieces. This balance lets a vocabulary of around 50,000–100,000 tokens represent any text, including words it has never seen, by spelling them out from smaller parts.
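To make the merge loop concrete, here is a minimal sketch of BPE training on a toy word list. It is an illustration under simplifying assumptions, not any provider's actual tokenizer: it works on characters rather than bytes, ignores whitespace handling, and the names learn_bpe_merges, merge_pair, and tokenize are made up for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Return symbols with every adjacent occurrence of pair fused into one symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(corpus, num_merges):
    """Learn up to num_merges merge rules from a list of words (toy BPE)."""
    # Start from individual characters; track how often each word occurs.
    words = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the new merge everywhere before counting again.
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[merge_pair(symbols, best)] += freq
        words = merged_words
    return merges


def tokenize(word, merges):
    """Split a word into tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


corpus = ["the"] * 5 + ["then"] * 2 + ["and"] * 3
merges = learn_bpe_merges(corpus, num_merges=4)
print(tokenize("the", merges))   # frequent word: a single token
print(tokenize("thaw", merges))  # unseen word: spelled out from smaller pieces
```

With only four merges, the frequent words “the” and “and” already collapse into one token each, while a word that never appeared in the toy corpus is still representable as a sequence of smaller parts.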
Why token count matters
Two practical things depend directly on tokens. First, cost: API providers bill per token for both the prompt you send and the response you receive, so a more verbose prompt is literally more expensive. Second, the context window, the maximum amount of text a model can consider at once, is measured in tokens, not words or pages. At roughly 0.75 English words per token, a 128,000-token window holds about 96,000 words. Estimating tokens up front lets you predict cost and check that a long document will fit.
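Both checks come down to simple arithmetic, sketched below. The prices in the usage lines are hypothetical placeholders, not any provider's real rates, and the sketch assumes the prompt and the response share one context window, which is common but worth confirming in your provider's documentation.

```python
def estimate_cost(prompt_tokens, response_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimated cost of one request, given per-million-token prices."""
    total = (prompt_tokens * input_price_per_million
             + response_tokens * output_price_per_million)
    return total / 1_000_000


def fits_in_window(prompt_tokens, max_response_tokens, context_window=128_000):
    """True if the prompt plus the reserved response space fits the window.

    Assumes prompt and response count against the same window.
    """
    return prompt_tokens + max_response_tokens <= context_window


# Hypothetical prices: 2.00 per million input tokens, 8.00 per million output tokens.
print(estimate_cost(10_000, 1_000, 2.0, 8.0))
print(fits_in_window(120_000, 4_000))
print(fits_in_window(127_000, 4_000))
```

Reserving space for the response matters: a prompt that fills the window on its own leaves the model no room to answer.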
The rough conversion
For ordinary English prose, a handy rule is that 100 tokens are about 75 words, or roughly 4 characters per token. The ratio shifts with content: code, numbers, non-English scripts, and unusual words all use more tokens per word because they fragment more. The estimator below uses this character-based heuristic, which is close enough for budgeting; for exact billing, run your text through the provider’s official tokenizer.
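The rule of thumb above translates into a few lines of code. This is a sketch of the heuristic as described, not the actual source of the estimator on this page; the function names and the choice to round up are assumptions made for the example.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate for English prose: about 4 characters per token."""
    if not text:
        return 0
    # Round up so a short, non-empty string never estimates to zero tokens.
    return math.ceil(len(text) / chars_per_token)


def estimate_tokens_from_words(word_count):
    """Rough token estimate from a word count: 100 tokens per 75 words."""
    return math.ceil(word_count * 100 / 75)


sample = "Tokens, not words, are what a language model actually reads."
print(estimate_tokens(sample))
print(estimate_tokens_from_words(len(sample.split())))
```

For code or non-English text, a smaller chars_per_token value gives a more cautious estimate, since that content tends to fragment into more tokens.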
Practical takeaways
Keep prompts tight to save tokens and money, and remember that whitespace and formatting count too. If you work across languages, expect non-English text to cost more tokens for the same meaning. And when a request fails for being “too long,” the limit it hit was almost certainly a token limit — trim the input or split it into chunks that each fit the window.
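Splitting an over-long input can be sketched with the same character heuristic. The helper below is hypothetical and deliberately simple: it packs whole words greedily into chunks, collapses runs of whitespace, and relies on an estimate rather than a real tokenizer, so leave some headroom below the true limit.

```python
def split_into_chunks(text, max_tokens, chars_per_token=4):
    """Greedily pack words into chunks of at most max_tokens (estimated)."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    max_chars = max_tokens * chars_per_token
    chunks = []
    current = []       # words in the chunk being built
    current_len = 0    # its length in characters, including joining spaces
    for word in text.split():
        # A single word longer than the whole budget has to be cut.
        while len(word) > max_chars:
            if current:
                chunks.append(" ".join(current))
                current, current_len = [], 0
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        added = len(word) + (1 if current else 0)
        if current_len + added > max_chars:
            # The word would overflow this chunk, so start a new one.
            chunks.append(" ".join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len += added
    if current:
        chunks.append(" ".join(current))
    return chunks


document = "word " * 100
chunks = split_into_chunks(document, max_tokens=5)
print(len(chunks), max(len(c) for c in chunks))
```

In practice you would usually split on paragraph or sentence boundaries instead, so each chunk stays meaningful on its own, and verify the final sizes with the provider's official tokenizer.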