AI Tokens ELI5: What the Model Actually Reads

Not words, not letters — tokens are the weird chunks AI actually processes

What a token actually is

When you send text to an AI model, it never sees your words or letters directly. It sees tokens: small chunks of text drawn from a fixed vocabulary, each mapped to an integer ID. A token is often a whole short word like “the”, sometimes a word-piece like “ing” or “tion”, and sometimes a single character or punctuation mark. The model’s entire view of language is this stream of numbered chunks, which is why tokenisation is the very first thing that happens to anything you type.
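To make the idea of a fixed vocabulary concrete, here is a minimal sketch of a toy tokeniser. It is not how any particular model works: real tokenisers learn vocabularies of tens of thousands of pieces (often with byte-pair encoding), whereas `TOY_VOCAB`, `toy_tokenize`, and `to_ids` are hypothetical names invented for this illustration, and the greedy longest-match rule is a deliberate simplification.

```python
# Hypothetical toy vocabulary: each known chunk has a numeric ID.
# Real vocabularies are learned from data and are vastly larger.
TOY_VOCAB = {"the": 0, "ing": 1, "tion": 2, "walk": 3, " ": 4, ".": 5}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text by always taking the longest vocabulary entry that fits.

    Characters with no matching entry become single-character tokens.
    """
    tokens = []
    max_len = max(len(piece) for piece in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # Nothing in the vocabulary matched: emit one raw character.
            tokens.append(text[i])
            i += 1
    return tokens


def to_ids(tokens, vocab=TOY_VOCAB, unknown_id=-1):
    """Map each chunk to its ID; -1 marks chunks outside the toy vocabulary."""
    return [vocab.get(token, unknown_id) for token in tokens]


if __name__ == "__main__":
    chunks = toy_tokenize("the walking.")
    print(chunks)          # ['the', ' ', 'walk', 'ing', '.']
    print(to_ids(chunks))  # [0, 4, 3, 1, 5]
```

The list of IDs on the last line is the kind of thing the model receives: numbers, not letters.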

How it works

Use the box below to type a sentence and watch it break into coloured token chunks in real time. You will notice a pattern: frequent everyday words usually become a single token, while long, rare, or made-up words get sliced into several pieces. Spaces and punctuation get counted too — the leading space before a word is normally bundled into that word’s token. The tool also shows you the word count and character count alongside the token count, so you can feel the gap between how you read text and how the model reads it.
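The three counts can be approximated in a few lines. The sketch below is an assumption-laden stand-in, not the code behind any real tokeniser: `CHUNK_PATTERN`, `split_chunks`, and `text_stats` are hypothetical names, and the regular expression only imitates the space-bundling behaviour described above. It does not slice rare words into sub-word pieces, so its chunk count will usually be lower than a real token count.

```python
import re

# Hypothetical pattern, for illustration only:
#   " ?[A-Za-z]+"      a word, with its leading space bundled in
#   " ?[0-9]+"         a run of digits, with its leading space bundled in
#   "[^\sA-Za-z0-9]"   any single punctuation mark or symbol
#   "\s"               leftover whitespace
CHUNK_PATTERN = re.compile(r" ?[A-Za-z]+| ?[0-9]+|[^\sA-Za-z0-9]|\s")


def split_chunks(text):
    """Split text into rough chunks, attaching a leading space to the next word."""
    return CHUNK_PATTERN.findall(text)


def text_stats(text):
    """Return character, word, and rough chunk counts for the same text."""
    return {
        "characters": len(text),
        "words": len(text.split()),
        "chunks": len(split_chunks(text)),
    }


if __name__ == "__main__":
    sentence = "Hello, big world!"
    print(split_chunks(sentence))  # ['Hello', ',', ' big', ' world', '!']
    print(text_stats(sentence))    # {'characters': 17, 'words': 3, 'chunks': 5}
```

Even in this simplified form, three words turn into five chunks, because punctuation is counted separately and spaces travel with the word that follows them.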

Why this matters

Tokens are not an academic detail. They are the unit that runs the economics of AI. Providers typically price their APIs per token, and every model’s context window (how much it can read at once) is measured in tokens, not words. A useful rule of thumb is that one English word is about 1.3 tokens on average, so a 1,000-word document is roughly 1,300 tokens; the exact ratio depends on the tokeniser. Code, unusual names, and languages other than English usually tokenise less efficiently and so cost more per word. Understanding this is the difference between guessing at your AI bill and estimating it, and between mysteriously hitting a context limit and knowing roughly how much room you have left.
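The arithmetic behind that rule of thumb is short enough to sketch. Everything here is an estimate: the 1.3 ratio comes from the paragraph above, while the function names, the price of 2.00 per million tokens, and the 8,000-token context window are made-up values for illustration, not any provider's real figures.

```python
TOKENS_PER_WORD = 1.3  # rule of thumb for English text; varies by tokeniser


def estimate_tokens(word_count, tokens_per_word=TOKENS_PER_WORD):
    """Rough token count for a document of the given length in words."""
    return round(word_count * tokens_per_word)


def estimate_cost(token_count, price_per_million_tokens):
    """Rough cost, in whatever currency the per-million-token price uses."""
    return token_count / 1_000_000 * price_per_million_tokens


def remaining_context(context_window, tokens_used):
    """Tokens left in the context window, never going below zero."""
    return max(context_window - tokens_used, 0)


if __name__ == "__main__":
    tokens = estimate_tokens(1_000)          # 1,000-word document
    print(tokens)                            # 1300
    print(estimate_cost(tokens, 2.00))       # hypothetical price per million tokens
    print(remaining_context(8_000, tokens))  # 6700, with a hypothetical window
```

For a real bill, count tokens with the provider's own tokeniser and use its published prices; this sketch is only good for order-of-magnitude planning.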
