Token (AI Glossary)

The fundamental unit of text that LLMs process: not a word, not a character


Definition

A token is the basic unit of text that a large language model reads and writes. It is not a word and not a single character — it is usually a subword piece: a short common word, a fragment of a longer word, a space, or a punctuation mark. When you send a prompt, the model first splits it into tokens, processes that sequence, and then generates its reply one token at a time.

Tokens, words, and characters

As a working rule for English, 1 token ≈ ¾ of a word, so 1,000 tokens is about 750 words. Short, common words (“the”, “and”, “cat”) are typically a single token, while rarer or longer words (“tokenization”, “antidisestablishment”) break into several. Spaces and punctuation count too. Text in non-English scripts and source code often needs more tokens for the same amount of content, because those character sequences appear less frequently in the tokenizer’s training data and so are less likely to have been merged into single tokens.
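
The ¾ rule above can be turned into a quick estimator. This is a minimal sketch for English prose only. The function names are illustrative, and the result is a rough guess rather than a real count, which only the model's own tokenizer can give.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text using 1 token ~= 3/4 word.

    Words are counted by splitting on whitespace; the estimate is
    words * 4/3, rounded up with integer arithmetic.
    """
    words = len(text.split())
    return (words * 4 + 2) // 3  # ceiling of words * 4 / 3


def estimate_words(tokens: int) -> int:
    """Inverse of the rule: roughly how many English words fit in `tokens`."""
    return tokens * 3 // 4


print(estimate_words(1000))            # 750, matching the rule of thumb
print(estimate_tokens("the cat sat"))  # 4
```

Treat the output as a planning figure. Code, non-English text, and unusual vocabulary can deviate from it substantially.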

How tokens are made

Most models use a learned tokenizer such as Byte Pair Encoding (BPE) or a variant. The tokenizer is trained on a large corpus: starting from individual characters or bytes, BPE repeatedly merges the most frequent adjacent pair into a new vocabulary entry, and each entry gets its own ID. Frequent patterns become single tokens; rare patterns are assembled from smaller pieces. Because each model family ships its own tokenizer, the same sentence can have different token counts in GPT, Claude, or an open-source model.
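
To make the training idea concrete, here is a toy sketch of BPE-style merging in plain Python. It is a simplified illustration, not any particular model's tokenizer: it works on characters rather than bytes, splits the corpus on whitespace, and the function names are made up for this example.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, or None if there is none."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(symbols, pair):
    """Replace every occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges from a whitespace-split corpus."""
    # Each word starts as a tuple of single characters, weighted by frequency.
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        updated = Counter()
        for symbols, freq in words.items():
            updated[merge_pair(symbols, pair)] += freq
        words = updated
    return merges, words


merges, words = train_bpe("low low low lower lowest", num_merges=2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
print(words)   # "low" is now one symbol; "lower" is low + e + r
```

After two merges the frequent word "low" has become a single token, while the rarer "lower" and "lowest" are still assembled from "low" plus leftover characters.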

Why token count matters

Tokens are the unit of both capability and cost:

  • Context windows are measured in tokens. A 128K-token window is a budget that your prompt and the model’s response must share.
  • API pricing is per token, usually with separate rates for input and output. A more verbose prompt or format literally costs more.
  • Latency scales with tokens generated, since the model emits them one by one.

This is why prompt engineers care about concise wording, compact formats, and trimming unnecessary context — every saved token reduces cost and frees space.
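
The arithmetic behind these points is simple enough to sketch. The rates and the window size below are placeholder numbers chosen for illustration, not any provider's actual pricing or limits, and the names `Pricing`, `request_cost`, and `room_for_response` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical per-token rates, expressed in USD per million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request when input and output are billed at separate rates."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


def room_for_response(context_window: int, prompt_tokens: int) -> int:
    """Tokens left for the reply once the prompt has used its share of the window."""
    return max(context_window - prompt_tokens, 0)


# Placeholder rates: 3 USD per 1M input tokens, 15 USD per 1M output tokens.
pricing = Pricing(input_per_million=3.0, output_per_million=15.0)

print(request_cost(2_000, 500, pricing))   # cost of the verbose prompt
print(request_cost(1_200, 500, pricing))   # same request with a trimmed prompt
print(room_for_response(128_000, 100_000)) # 28000 tokens left for the reply
```

Here 128K is assumed to mean 128,000 tokens. Some providers define the window as 131,072 tokens or reserve part of it, so check the model's documentation before relying on an exact figure.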

Practical implications

Understanding tokens explains several common surprises. A long pasted document may be rejected for exceeding the window even though it “looks” short in pages. A prompt that works in one model may cost more in another with a less efficient tokenizer. And character-level tasks — counting letters, reversing strings — are hard for LLMs precisely because they see tokens, not individual characters. When estimating cost or fit, always think in tokens rather than words.
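
The character-level point can be illustrated with a toy encoder. This sketch uses a tiny made-up vocabulary and greedy longest-match splitting, which is a simplification and not how BPE applies its merges. Real tokenizers will split the same word differently.

```python
# Hypothetical toy vocabulary: two multi-character tokens plus single letters.
VOCAB = {
    "straw": 0, "berry": 1,
    "s": 2, "t": 3, "r": 4, "a": 5, "w": 6, "b": 7, "e": 8, "y": 9,
}


def encode(text, vocab):
    """Greedy longest-match encoding: at each position take the longest known piece."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            # No vocabulary entry starts at this position.
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


print(encode("strawberry", VOCAB))  # [0, 1]
print(encode("stray", VOCAB))       # [2, 3, 4, 5, 9]
```

With this vocabulary the model receives only the two IDs 0 and 1 for "strawberry". The three letters "r" are hidden inside those tokens, which is why counting or reversing characters is harder than it looks.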
