Token Visualizer

See exactly how your text is split into tokens, with each token highlighted in color

Token visualizer

Models do not read characters or words — they read tokens. Seeing exactly where your text breaks into tokens explains why billing, context limits, and even model behaviour sometimes surprise you. This visualizer colors each token as its own chip so you can spot subword splits, whitespace handling, and the hidden cost of numbers and rare words.

How it works

The tool applies a byte-pair-style splitter modeled on cl100k_base, the tokenizer used by GPT-4 and GPT-3.5 Turbo (GPT-4o uses the newer o200k_base). It first segments the text on the same boundaries the real pre-tokenization regex uses (contractions, runs of letters, runs of digits, punctuation, and leading spaces), then merges short common fragments the way BPE would. Crucially, a leading space stays attached to the word after it, which is why " token" (with its leading space) counts as one token rather than two. Long digit runs and uncommon words break into multiple subword tokens, which the colored chips make immediately visible.
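
The segmentation step can be sketched in a few lines of Python. This is a rough approximation, not the tool's actual code: the real cl100k_base pattern relies on Unicode property classes that the standard-library re module does not support, so the pattern below substitutes close equivalents. The BPE merge step is left out entirely, and the names PRETOKEN_RE, pretokenize, and render_chips are hypothetical.

```python
import re

# Simplified stand-in for the cl100k_base pre-tokenization pattern.
# Assumption: \w, \d and [^\W\d_] approximate the \p{L} / \p{N} classes
# that the real pattern uses.
PRETOKEN_RE = re.compile(
    r"'(?:s|t|re|ve|m|ll|d)"      # contractions such as 's, 't, 'll
    r"|[^\r\n\w]?[^\W\d_]+"       # letters, with at most one leading non-word char (usually a space)
    r"|\d{1,3}"                   # digits in groups of at most three
    r"| ?(?:[^\s\w]|_)+[\r\n]*"   # punctuation, optionally preceded by a space
    r"|\s*[\r\n]+"                # line breaks
    r"|\s+(?!\S)"                 # whitespace not followed by a visible character
    r"|\s+",                      # any remaining whitespace
    re.IGNORECASE,
)

# ANSI background colors, cycled so neighbouring chips differ.
BACKGROUNDS = (41, 42, 43, 44, 45, 46)


def pretokenize(text: str) -> list[str]:
    """Split text into pre-token pieces; joining them restores the input."""
    return PRETOKEN_RE.findall(text)


def render_chips(pieces: list[str]) -> str:
    """Wrap each piece in an ANSI background color for terminal display."""
    return "".join(
        f"\x1b[{BACKGROUNDS[i % len(BACKGROUNDS)]}m{piece}\x1b[0m"
        for i, piece in enumerate(pieces)
    )


if __name__ == "__main__":
    print(render_chips(pretokenize("Hello world, it's 2024!")))
```

For the input "Hello world, it's 2024!" the pieces are "Hello", " world", ",", " it", "'s", " ", "202", "4", and "!". The space travels with " world" and " it", while the four-digit year is cut into "202" and "4". A real tokenizer would go on to merge or split these pieces against its vocabulary, so treat the output as boundaries rather than a final token count.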

Tips and notes

  • If a word you consider common is split into several chips, it is rarer in the tokenizer's vocabulary than you might expect; rephrasing with more common words can shave tokens off long prompts.
  • Numbers, IDs, and hashes are token-hungry; consider whether you really need them verbatim in a prompt you send thousands of times.
  • For an exact production count, call the provider’s tokenizer or count_tokens endpoint — use this visualizer for intuition and quick estimates.
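
To get a feel for how quickly numbers add up, here is a small estimate in Python. It assumes the cl100k_base-style rule that digit runs are cut into groups of at most three before any merging, so each group costs at least one token. The function name digit_run_lower_bound is hypothetical, and the result is a floor for the digits alone, not an exact count for the whole prompt.

```python
import math
import re


def digit_run_lower_bound(text: str) -> int:
    """Estimate the minimum tokens spent on digits in `text`.

    Assumption: every run of digits is split into groups of at most
    three digits, and each group needs at least one token.
    """
    return sum(math.ceil(len(run) / 3) for run in re.findall(r"\d+", text))


if __name__ == "__main__":
    prompt = "order 1234567890 shipped 2024-01-15"
    print(digit_run_lower_bound(prompt))  # 4 + 2 + 1 + 1 = 8
```

In this example the ten-digit order number alone accounts for at least four tokens, and the date for four more once its separators split it into three runs.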