Is this exact tiktoken tokenization?

No. Exact BPE tokenization with tiktoken needs the model's merge tables, which are large and model-specific. This tool instead uses a fast word-level approximation that works well for frequency and diversity analysis, where word boundaries matter more than subword splits.

What is the type-to-token ratio?

It is the number of unique tokens (types) divided by the total number of tokens. A ratio near 1 means highly varied vocabulary; a low ratio means the text repeats the same words a lot, which often signals padded, looping, or low-quality output.
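
As a worked example of the definition, with V types and N tokens (the sample sentence is illustrative):

```latex
\mathrm{TTR} = \frac{V}{N}
\qquad\text{e.g. ``the cat sat on the mat'': } N = 6,\; V = 5 \;(\text{``the'' occurs twice}),\;
\mathrm{TTR} = \tfrac{5}{6} \approx 0.83
```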

How are repeated phrases detected?

The tool counts every 2-word and 3-word sequence (bigrams and trigrams) and surfaces those that occur more than once. Repeated multi-word phrases are a strong signal of templated or degenerate generation.
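
A minimal Python sketch of that counting step. The function name `repeated_ngrams` is hypothetical, and plain whitespace splitting stands in for the tool's own tokenizer, whose exact rules are not documented here.

```python
from collections import Counter


def repeated_ngrams(tokens: list[str], n: int) -> dict[tuple[str, ...], int]:
    """Count every n-word sequence and keep only those that occur more than once."""
    # zip over n shifted copies of the token list yields each contiguous n-gram
    grams = Counter(zip(*(tokens[i:] for i in range(n))))
    return {gram: count for gram, count in grams.items() if count > 1}


tokens = "the model said the model said it again".split()
print(repeated_ngrams(tokens, 2))  # {('the', 'model'): 2, ('model', 'said'): 2}
print(repeated_ngrams(tokens, 3))  # {('the', 'model', 'said'): 2}
```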

Does it upload my text?

No. All tokenization and counting run locally in your browser. Your text never leaves the page, so it is safe for confidential content.

What is the Token Frequency Analyzer?

It tokenizes text with a word-level approximation and shows the token frequency distribution, the type-to-token ratio as a measure of vocabulary diversity, and any high-frequency stopwords and repeated phrases. That makes it a quick way to spot repetitive or low-variety LLM output. It runs free in your browser on Gera Tools, with nothing uploaded.

Token Frequency Analyzer

Name: Token Frequency Analyzer
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Token frequency analyzer

Repetitive, low-variety output is one of the clearest symptoms of a struggling prompt or a model stuck in a loop. This tool profiles any text by tokenizing it, counting how often each token appears, and computing a vocabulary-diversity score. It then flags the repeated phrases and dominant stopwords that make output feel padded or robotic.

How it works

The text is lowercased and split on word boundaries into tokens (a fast, model-agnostic approximation of subword tokenization that is well suited to frequency work). The tool tallies total tokens and unique tokens, then computes the type-to-token ratio (unique divided by total) as a diversity score between 0 and 1. It builds a ranked frequency table, separately highlighting common stopwords that dominate the count. Finally, it counts every bigram and trigram and surfaces the multi-word phrases that repeat, since repeated phrases are the clearest fingerprint of templated or degenerate generation.
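
The tokenize, count, and ratio steps can be sketched in Python as follows. This is an illustration under assumptions, not the tool's source: the regex tokenizer, the small `STOPWORDS` set, and the `tokenize`/`profile` names are all hypothetical. Phrase counting extends the same token list by zipping it with shifted copies of itself.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; the tool's real list is not documented.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that", "on", "for"}


def tokenize(text: str) -> list[str]:
    """Lowercase, then extract word-character runs, keeping simple contractions such as don't."""
    return re.findall(r"\w+(?:'\w+)*", text.lower())


def profile(text: str, top: int = 10) -> dict:
    tokens = tokenize(text)
    counts = Counter(tokens)
    total, unique = len(tokens), len(counts)
    # Type-to-token ratio: unique / total, with a guard for empty input
    ttr = unique / total if total else 0.0
    # Ranked frequency table; the third field marks stopwords so they can be highlighted
    table = [(tok, n, tok in STOPWORDS) for tok, n in counts.most_common(top)]
    return {"total": total, "unique": unique, "ttr": round(ttr, 3), "table": table}


result = profile("The cat sat on the mat. The cat is happy.")
print(result["total"], result["unique"], result["ttr"])  # 10 7 0.7
print(result["table"][:2])  # [('the', 3, True), ('cat', 2, False)]
```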

What the metrics mean

Type-to-token ratio (TTR): A TTR of 1.0 means every word is unique, as in a word list. Natural flowing prose usually sits between 0.4 and 0.7. Anything below 0.3 in a response of moderate length is a warning sign: the model is reusing the same words at an unusual rate, whether from a looping failure or a poorly constrained system prompt. Short snippets naturally have higher TTRs, so compare within the same length range for meaningful results.
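
One common way to make TTR comparable across lengths is mean segmental TTR (MSTTR): average the TTR of consecutive fixed-length segments. This is a standard lexical-diversity measure swapped in for illustration; the source does not say the tool computes it, and the default segment length of 100 tokens is an assumption. Note that MSTTR values are not interchangeable with the plain-TTR thresholds above.

```python
def ttr(tokens: list[str]) -> float:
    """Plain type-to-token ratio: unique tokens / total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_segmental_ttr(tokens: list[str], segment: int = 100) -> float:
    """Average TTR over consecutive, non-overlapping full segments of `segment` tokens.

    The trailing partial segment is dropped; texts shorter than one segment fall back to plain TTR.
    """
    chunks = [tokens[i:i + segment] for i in range(0, len(tokens) - segment + 1, segment)]
    if not chunks:
        return ttr(tokens)
    return sum(ttr(chunk) for chunk in chunks) / len(chunks)


looping = ("it is important to note that " * 20).split()  # 120 tokens, 6 types
# Plain TTR halves when the same looping text doubles in length...
print(ttr(looping[:60]), ttr(looping))  # 0.1 0.05
# ...while MSTTR with a fixed 60-token segment stays the same
print(mean_segmental_ttr(looping[:60], 60), mean_segmental_ttr(looping, 60))  # 0.1 0.1
```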

Repeated trigrams: A three-word phrase that appears more than twice in a few-hundred-word response almost always indicates degenerate generation, because the model is completing the same sentence skeleton each time it addresses a sub-point. For example, "it is important" appearing seven times in a 400-word answer is a sign to either rewrite the prompt or increase the temperature or penalty settings.
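
A sketch of that "more than twice" check, under the assumption of a simple regex tokenizer; `flag_repeated_trigrams` and the sample answer are hypothetical. The sample repeats one sentence skeleton seven times, mirroring the example above.

```python
import re
from collections import Counter


def flag_repeated_trigrams(text: str, max_repeats: int = 2) -> list[tuple[str, int]]:
    """Return trigrams that appear more than `max_repeats` times, most frequent first."""
    tokens = re.findall(r"\w+(?:'\w+)*", text.lower())
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return [(" ".join(g), n) for g, n in trigrams.most_common() if n > max_repeats]


answer = " ".join(f"It is important to check step {i}." for i in range(7))
print(flag_repeated_trigrams(answer))
# [('it is important', 7), ('is important to', 7), ('important to check', 7), ('to check step', 7)]
```

One repeated skeleton produces several overlapping flagged trigrams, so the number of flags overstates how many distinct phrases are looping; the top entry is usually enough to locate the problem.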

Non-stopword frequency: Once stopwords are filtered, the top non-stopword tokens reveal what the model actually focused on. In a summarization task, those tokens should match the key concepts of the source document. If unrelated words dominate, the model has drifted.
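
For summarization, one rough drift check is to compare the top non-stopword tokens of the summary with those of the source. The overlap score below is a hypothetical heuristic of our own, not a documented feature of the tool; the stopword list, `k`, and function names are assumptions.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "into", "is", "it", "that", "on", "for", "with"}


def top_content_words(text: str, k: int = 5) -> list[str]:
    """Most frequent tokens after stopword filtering (ties keep first-seen order)."""
    tokens = re.findall(r"\w+(?:'\w+)*", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


def concept_overlap(summary: str, source: str, k: int = 5) -> float:
    """Share of the summary's top-k content words that also rank in the source's top-k."""
    summary_top = top_content_words(summary, k)
    source_top = set(top_content_words(source, k))
    if not summary_top:
        return 0.0
    return sum(w in source_top for w in summary_top) / len(summary_top)


source = "Solar panels convert sunlight into electricity. Solar output depends on sunlight and panel angle."
summary = "Solar panels turn sunlight into electricity."
drifted = "Our newsletter offers discounts, and newsletter subscribers get discounts weekly."
print(concept_overlap(summary, source), concept_overlap(drifted, source))  # 0.8 0.0
```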

Practical use cases

Prompt debugging. Paste several responses to the same prompt and compare TTRs. A consistently low ratio suggests the prompt constrains variety, often because of overly specific format instructions or a very low temperature setting.
Quality screening at scale. Batch-check LLM outputs before delivering them to users. Flagging any response whose TTR falls below your threshold is a cheap heuristic for routing it to human review without reading every response.
A/B testing prompts. Compare the frequency distributions of two prompt variants on the same input. Higher TTR plus fewer repeated trigrams generally indicates the better-calibrated prompt.
Detecting padding in long-form content. Blog posts, technical articles, and marketing copy generated by AI can run long while repeating the same points. High-frequency non-stopwords that cluster in small semantic groups confirm this.
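
The screening and A/B use cases can be sketched with two signals: TTR and the number of distinct trigrams seen more than twice. Everything here is an assumption for illustration: the `metrics`/`screen`/`compare` names, the regex tokenizer, and the 0.3 threshold (borrowed from the TTR discussion; calibrate it on your own outputs).

```python
import re
from collections import Counter

# Assumption: 0.3 echoes the plain-TTR warning level; tune it for your response lengths.
TTR_THRESHOLD = 0.3


def metrics(text: str) -> tuple[float, int]:
    """Return (type-to-token ratio, number of distinct trigrams seen more than twice)."""
    tokens = re.findall(r"\w+(?:'\w+)*", text.lower())
    if not tokens:
        return 0.0, 0
    ttr = len(set(tokens)) / len(tokens)
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return ttr, sum(1 for n in trigrams.values() if n > 2)


def screen(outputs: list[str]) -> list[int]:
    """Indices of outputs to route to human review."""
    flagged = []
    for i, text in enumerate(outputs):
        ttr, looped = metrics(text)
        if ttr < TTR_THRESHOLD or looped > 0:
            flagged.append(i)
    return flagged


def compare(variant_a: list[str], variant_b: list[str]) -> dict[str, tuple[float, float]]:
    """Mean TTR and mean repeated-trigram count per prompt variant (each list must be non-empty)."""
    def summarize(outputs: list[str]) -> tuple[float, float]:
        scores = [metrics(t) for t in outputs]
        n = len(scores)
        return sum(s[0] for s in scores) / n, sum(s[1] for s in scores) / n
    return {"A": summarize(variant_a), "B": summarize(variant_b)}


outputs = ["Paris is the capital of France.", "yes yes yes yes yes yes yes yes"]
print(screen(outputs))  # [1]
print(compare(outputs[:1], outputs[1:]))  # {'A': (1.0, 0.0), 'B': (0.125, 1.0)}
```

Because short answers score high TTRs almost automatically, the TTR check mostly bites on longer responses; the repeated-trigram check catches loops at any length.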

Tips and notes

Watch the ratio. Healthy prose usually has a type-to-token ratio above about 0.4; a much lower ratio on long text suggests heavy repetition.
Repeated trigrams are a red flag. A phrase appearing three or more times in a short answer almost always means the model looped or padded.
Stopwords are expected to top the list. Focus on the highest non-stopword tokens to understand what the text is actually about.
TTR degrades with length. Very long documents will always have a lower TTR because words inevitably recur; use it comparatively across texts of similar length, not as an absolute benchmark across different document sizes.
Everything is local. The tool makes no network calls, so confidential text stays on your machine.