What is type-token ratio?

Type-token ratio (TTR) is the number of unique words (types) divided by the total number of words (tokens). A higher ratio means more varied vocabulary. Because TTR falls as text gets longer, compare it only across passages of similar length.
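As a rough illustration of the definition above, here is a minimal Python sketch. The tool itself runs in JavaScript; the regex tokenizer and the function names here are assumptions for illustration, not the tool's actual code.

```python
import re

def tokenize(text: str) -> list:
    # Assumed tokenizer: lowercase, then keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def type_token_ratio(text: str) -> float:
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    return len(set(tokens)) / len(tokens)

short = "The cat sat on the mat."
longer = short + " The cat sat on the mat again and the cat slept on the mat."

# The longer passage reuses the same words, so its ratio drops.
print(round(type_token_ratio(short), 2))   # 5 types / 6 tokens
print(round(type_token_ratio(longer), 2))  # 8 types / 20 tokens
```

The second passage is not worse writing in any deep sense; it is simply longer, which is why the ratio is only meaningful between passages of similar length.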

What counts as lexical density?

Lexical density here is the share of content words (everything that is not a common stopword) out of all words. Higher density usually signals more information-rich prose, while very low density can indicate filler or padding.
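A small Python sketch of that calculation follows. The stopword list is a tiny stand-in chosen for the example sentences; the tool's real list is not documented here, so treat `STOPWORDS` and `lexical_density` as hypothetical names.

```python
import re

# Illustrative stopword list only; the tool's actual list is likely much longer.
STOPWORDS = {"a", "an", "and", "be", "in", "into", "is", "it",
             "many", "may", "of", "that", "the", "to"}

def lexical_density(text: str, stopwords=STOPWORDS) -> float:
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    # Content words are everything that is not a stopword.
    content = [t for t in tokens if t not in stopwords]
    return len(content) / len(tokens)

dense = "Solar panels convert sunlight into electricity."
padded = ("It is worth noting that in many cases it may be possible "
          "to consider the options.")

print(f"{lexical_density(dense):.1%}")   # 5 of 6 words are content words
print(f"{lexical_density(padded):.1%}")  # 6 of 16 words are content words
```

With this list, the padded sentence lands below 40% while the dense one sits above 80%. A different stopword list would shift both numbers.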

Why filter stopwords?

Stopwords like "the", "of", and "and" dominate any raw frequency count and tell you little about the subject matter. Filtering them lets the genuinely meaningful, content-bearing words rise to the top of the table.

Does any text leave my browser?

No. All counting and metric calculations happen entirely client-side in JavaScript. Nothing is uploaded, stored, or logged, so you can safely paste private or unpublished content.

How do I spot repetitive LLM output?

Look for a low type-token ratio paired with a few words appearing far more often than the rest. Models that loop or pad tend to reuse the same connectors and phrases, which shows up clearly in the frequency table.

What is the Word Frequency & Diversity Analyzer?

The Word Frequency & Diversity Analyzer checks LLM output for word frequency, unique word count, type-token ratio, and lexical density. It surfaces repetitive or low-diversity text so you can quickly spot bland, padded, or template-locked model responses. It runs free in your browser on Gera Tools, with nothing uploaded.

Word Frequency & Diversity Analyzer

Name: Word Frequency & Diversity Analyzer
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Large language models sometimes produce text that reads fine but is quietly repetitive — the same connectors, the same hedges, the same favourite adjectives over and over. This analyzer gives you the numbers behind that gut feeling: word frequencies plus three diversity metrics computed entirely in your browser.

What this reveals about LLM output specifically

Word frequency analysis surfaces problems that are invisible to a quick read but obvious in the numbers. Common LLM patterns the metrics catch:

Template lock. A model producing multiple outputs from the same prompt often reuses a small set of structure words (“furthermore,” “notably,” “it is important to note”) across every response. A frequency table dominated by these phrases signals the model is drawing from a template rather than generating fresh prose.

Topic drift. If a content word you expected to dominate the frequency table is absent or low, the model may have drifted off the actual topic. For example, a brief on renewable energy that ranks “technology” and “future” highly but barely mentions “solar” or “wind” has drifted toward generalities.
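One way to make the topic-drift check concrete is to look up where the words you expected fall in the stopword-filtered ranking. The sketch below assumes a small illustrative stopword list and a hypothetical `keyword_ranks` helper; the tool does not necessarily expose anything like it.

```python
import re
from collections import Counter

# Illustrative stopword list, not the tool's.
STOPWORDS = {"a", "and", "is", "of", "one", "the", "will"}

def keyword_ranks(text: str, expected: list) -> dict:
    """Return the 1-based frequency rank of each expected word, or None if absent."""
    tokens = [t for t in re.findall(r"[a-z0-9']+", text.lower())
              if t not in STOPWORDS]
    counts = Counter(tokens)
    ranked = [word for word, _ in counts.most_common()]
    return {w: (ranked.index(w) + 1 if w in counts else None) for w in expected}

brief = ("The future of technology is bright. Technology shapes the future, "
         "and future technology will keep improving. Solar is one option.")

ranks = keyword_ranks(brief, ["solar", "wind"])
print(ranks)
```

Here "future" and "technology" take the top two places, "solar" appears once and ranks near the bottom, and "wind" is missing entirely, which is the pattern described above.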

Padding and hedging. A very low lexical density — most words are function words and hedges — is a diagnostic sign of verbose, content-light output. “It is worth noting that, in many cases, it may be possible to consider…” is high in function words and low in content words.

How it works

The tool tokenizes your text into words (lowercased, stripped of punctuation), then counts how often each one appears. With the stopword filter on, ultra-common function words are removed so the ranking reflects real subject matter. From the counts it derives three figures: unique word count, type-token ratio (unique words ÷ total words), and lexical density (content words ÷ total words). The top-N table shows the most frequent words with their counts and share of the text.
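The steps above can be sketched end to end in a few lines of Python. This is a reading of the description, not the tool's source: the tokenizer regex, the short stopword list, the `analyze` name, and the choice to compute each word's share against all tokens (rather than only content words) are assumptions.

```python
import re
from collections import Counter

# Illustrative stopword list; the tool's real list is not documented here.
STOPWORDS = frozenset({"a", "an", "and", "in", "is", "it", "of", "that", "the", "to"})

def analyze(text: str, top_n: int = 5, filter_stopwords: bool = True) -> dict:
    # Step 1: lowercase and split into word tokens, dropping punctuation.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {"total": 0, "unique": 0, "ttr": 0.0, "density": 0.0, "top": []}

    # Step 2: separate content words from stopwords.
    content = [t for t in tokens if t not in STOPWORDS]

    # Step 3: count frequencies for the table (filtered or unfiltered).
    counts = Counter(content if filter_stopwords else tokens)
    top = [(word, n, n / total) for word, n in counts.most_common(top_n)]

    # Step 4: derive the three figures from the counts.
    unique = len(set(tokens))
    return {
        "total": total,
        "unique": unique,
        "ttr": unique / total,
        "density": len(content) / total,
        "top": top,  # (word, count, share of all tokens)
    }

report = analyze("The model repeats the same words, and the same words repeat.", top_n=2)
for word, n, share in report["top"]:
    print(f"{word:<10} {n:>3} {share:.1%}")
print(f"TTR {report['ttr']:.2f}, density {report['density']:.2f}")
```

Passing `filter_stopwords=False` puts function words back into the table, which corresponds to switching the stopword filter off.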

Reading the metrics

A high type-token ratio means varied vocabulary; a low one means the text leans on a small set of words.
Lexical density below roughly 40% often signals padding, filler, or boilerplate phrasing.
A frequency table where one or two words dwarf the rest is a classic sign of a model stuck in a loop or over-using a template phrase.
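These rules of thumb can be turned into simple automated flags. The sketch below is one possible encoding, not how the tool behaves: the 40% floor comes from the guidance above, while the dominance test (top word used at least three times as often as the median content word) and the `flags` helper are assumptions picked for illustration.

```python
import re
from collections import Counter
from statistics import median

# Illustrative stopword list, not the tool's.
STOPWORDS = {"a", "an", "and", "be", "in", "is", "it",
             "many", "may", "of", "that", "the", "to"}

def flags(text: str, density_floor: float = 0.40,
          dominance_factor: float = 3.0) -> list:
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return []
    content = [t for t in tokens if t not in STOPWORDS]
    notes = []

    # Rule 1: lexical density under the floor suggests padding or filler.
    if len(content) / len(tokens) < density_floor:
        notes.append("low lexical density")

    # Rule 2 (hypothetical threshold): one word dwarfs the typical word.
    counts = Counter(content)
    if counts:
        top_word, top_count = counts.most_common(1)[0]
        if top_count > 2 and top_count >= dominance_factor * median(counts.values()):
            notes.append(f"'{top_word}' dominates ({top_count} uses)")
    return notes

looping = "Great product. Great price. Great service. Great value. Great team."
padded = ("It is worth noting that in many cases it may be possible "
          "to consider the options.")

print(flags(looping))
print(flags(padded))
```

Both thresholds are worth tuning against text you already know to be good or bad before relying on them.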

Tips

TTR shrinks naturally as text grows, so only compare passages of similar length.
Run two model outputs for the same prompt through the tool to see which one writes with more range.
If a content word you expected to dominate is missing from the top of the list, the model may have drifted off topic.
Turn the stopword filter off to see exactly which function words are inflating the count. Filler words such as “actually” and “essentially” appearing repeatedly are common LLM tells.
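Because TTR depends on length, a fair side-by-side comparison needs both passages cut to the same number of tokens first. The Python sketch below does that by truncating to the shorter passage. Truncation is an assumption made for this example and is not something the tool is documented to do; `compare_ttr` is a hypothetical helper.

```python
import re

def tokenize(text: str) -> list:
    # Assumed tokenizer: lowercase, keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def compare_ttr(output_a: str, output_b: str) -> tuple:
    """Type-token ratio of each output, measured over an equal token count."""
    a, b = tokenize(output_a), tokenize(output_b)
    n = min(len(a), len(b))
    if n == 0:
        return (0.0, 0.0)
    # Trim both to the shorter length so neither is penalized for being longer.
    a, b = a[:n], b[:n]
    return (len(set(a)) / n, len(set(b)) / n)

ttr_a, ttr_b = compare_ttr(
    "fast fast fast fast",
    "quick brown fox jumps over the lazy dog",
)
print(ttr_a, ttr_b)
```

The output with the higher figure is using more distinct words over the same stretch of text.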