Word Frequency & Diversity Analyzer

Measure vocabulary diversity and word frequency in LLM-generated text.


Large language models sometimes produce text that reads fine but is quietly repetitive: the same connectors, the same hedges, the same favourite adjectives, over and over. This analyzer gives you the numbers behind that gut feeling, namely word frequencies plus three diversity metrics, all computed entirely in your browser.

How it works

The tool tokenizes your text into words (lowercased, stripped of punctuation), then counts how often each one appears. With the stopword filter on, ultra-common function words are removed so the ranking reflects real subject matter. From the counts it derives three figures: unique word count, type-token ratio (unique words ÷ total words), and lexical density (content words ÷ total words). The top-N table shows the most frequent words with their counts and share of the text.
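The pipeline described above can be approximated in a few lines of Python. This is a minimal sketch, not the tool's actual code: the `STOPWORDS` set is a short hypothetical stand-in for the real filter list, "content words" are assumed to mean any word not in that list, and the names `tokenize` and `analyze` are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, heavily abbreviated stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "in", "is", "it", "of", "on", "or", "that", "the", "to"}


def tokenize(text):
    """Lowercase the text and return its words, dropping punctuation."""
    # Keeps internal apostrophes ("don't") but not leading or trailing ones.
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


def analyze(text, top_n=5, filter_stopwords=True):
    """Return word counts and the three diversity figures for a text."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return {"total": 0, "unique": 0, "ttr": 0.0, "lexical_density": 0.0, "top": []}

    # Assumption: a content word is any token that is not a stopword.
    content = [t for t in tokens if t not in STOPWORDS]

    # The stopword filter only changes which words are ranked in the table.
    counts = Counter(content if filter_stopwords else tokens)
    top = [(word, count, count / total) for word, count in counts.most_common(top_n)]

    unique = len(set(tokens))
    return {
        "total": total,
        "unique": unique,                          # unique word count
        "ttr": unique / total,                     # unique words / total words
        "lexical_density": len(content) / total,   # content words / total words
        "top": top,                                # (word, count, share of text)
    }


report = analyze("The cat sat on the mat. The cat slept.")
print(report["unique"], round(report["ttr"], 3), round(report["lexical_density"], 3))
print(report["top"][0])
```

For the sample sentence this gives 9 words in total, 6 of them unique, so a type-token ratio of about 0.67 and a lexical density of about 0.56, with "cat" at the top of the table.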

Reading the metrics

  • A high type-token ratio means varied vocabulary; a low one means the text leans on a small set of words.
  • Lexical density below roughly 40% often signals padding, filler, or boilerplate phrasing.
  • A frequency table where one or two words dwarf the rest is a classic sign of a model stuck in a loop or over-using a template phrase.
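If you want to apply these readings automatically, something like the sketch below could work. It is only an illustration: the function names are hypothetical, "dwarfs the rest" is approximated as the top count divided by the median count, and the `dominance_limit` of 5.0 is an arbitrary assumption rather than a value the tool uses. The 40% density floor comes from the rule of thumb above.

```python
from statistics import median


def top_to_median_ratio(counts):
    """How many times larger the top word's count is than the median count."""
    values = sorted(counts.values(), reverse=True)
    if not values:
        return 0.0
    return values[0] / median(values)


def read_metrics(lexical_density, counts, density_floor=0.40, dominance_limit=5.0):
    """Return plain-language notes for metrics that look suspicious."""
    notes = []
    # Rule of thumb from the list above: density under roughly 40%.
    if lexical_density < density_floor:
        notes.append("low lexical density: possible padding or filler")
    # Assumed heuristic: the top word far outnumbers a typical word.
    if top_to_median_ratio(counts) > dominance_limit:
        notes.append("one word dominates: possible loop or template phrase")
    return notes


sample_counts = {"delve": 12, "model": 3, "text": 2, "data": 2, "tool": 1}
print(top_to_median_ratio(sample_counts))
print(read_metrics(0.35, sample_counts))
```

Treat the notes as prompts to look at the text again, not as verdicts; both thresholds are worth tuning against passages you already know to be good or bad.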

Tips

  • Type-token ratio (TTR) shrinks naturally as text grows, because longer passages reuse more of the words they have already introduced, so only compare passages of similar length.
  • Run two model outputs for the same prompt through the tool to see which one writes with more range.
  • If a content word you expected to dominate is missing from the top of the list, the model may have drifted off topic.
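The first two tips can be combined: when two outputs differ in length, one simple way to keep the comparison fair is to cut both to the length of the shorter one before computing the type-token ratio. The sketch below assumes that truncation approach and uses hypothetical helper names; it is not how the tool itself handles length.

```python
import re


def tokenize(text):
    """Lowercase the text and return its words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


def ttr(tokens):
    """Type-token ratio: unique words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def compare_ttr(text_a, text_b):
    """Compare two texts on their first n words, n being the shorter length."""
    tokens_a, tokens_b = tokenize(text_a), tokenize(text_b)
    n = min(len(tokens_a), len(tokens_b))
    return ttr(tokens_a[:n]), ttr(tokens_b[:n]), n


output_a = "The quick brown fox jumps over the lazy dog"
output_b = "The cat and the cat and the cat"
ttr_a, ttr_b, n = compare_ttr(output_a, output_b)
print(n, ttr_a, ttr_b)
```

Here both outputs are compared on their first 8 words, giving 0.875 for the first and 0.375 for the second. Truncation discards the tail of the longer text, so for long outputs it may be worth averaging over several equal-sized windows instead.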