Question 1

What is perplexity in language models?

Accepted Answer

Perplexity measures how well a language model predicts a sample of text. It is the exponential of the model's average cross-entropy loss per token, usually measured on held-out data. Intuitively, it is the effective number of equally likely choices the model is weighing at each token: a perplexity of 10 means the model is, on average, as uncertain as if it were picking uniformly among 10 tokens. Lower perplexity means the model assigned higher probability to the tokens that actually occurred.
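Written out, the definition above looks roughly like the following. The notation is an assumption introduced here for illustration: N is the number of tokens in the sample, x_i is the i-th token, and p(x_i | x_<i) is the probability the model assigns to that token given the tokens before it.

```latex
\mathrm{PPL}(x_1, \dots, x_N)
  = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(x_i \mid x_{<i}) \right)
```

If the model spreads its probability uniformly over k tokens at every step, each term is log(1/k) and the expression reduces to k, which is where the "equally likely choices" reading comes from.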

Question 2

Is a lower or higher perplexity better?

Accepted Answer

Lower is better. A lower perplexity means the model assigned higher probability to the text that actually occurred, so it is predicting the language more accurately. A perplexity of 1 would mean perfect prediction, with probability 1 on every token that occurred; real models on real text sit well above that. Scores are only meaningful relative to each other, so compare models on the same test set.
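A small sketch of that comparison, using made-up numbers: the probability lists below are hypothetical values that two imaginary models assigned to the same four test tokens, not measurements from any real model.

```python
import math


def perplexity(token_probs):
    """Perplexity from the probabilities a model gave to the tokens that occurred."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# Hypothetical probabilities assigned to the same four test tokens.
model_a = [0.5, 0.25, 0.5, 0.25]  # puts more mass on the true tokens
model_b = [0.1, 0.2, 0.1, 0.2]    # puts less mass on the true tokens
perfect = [1.0, 1.0, 1.0, 1.0]    # probability 1 on every true token

ppl_a = perplexity(model_a)
ppl_b = perplexity(model_b)
ppl_perfect = perplexity(perfect)

print(f"model A: {ppl_a:.2f}, model B: {ppl_b:.2f}, perfect: {ppl_perfect:.2f}")
```

Model A scores lower than model B because it gave the observed tokens more probability, and the perfect predictor lands at the floor of 1.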

Question 3

How is perplexity calculated?

Accepted Answer

You compute the model's cross-entropy loss (the average negative log-probability it assigns to the true next token) across a held-out test set, then exponentiate that value using the same base as the logarithm. Because the result is a per-token average, it depends on the tokeniser as well as on the specific test data, so perplexity numbers are only comparable when both are held constant.
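The two steps can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: the function name `perplexity_from_log_probs` is hypothetical, and it assumes you already have the natural-log probability the model assigned to each true token.

```python
import math
from typing import Sequence


def perplexity_from_log_probs(log_probs: Sequence[float]) -> float:
    """Exponentiate the average negative log-probability per token.

    Assumes `log_probs` holds natural logarithms, so math.exp is the
    matching inverse. With base-2 logs you would use 2 ** loss instead.
    """
    if not log_probs:
        raise ValueError("need at least one token to compute perplexity")
    # Step 1: cross-entropy = mean negative log-probability of the true tokens.
    cross_entropy = -sum(log_probs) / len(log_probs)
    # Step 2: exponentiate to get perplexity.
    return math.exp(cross_entropy)


# Sanity check: a model that gives every true token probability 1/4
# should come out at a perplexity of about 4.
uniform_log_probs = [math.log(0.25)] * 8
uniform_ppl = perplexity_from_log_probs(uniform_log_probs)
print(uniform_ppl)
```

The same text split by a different tokeniser yields a different number of tokens, which changes the divisor in step 1. That is one concrete reason scores from different tokenisers do not line up.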

Question 4

Does low perplexity mean a model is good?

Accepted Answer

Not entirely. Perplexity measures raw predictive fit to text, not helpfulness, factual accuracy, safety, or reasoning. A model can have low perplexity yet still hallucinate or be unhelpful, which is why perplexity is used alongside task benchmarks and human evaluation rather than on its own.

Perplexity (AI Glossary)

What perplexity is

How it is computed

Why it is useful

Why low perplexity is not the whole story

Where it sits among metrics