Question 1

What is cross-entropy loss?

Accepted Answer

Cross-entropy loss measures how far a model's predicted probability distribution is from the true distribution. For a single labelled example it reduces to the negative log of the probability the model assigned to the correct class. It is therefore small when the model assigns high probability to the correct answer and large when it confidently predicts the wrong one, making it a natural training objective for classification.
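
As a rough illustration of the behaviour described above, here is a minimal Python sketch for a single example with one correct class. The function name `cross_entropy` and the probability values are made up for the example and do not come from any particular library.

```python
import math

def cross_entropy(probs, true_index):
    """Cross-entropy (in nats) for one example whose true label is a single class.

    Illustrative helper, not a library function. `probs` is assumed to be a
    valid probability distribution (non-negative, sums to 1).
    """
    return -math.log(probs[true_index])

# Hypothetical predictions over three classes; the correct class is index 0.
confident_right = cross_entropy([0.90, 0.05, 0.05], true_index=0)
confident_wrong = cross_entropy([0.05, 0.90, 0.05], true_index=0)

print(f"confident and right: {confident_right:.4f}")
print(f"confident and wrong: {confident_wrong:.4f}")
```

The second loss is much larger than the first, which is the "confidently wrong is punished hard" behaviour the answer describes.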

Question 2

Why is cross-entropy used to train language models?

Accepted Answer

Language modelling is framed as predicting the next token from a vocabulary, which is a classification problem over a very large number of classes (often tens of thousands of tokens). At each position, cross-entropy is the negative log of the probability the model assigned to the actual next token, and its gradient nudges the model toward putting more probability on that token.
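
To make the next-token framing concrete, here is a small sketch with a toy four-word vocabulary. The vocabulary, the probabilities, and the helper `next_token_loss` are all invented for illustration; a real model would produce these distributions itself and would use a far larger vocabulary.

```python
import math

# Toy vocabulary (an assumption for this example only).
vocab = ["the", "cat", "sat", "mat"]

def next_token_loss(predicted, target_tokens):
    """Mean cross-entropy (in nats) over the positions of one sequence.

    `predicted[i]` is the model's distribution over `vocab` at position i,
    and `target_tokens[i]` is the token that actually came next.
    """
    total = 0.0
    for probs, target in zip(predicted, target_tokens):
        total += -math.log(probs[vocab.index(target)])
    return total / len(target_tokens)

# Hypothetical model outputs for the context "the ..." and "the cat ...".
predicted = [
    [0.1, 0.7, 0.1, 0.1],  # after "the": most mass on "cat"
    [0.1, 0.1, 0.6, 0.2],  # after "the cat": most mass on "sat"
]
loss = next_token_loss(predicted, ["cat", "sat"])
print(f"mean next-token loss: {loss:.4f}")
```

Averaging over positions is one common convention; summing is also used, so treat the choice here as an assumption.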

Question 3

How does cross-entropy relate to softmax?

Accepted Answer

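Softmax converts a model's raw output scores (logits) into a probability distribution, and cross-entropy then measures how good that distribution is against the true label. The two are almost always paired because the gradient of the combined loss with respect to the logits is simply the predicted probabilities minus the one-hot true label, which is cheap to compute and numerically stable.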
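
The pairing can be sketched in plain Python as follows. This is one way to write it under the usual log-sum-exp trick, not the implementation of any specific framework; all function names are illustrative.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, true_index):
    """Cross-entropy computed directly from logits via log-sum-exp.

    Equivalent to -log(softmax(logits)[true_index]) but never takes the log
    of a probability that has underflowed to zero.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_index]

def grad_wrt_logits(logits, true_index):
    """Gradient of the combined loss: predicted probabilities minus one-hot label."""
    probs = softmax(logits)
    return [p - (1.0 if i == true_index else 0.0) for i, p in enumerate(probs)]

# Hypothetical logits for three classes; the correct class is index 0.
logits = [2.0, 1.0, 0.1]
print(softmax_cross_entropy(logits, 0))
print(grad_wrt_logits(logits, 0))
```

The gradient entries sum to zero: the correct class gets a negative entry (its logit is pushed up), and every other class gets a positive entry equal to its predicted probability.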

Question 4

What does a low cross-entropy value mean?

Accepted Answer

A low cross-entropy means the model assigned high probability to the correct outcomes, so its predictions closely match reality. During training the goal is to minimise this value. A related metric, perplexity, is the exponential of the average per-token cross-entropy (when the loss is measured in nats), so lower cross-entropy always means lower perplexity.
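
The relationship to perplexity can be checked with a short sketch. It assumes the cross-entropy is an average measured in nats (natural log); the function name `perplexity` is illustrative.

```python
import math

def perplexity(mean_cross_entropy_nats):
    """Perplexity from an average cross-entropy measured in nats."""
    return math.exp(mean_cross_entropy_nats)

# A model that spreads probability evenly over 4 choices assigns 1/4 to the
# correct one, so its cross-entropy is ln(4) and its perplexity is 4.
uniform_loss = -math.log(1 / 4)
ppl = perplexity(uniform_loss)
print(f"loss = {uniform_loss:.4f} nats, perplexity = {ppl:.1f}")
```

One way to read the result: a perplexity of 4 means the model is, on average, as uncertain as if it were choosing uniformly among 4 options.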

Cross-Entropy Loss (AI Glossary)

Definition

The intuition

Pairing with softmax

Cross-entropy in language models

Why it matters