Question 1

What is a loss function?

Accepted Answer

A loss function maps a model's predictions and the correct answers to a single scalar that measures how far apart they are. Training minimises this number: lower loss means a better fit to the training data. It turns the abstract goal of 'be accurate' into a concrete quantity that gradient descent can optimise.
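
As a rough illustration of a loss acting as the quantity gradient descent optimises, here is a minimal sketch in plain Python. The one-parameter model y = w * x, the toy data, the learning rate and the names `loss`, `grad` and `history` are all assumptions chosen for the example, not part of the answer above.

```python
def loss(w, xs, ys):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad(w, xs, ys):
    """Derivative of the loss above with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Toy data (assumed for illustration), generated by y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0                # initial guess
learning_rate = 0.05   # assumed step size, small enough to converge here
history = []           # loss recorded before each update

for step in range(100):
    history.append(loss(w, xs, ys))
    # Move w a small step against the gradient to reduce the loss.
    w -= learning_rate * grad(w, xs, ys)

print(f"w = {w:.4f}, first loss = {history[0]:.4f}, last loss = {history[-1]:.3e}")
```

On this toy data the loss shrinks at every step and w moves toward 2, the slope used to generate the data.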

Question 2

What loss function do language models use?

Accepted Answer

Language models are trained with cross-entropy loss (also called log loss) on next-token prediction. For each position, the loss is the negative log of the probability the model assigned to the token that actually appeared. The penalty is not simply proportional to that probability: it grows without bound as the probability approaches zero, so confident wrong predictions are punished heavily.
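
To make the penalty concrete, the following sketch computes the per-token cross-entropy for three hypothetical predictions over a three-word vocabulary. The vocabulary, the probability values and the function name `cross_entropy` are assumptions made up for the example; a real language model would produce these probabilities from a softmax over a much larger vocabulary.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log of the probability assigned to the true next token."""
    return -math.log(probs[target_index])


# Hypothetical three-token vocabulary; the true next token is "mat".
vocab = ["cat", "dog", "mat"]
target = vocab.index("mat")

# Each list is an assumed predicted distribution over vocab (sums to 1).
confident_right = [0.05, 0.05, 0.90]
unsure = [0.30, 0.30, 0.40]
confident_wrong = [0.90, 0.09, 0.01]

for name, probs in [
    ("confident and right", confident_right),
    ("unsure", unsure),
    ("confident and wrong", confident_wrong),
]:
    print(f"{name}: loss = {cross_entropy(probs, target):.3f}")
```

The loss is small when the true token gets high probability and rises sharply as that probability falls toward zero.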

Question 3

What is the difference between loss and a metric like accuracy?

Accepted Answer

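Loss is the quantity the optimiser actually minimises; it must be differentiable (at least almost everywhere) with respect to the model's parameters so that useful gradients exist. A metric like accuracy is what humans care about, but it is piecewise constant in the parameters: its gradient is zero almost everywhere and undefined at the points where a prediction flips. It is therefore reported for monitoring rather than used directly for training.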
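
One way to see the difference is to compare two sets of predictions that score the same accuracy but different losses. The sketch below does this for a binary classifier; the probability values, the 0.5 decision threshold and the names `accuracy` and `mean_log_loss` are assumptions for illustration only.

```python
import math


def accuracy(p_correct):
    """Fraction of examples where the correct class gets probability > 0.5."""
    return sum(p > 0.5 for p in p_correct) / len(p_correct)


def mean_log_loss(p_correct):
    """Average negative log-probability assigned to the correct class."""
    return -sum(math.log(p) for p in p_correct) / len(p_correct)


# Probability each hypothetical model assigns to the CORRECT class, per example.
model_a = [0.51, 0.51, 0.49]   # barely right on two, barely wrong on one
model_b = [0.99, 0.99, 0.49]   # confidently right on two, barely wrong on one

print("accuracy:", accuracy(model_a), accuracy(model_b))
print("log loss:", mean_log_loss(model_a), mean_log_loss(model_b))
```

Both models classify the same two of three examples correctly, so accuracy cannot tell them apart, while the loss registers that the second model is far more confident on the examples it gets right. That sensitivity to small changes is what gives the optimiser a gradient to follow.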

Question 4

Why is mean squared error used for regression?

Accepted Answer

Mean squared error (MSE) averages the squared differences between predicted and true values, so larger errors dominate and the gradient grows linearly with the error size. It is smooth, convex for linear models, and minimising it corresponds to maximum-likelihood estimation under Gaussian noise with constant variance.
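
The sketch below shows how squaring makes large errors dominate and how the gradient scales with the error. The prediction and target values and the function names `mse` and `mse_grad` are assumptions for the example.

```python
def mse(preds, targets):
    """Mean of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def mse_grad(preds, targets):
    """Gradient of the MSE with respect to each prediction: 2 * error / n."""
    n = len(preds)
    return [2 * (p - t) / n for p, t in zip(preds, targets)]


# Assumed toy values: errors of 1, 2 and 10 against a target of 0.
targets = [0.0, 0.0, 0.0]
preds = [1.0, 2.0, 10.0]

loss_value = mse(preds, targets)
grads = mse_grad(preds, targets)

# Squared errors are 1, 4 and 100, so the largest error supplies most of the loss.
share_of_largest = (preds[2] - targets[2]) ** 2 / (loss_value * len(preds))

print("MSE:", loss_value)
print("gradients:", grads)
print("share of loss from the largest error:", share_of_largest)
```

Here the error of 10 accounts for 100 of the 105 units of total squared error, and its gradient is ten times that of the error of 1, which is the linear growth described above.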

Loss Function (AI Glossary)

Definition

Cross-entropy loss for language models

Mean squared error for regression

How the loss drives gradient descent

Choosing the right loss