Question 1

What is the learning rate?

Accepted Answer

The learning rate is the hyperparameter that controls how big a step the optimiser takes when updating model weights. In gradient descent the step goes in the direction opposite to the gradient of the loss, because the gradient itself points towards increasing loss. The learning rate scales each update: a larger learning rate moves the weights further per step, a smaller one moves them less.
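As a minimal sketch of that scaling, the snippet below applies one plain gradient-descent update with two different learning rates. The function name `sgd_step` and the example weights and gradients are illustrative assumptions, not part of any particular library.

```python
def sgd_step(weights, grads, learning_rate):
    """Return new weights after one plain gradient-descent update.

    Each weight moves against its gradient, scaled by the learning rate.
    """
    return [w - learning_rate * g for w, g in zip(weights, grads)]


# Hypothetical weights and gradients, chosen only for illustration.
weights = [1.0, -2.0]
grads = [0.5, -1.0]

small = sgd_step(weights, grads, learning_rate=0.01)
large = sgd_step(weights, grads, learning_rate=0.1)

print("small step:", small)
print("large step:", large)
```

With the same gradients, the larger learning rate moves each weight ten times further from its starting value.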

Question 2

What happens if the learning rate is too high?

Accepted Answer

Too high a learning rate makes updates overshoot the minimum of the loss. Training can oscillate, the loss can spike or diverge to infinity (often showing as NaN), and the model may never settle. It is one of the most common causes of failed training runs.
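The overshoot can be seen on a toy problem. The sketch below runs gradient descent on the one-dimensional loss f(w) = w^2, whose minimum is at w = 0. The loss, the helper name `run_gd`, and the two learning rates are assumptions chosen for illustration; real training losses are far less regular.

```python
def run_gd(learning_rate, steps=20, w0=1.0):
    """Run gradient descent on f(w) = w**2 and return every visited w."""
    w = w0
    history = [w]
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        w = w - learning_rate * grad
        history.append(w)
    return history


stable = run_gd(learning_rate=0.1)     # each step shrinks w by a factor of 0.8
diverging = run_gd(learning_rate=1.1)  # each step multiplies w by -1.2

print("lr=0.1 final |w|:", abs(stable[-1]))
print("lr=1.1 final |w|:", abs(diverging[-1]))
```

With the larger rate every update jumps past the minimum to the other side and lands further away than it started, so the iterates flip sign and grow without bound. In floating-point training the same runaway growth eventually overflows, which is where NaN values tend to come from.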

Question 3

What happens if the learning rate is too low?

Accepted Answer

Too low a learning rate makes training extremely slow, because each update barely moves the weights and many more steps are needed to make progress. The model can also get stuck in shallow, poor minima or plateau before reaching good performance. Either way it wastes compute and time.
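The cost in steps can be sketched on the same kind of toy loss, f(w) = w^2. The helper `steps_to_converge`, the tolerance, and the two learning rates are illustrative assumptions. The example only shows the slowdown; it does not model getting stuck in poor minima, which needs a non-convex loss.

```python
def steps_to_converge(learning_rate, tol=1e-3, w0=1.0, max_steps=100_000):
    """Count gradient-descent steps on f(w) = w**2 until |w| < tol.

    Returns None if the tolerance is not reached within max_steps.
    """
    w = w0
    for step in range(max_steps):
        if abs(w) < tol:
            return step
        w -= learning_rate * 2.0 * w  # gradient of w**2 is 2w
    return None


fast = steps_to_converge(learning_rate=0.1)
slow = steps_to_converge(learning_rate=0.001)

print("steps with lr=0.1:  ", fast)
print("steps with lr=0.001:", slow)
```

On this loss, cutting the learning rate by a factor of 100 raises the step count by roughly the same factor.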

Question 4

What is learning rate warmup and decay?

Accepted Answer

Warmup ramps the learning rate up from near zero over the first training steps, so that early, unreliable gradients do not destabilise the model. Decay then gradually reduces it, often following a cosine curve, so the model takes small, precise steps as it converges. The two are usually combined into a single schedule.
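One common way to combine the two is linear warmup followed by cosine decay. The sketch below is a framework-independent illustration. The function name `lr_at_step` and the default values (`base_lr`, `warmup_steps`, `total_steps`, `min_lr`) are assumptions, not recommended settings.

```python
import math


def lr_at_step(step, base_lr=1e-3, warmup_steps=100, total_steps=1000, min_lr=0.0):
    """Learning rate for a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # The cosine factor falls smoothly from 1 to 0 over the decay phase.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


for s in (0, 50, 99, 100, 550, 1000):
    print(s, lr_at_step(s))
```

In a training loop the returned value would be set on the optimiser before each update. Most deep learning frameworks ship their own scheduler utilities that serve the same purpose.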

Learning Rate (AI Glossary)
