Question 1

What is batch size in machine learning?

Accepted Answer

Batch size is the number of training examples a model processes before it updates its weights once. In mini-batch training, the dataset is split into batches of this size, and the model computes one averaged gradient and performs one weight update per batch.
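A minimal sketch of one epoch of mini-batch training, assuming a toy one-parameter linear model (y ≈ w * x) with squared-error loss. The helpers `make_batches` and `train_one_epoch` are hypothetical names for illustration, not any library's API. The point is the count: one averaged gradient and one update per batch.

```python
import random


def make_batches(data, batch_size, shuffle=True, seed=0):
    """Split data into consecutive mini-batches of at most batch_size examples."""
    items = list(data)
    if shuffle:
        random.Random(seed).shuffle(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def train_one_epoch(w, data, batch_size, lr=0.01):
    """One epoch of mini-batch gradient descent for y ≈ w * x (toy model).

    Returns the updated weight and the number of weight updates performed.
    """
    updates = 0
    for batch in make_batches(data, batch_size):
        # Average the per-example gradient of (w*x - y)^2 over the batch...
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        # ...then apply exactly one weight update for this batch.
        w -= lr * grad
        updates += 1
    return w, updates


data = [(x, 3.0 * x) for x in range(1, 11)]  # 10 toy examples, true w = 3
w, n_updates = train_one_epoch(0.0, data, batch_size=4)
print(n_updates)  # 3 (batches of 4, 4 and 2)
```

With 10 examples and a batch size of 4, an epoch yields three updates; the last batch is simply smaller.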

Question 2

What is the trade-off between large and small batches?

Accepted Answer

Large batches give smoother, more stable gradient estimates and use parallel hardware efficiently, but they need more memory and can generalise slightly worse. Small batches add gradient noise that can help generalisation and need less memory, but training is less stable and each epoch takes longer because the hardware is used less efficiently.
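The "smoother gradient" side of the trade-off can be illustrated numerically. This sketch assumes per-example gradients are draws from a made-up Gaussian (mean 1.0, standard deviation 2.0) rather than a real model, and measures how much the batch-averaged gradient varies from batch to batch. The function name `batch_gradient_spread` is hypothetical. It says nothing about generalisation; it only shows that the noise shrinks roughly as σ/√B.

```python
import random
import statistics


def batch_gradient_spread(batch_size, n_trials=2000, seed=0):
    """Standard deviation of the mini-batch mean gradient over many random batches.

    Per-example gradients are modelled as Gaussian(mean=1.0, std=2.0):
    a toy assumption, not data from a real network.
    """
    rng = random.Random(seed)
    batch_means = []
    for _ in range(n_trials):
        grads = [rng.gauss(1.0, 2.0) for _ in range(batch_size)]
        batch_means.append(sum(grads) / batch_size)
    return statistics.stdev(batch_means)


for b in (1, 8, 64):
    # Expected to be close to 2.0 / sqrt(b): about 2.0, 0.71 and 0.25.
    print(b, round(batch_gradient_spread(b), 3))
```

Going from a batch of 1 to a batch of 64 cuts the spread by roughly a factor of 8, which is why large-batch updates look smoother and small-batch updates look noisier.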

Question 3

What is gradient accumulation?

Accepted Answer

Gradient accumulation lets you simulate a large batch on limited memory: you compute gradients over several small batches, sum them (usually scaled so the total equals the average over the large batch), and only then perform one weight update. It is widely used to train large models on GPUs that cannot hold a big batch at once.
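A minimal sketch of why accumulation is equivalent to a large batch, using the same kind of toy squared-error model (an assumption for illustration). Each micro-batch gradient is weighted by its share of the examples before summing, so the accumulated value matches the full-batch mean gradient exactly, up to floating-point rounding. `mean_grad` and `accumulated_grad` are hypothetical helpers. In PyTorch the same idea is usually implemented by calling `loss.backward()` on each micro-batch (gradients add up in `.grad`) with the loss divided by the number of steps, then calling the optimizer step once.

```python
def mean_grad(w, batch):
    """Average gradient of the squared error (w*x - y)^2 over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, micro_batch_size):
    """Sum micro-batch gradients, each weighted by its share of the examples.

    Only one micro-batch is processed at a time; the weighting makes the
    result equal the full-batch mean even if the last micro-batch is smaller.
    """
    total = 0.0
    for i in range(0, len(batch), micro_batch_size):
        micro = batch[i:i + micro_batch_size]
        total += mean_grad(w, micro) * len(micro) / len(batch)
    return total


batch = [(x, 2.0 * x + 1.0) for x in range(8)]  # toy "large" batch of 8
w = 0.5
full = mean_grad(w, batch)
accum = accumulated_grad(w, batch, micro_batch_size=2)  # 4 micro-batches
print(abs(full - accum) < 1e-9)  # True

lr = 0.01
w -= lr * accum  # a single weight update after all micro-batches
```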

Question 4

How does batch size relate to learning rate?

Accepted Answer

Batch size and learning rate are linked: when you increase the batch size, you typically scale the learning rate up too, because larger batches give less noisy gradient estimates, which can tolerate larger steps. A common heuristic is to scale the learning rate roughly in proportion to the batch size.
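The proportional heuristic is simple arithmetic. This sketch assumes you already have a learning rate tuned at some reference batch size; the function name `scale_learning_rate` and the example values (0.1 at batch size 256) are hypothetical.

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linearly rescale a learning rate tuned at base_batch_size.

    Implements the rule of thumb lr_new = lr_base * (B_new / B_base).
    It is a heuristic starting point, not a guarantee of good training.
    """
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * (new_batch_size / base_batch_size)


print(scale_learning_rate(0.1, 256, 1024))  # 0.4 (4x the batch, 4x the lr)
print(scale_learning_rate(0.1, 256, 64))    # 0.025 (1/4 the batch, 1/4 the lr)
```

Because it is only a heuristic, the scaled value is best treated as the first point of a learning-rate search rather than a final setting.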

Batch Size (AI Glossary)

Definition

The three regimes

The large-vs-small trade-off

Gradient accumulation

Relationship with learning rate

Why it matters