Question 1

What is the difference between FP16 and BF16?

Accepted Answer

Both are 16-bit floating-point formats with one sign bit, but they split the remaining 15 bits differently. FP16 uses 5 exponent bits and 10 mantissa bits, giving more precision (roughly 3 significant decimal digits) but a narrow numeric range (largest finite value 65,504). BF16 uses 8 exponent bits and 7 mantissa bits, matching FP32's range (up to about 3.4 × 10^38) with less precision (roughly 2 significant decimal digits), which makes it more robust against overflow during training.
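To make the bit split concrete, here is a minimal sketch that pulls the sign, exponent, and mantissa fields out of each encoding. It uses only Python's standard `struct` module. The helper names `fp16_fields` and `bf16_fields` are made up for this illustration, and BF16 is emulated by truncating the FP32 encoding (real hardware usually rounds to nearest instead).

```python
import struct

def fp16_fields(x: float):
    """Return (sign, exponent, mantissa) bit fields of x encoded as IEEE FP16."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    # Layout: 1 sign bit | 5 exponent bits | 10 mantissa bits
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF

def bf16_fields(x: float):
    """Return (sign, exponent, mantissa) of x as BF16 (top 16 bits of FP32)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    bits = bits32 >> 16  # assumption: plain truncation, not round-to-nearest
    # Layout: 1 sign bit | 8 exponent bits | 7 mantissa bits
    return bits >> 15, (bits >> 7) & 0xFF, bits & 0x7F

print(fp16_fields(-1.5))  # (1, 15, 512): exponent bias is 15
print(bf16_fields(-1.5))  # (1, 127, 64): exponent bias is 127, same as FP32
```

The exponent field of BF16 is identical to FP32's, which is why converting between the two only touches the mantissa.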

Question 2

Why does BF16 work better for LLM training?

Accepted Answer

BF16 has the same 8-bit exponent width as FP32, so it covers essentially the same range of magnitudes and rarely overflows or underflows. Gradients and activations in large language models can span many orders of magnitude, so BF16's range matters more than FP16's extra precision, and it usually trains stably without loss scaling.
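The range difference is easy to see on two sample values, one large and one tiny. This is a rough sketch using only the standard library: `to_fp16` and `to_bf16` are hypothetical helper names, BF16 is emulated by truncating FP32 bits, and the test values 70000.0 and 1e-8 were picked only for illustration.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to FP16 and back; return inf where FP16 would overflow."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values; hardware would produce inf
        return float("inf") if x > 0 else float("-inf")

def to_bf16(x: float) -> float:
    """Emulate BF16 by keeping only the top 16 bits of the FP32 encoding."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits32 & 0xFFFF0000))[0]

print(to_fp16(70000.0))  # inf: above FP16's largest finite value, 65504
print(to_bf16(70000.0))  # 69632.0: in range, but coarsely rounded
print(to_fp16(1e-8))     # 0.0: below FP16's smallest subnormal
print(to_bf16(1e-8))     # a small nonzero value close to 1e-8
```

BF16 keeps both values finite and nonzero at the cost of a visibly coarser result, which is the trade the answer above describes.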

Question 3

What is mixed-precision training?

Accepted Answer

Mixed-precision training runs most computation, and stores most activations, in a 16-bit format (FP16 or BF16) while keeping a master copy of the weights and certain numerically sensitive reductions in FP32. This cuts memory use (16-bit activations take about half the space of FP32 ones) and speeds up matrix multiplication on hardware with dedicated 16-bit matrix units such as tensor cores, while preserving the numerical stability of full precision where it matters.
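One way to see why the master weights stay in FP32 is to apply many small updates to a weight stored in each format. The sketch below is a toy illustration, not a training loop: the weight value 1.0, the update size 1e-4, and the 1000 steps are all assumed for the example, and rounding is emulated with the standard `struct` module.

```python
import struct

def round_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def round_fp32(x: float) -> float:
    """Round a Python float to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

update = 1e-4            # hypothetical per-step weight update
w16 = round_fp16(1.0)    # weight kept only in FP16
master = round_fp32(1.0) # FP32 master copy of the same weight

for _ in range(1000):
    # Near 1.0, adjacent FP16 values are about 0.001 apart, so adding 1e-4
    # rounds straight back to the old value and the update is lost.
    w16 = round_fp16(w16 + round_fp16(update))
    # FP32 has enough mantissa bits to accumulate the same updates.
    master = round_fp32(master + update)

print(w16)     # 1.0: every update was rounded away
print(master)  # approximately 1.1
```

In a mixed-precision setup the 16-bit weights used for the forward and backward pass would be re-derived from the FP32 master copy after each step, so small updates are not lost.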

Question 4

Does using FP16 or BF16 hurt model accuracy?

Accepted Answer

When done correctly with mixed precision, final model accuracy is essentially unchanged versus FP32. FP16 needs loss scaling to avoid gradient underflow; BF16 usually needs none. The main benefits, less memory and faster training, come with negligible accuracy cost.
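Loss scaling can be sketched on a single number. Scaling the loss by a constant scales every gradient by the same constant, which lifts tiny gradients into FP16's representable range; the scale is then divided out in higher precision before the weight update. The gradient value 1e-8 and the scale 65536 below are assumptions chosen for illustration, and FP16 rounding is emulated with the standard `struct` module.

```python
import struct

def round_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

true_grad = 1e-8     # hypothetical gradient, too small for FP16
loss_scale = 65536.0 # hypothetical scale factor (2**16)

# Without scaling, the gradient underflows to zero when stored in FP16.
unscaled = round_fp16(true_grad)

# With scaling, the gradient stored in FP16 is 65536x larger and survives.
scaled = round_fp16(true_grad * loss_scale)

# Unscale in higher precision before applying the update.
recovered = scaled / loss_scale

print(unscaled)   # 0.0
print(recovered)  # close to 1e-8
```

Frameworks that implement this typically also adjust the scale dynamically and skip steps whose scaled gradients overflow; that logic is omitted here.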

FP16 and BF16 (AI Glossary)

Definition

How floating-point formats split their bits

The precision-versus-range trade-off

Mixed-precision training

Why it matters