Question 1

What is beam search in text generation?

Accepted Answer

Beam search is a decoding algorithm that keeps the K most probable partial sequences (beams) at every generation step instead of committing to a single best token. By exploring several candidates in parallel, it can recover from a locally good but globally poor first choice that greedy decoding would lock in.
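To make the idea concrete, here is a minimal sketch of beam search over a tiny hand-made model. The `TOY_MODEL` table, its probabilities, and the `beam_search` function are all hypothetical and invented for illustration. A real decoder would call a neural network and handle end-of-sequence tokens.

```python
import math

# Hypothetical toy "model": maps a prefix (tuple of tokens) to
# next-token probabilities. The numbers are invented for illustration.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.5, "y": 0.5},
    ("B",): {"z": 0.9, "w": 0.1},
}

def beam_search(model, beam_width, steps):
    """Return up to `beam_width` (sequence, log-probability) pairs, best first."""
    beams = [((), 0.0)]  # start from the empty prefix with log-probability 0
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            # Extend every live beam by every possible next token.
            for token, prob in model[prefix].items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        # Keep only the K highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# A beam width of 1 is greedy decoding: it commits to "A" (0.6) and ends near 0.30.
greedy = beam_search(TOY_MODEL, beam_width=1, steps=2)[0]
# A beam width of 2 keeps "B" alive and finds ("B", "z") at about 0.36.
best = beam_search(TOY_MODEL, beam_width=2, steps=2)[0]

print("greedy:", greedy[0], round(math.exp(greedy[1]), 2))
print("beam-2:", best[0], round(math.exp(best[1]), 2))
```

In this toy setup the greedy path starts with the more probable token but ends with the less probable sequence, which is exactly the kind of early mistake beam search can recover from.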

Question 2

What does beam width control?

Accepted Answer

Beam width (K) is the number of candidate sequences kept alive at each step. A larger beam explores more of the search space and usually finds higher-probability sequences, but it costs more compute and memory and can produce bland, repetitive text. Typical values for translation are 4 to 10.
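As a rough illustration of the compute cost, the number of candidate continuations scored at each step grows linearly with K. The helper name `candidates_per_step` and the 32,000-token vocabulary are assumptions chosen for the example, not figures from any particular model.

```python
def candidates_per_step(beam_width: int, vocab_size: int) -> int:
    """Each live beam is extended by every vocabulary token before pruning."""
    return beam_width * vocab_size

# Assumed vocabulary size of 32,000 tokens, purely for illustration.
for k in (1, 4, 10):
    print(f"K={k}: {candidates_per_step(k, vocab_size=32_000)} candidates per step")
```

Memory follows the same pattern, since each of the K beams needs its own decoding state.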

Question 3

Why does beam search need a length penalty?

Accepted Answer

Because each extra token multiplies in another probability below 1.0 (equivalently, adds another negative log-probability), longer sequences naturally score lower, biasing plain beam search toward short outputs. A length penalty (or length normalisation) divides the log-probability score by sequence length raised to a power, rebalancing the comparison so longer, complete answers are not unfairly penalised.
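The effect of dividing by length raised to a power can be sketched in a few lines. The function name `length_normalised`, the exponent name `alpha`, and the two example hypotheses are assumptions made up for this illustration. Real toolkits often use slightly different penalty formulas.

```python
def length_normalised(log_prob_sum: float, length: int, alpha: float = 1.0) -> float:
    """Divide the summed log-probability by length ** alpha.

    alpha = 0 leaves the raw score unchanged; larger alpha favours longer outputs.
    """
    return log_prob_sum / (length ** alpha)

# Two invented hypotheses as (summed log-probability, length in tokens).
short_hyp = (-2.0, 2)  # shorter, higher raw score
long_hyp = (-3.5, 5)   # longer, lower raw score

for alpha in (0.0, 1.0):
    s = length_normalised(*short_hyp, alpha=alpha)
    l = length_normalised(*long_hyp, alpha=alpha)
    winner = "short" if s > l else "long"
    print(f"alpha={alpha}: short={s:.2f} long={l:.2f} -> {winner} wins")
```

With no normalisation the short hypothesis wins on raw score. With `alpha=1.0` the comparison becomes a per-token average and the longer hypothesis comes out ahead.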

Question 4

When should I use beam search instead of sampling?

Accepted Answer

Use beam search for tasks with a single correct-ish answer where fidelity matters: machine translation, summarisation, and grammatical error correction. Use sampling methods like top-p or temperature for open-ended creative writing, chat, and brainstorming, where diversity and surprise are desirable.
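In practice the choice usually comes down to a handful of decoding settings. The sketch below assumes the Hugging Face Transformers `generate` method, which the answer above does not name. The parameter names are that library's, while the dictionary names and the specific values are illustrative rather than recommended defaults.

```python
# Assumed target: Hugging Face Transformers' `model.generate(...)`.
# Values are illustrative starting points, not tuned recommendations.

# Deterministic, fidelity-oriented decoding (translation, summarisation).
BEAM_SEARCH_KWARGS = {
    "num_beams": 4,          # beam width K
    "do_sample": False,      # no randomness
    "length_penalty": 1.0,   # exponent applied to sequence length
    "early_stopping": True,  # stop once enough finished beams exist
}

# Diverse, open-ended decoding (chat, creative writing).
SAMPLING_KWARGS = {
    "num_beams": 1,          # a single sequence, no beam search
    "do_sample": True,       # draw tokens at random from the distribution
    "top_p": 0.9,            # nucleus sampling threshold
    "temperature": 0.8,      # values below 1 sharpen the distribution
}

# Hypothetical usage, given a loaded `model` and tokenised `inputs`:
#   output_ids = model.generate(**inputs, **BEAM_SEARCH_KWARGS)
```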

What Is Beam Search? Better Text Generation Through Exploration

Definition

Why greedy decoding falls short

How beam search works

The length-penalty problem

When beam search wins — and when it loses