Question 1

What is a context window?

Accepted Answer

A context window is the maximum number of tokens a language model can read and reason over in a single request. Your prompt, any retrieved documents, and the model's reply all have to fit inside this limit together.
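As a rough illustration of "everything has to fit together", the check below adds up the pieces of a request and compares the total to the window. It is only a sketch: the function name `fits_in_window` and its parameters are made up for this example, and the token counts are assumed to come from whatever tokenizer your model uses.

```python
def fits_in_window(prompt_tokens: int, document_tokens: int,
                   reply_budget: int, window: int) -> bool:
    """Return True if prompt, retrieved documents, and reply fit together.

    All names here are hypothetical; the counts are assumed to be
    measured with the target model's own tokenizer.
    """
    return prompt_tokens + document_tokens + reply_budget <= window


# A 1,000-token prompt, 100,000 tokens of documents, and room for a
# 4,000-token reply total 105,000 tokens, which fits in a 128,000 window.
print(fits_in_window(1_000, 100_000, 4_000, 128_000))
```

Reserving the reply budget up front matters because the reply counts against the same limit as the input.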

Question 2

How big are current context windows?

Accepted Answer

They vary widely: older models were limited to 4K–8K tokens, while modern models offer 128K, 200K, or even 1M+ tokens. As a rule of thumb, 1,000 tokens is about 750 English words, so a 128K window holds roughly 96,000 words, about the length of a 250-page book.
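The arithmetic behind that estimate can be sketched in a few lines. The 0.75 words-per-token ratio is the rule of thumb from the answer; the 375 words-per-page figure is an assumption added here for illustration, and real ratios vary by tokenizer, language, and page layout.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb: 1,000 tokens is about 750 English words
WORDS_PER_PAGE = 375    # assumption for this sketch; real books vary


def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words a token count corresponds to."""
    return round(tokens * WORDS_PER_TOKEN)


def tokens_to_pages(tokens: int, words_per_page: int = WORDS_PER_PAGE) -> float:
    """Estimate how many book pages a token count corresponds to."""
    return tokens_to_words(tokens) / words_per_page


print(tokens_to_words(128_000))  # 96000
print(tokens_to_pages(128_000))  # 256.0
```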

Question 3

Why can't context windows be infinite?

Accepted Answer

Standard attention cost grows quadratically with sequence length, so doubling the context roughly quadruples the attention compute, and the memory too when the full attention matrix is materialised. Very long windows can also dilute the model's focus, sometimes causing it to miss information placed in the middle of the input.
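To make the quadratic growth concrete, the sketch below counts the query-key pairs in one full self-attention matrix, which is the quantity that scales with the square of the sequence length. The function name is hypothetical, and the count ignores layers, heads, and any memory-saving attention variants.

```python
def attention_score_entries(seq_len: int) -> int:
    """Number of query-key pairs in one full (n x n) attention matrix."""
    return seq_len * seq_len


# Each doubling of the sequence length quadruples the number of pairs.
for n in (4_000, 8_000, 16_000):
    print(n, attention_score_entries(n))
```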

Question 4

What happens if I exceed the context window?

Accepted Answer

Depending on the API or application, either the request is rejected with an error, or older tokens are truncated and the model silently loses earlier content. The usual fixes are summarising history, chunking documents, or using retrieval to pull in only the relevant pieces.
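Two of those fixes, dropping the oldest history and chunking a document, can be sketched as follows. This is a minimal illustration, not a production approach: `estimate_tokens`, `trim_history`, and `chunk_text` are hypothetical helpers, and the token estimate is a crude word-count heuristic based on the 750-words-per-1,000-tokens rule of thumb rather than a real tokenizer.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule-of-thumb assumption, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Crude token estimate from a whitespace word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit the budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that no longer fits.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def chunk_text(text: str, max_tokens: int) -> list[str]:
    """Split text into word-based chunks of roughly max_tokens each."""
    words = text.split()
    words_per_chunk = max(1, int(max_tokens * WORDS_PER_TOKEN))
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


history = ["first question here", "second question here", "third question here"]
print(trim_history(history, budget=8))
print(chunk_text("a b c d e f g", max_tokens=4))
```

Trimming from the oldest end keeps the most recent turns intact, at the price of forgetting how the conversation started. Summarising the dropped turns instead is the usual way to soften that loss.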

Context Window (AI Glossary)

Definition

Measured in tokens, not words

Why bigger isn’t free

The “lost in the middle” problem

Working within the limit