Question 1

What is a context window in simple terms?

Accepted Answer

A context window is the maximum amount of text — measured in tokens — that a model can consider at once. It includes your prompt, any conversation history, retrieved documents, and the response being generated. It is the model's working memory for a single request; anything that does not fit is truncated or dropped.
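As a rough illustration of how those pieces share one budget, here is a minimal sketch. The `RequestBudget` class and its field names are hypothetical, chosen only for this example, and the token counts are assumed to come from whatever tokenizer the model actually uses.

```python
from dataclasses import dataclass


@dataclass
class RequestBudget:
    """Hypothetical accounting of what occupies the window in one request."""
    context_limit: int     # maximum tokens the model can consider at once
    prompt_tokens: int     # the current prompt
    history_tokens: int    # earlier conversation turns
    retrieved_tokens: int  # retrieved documents added to the prompt
    response_tokens: int   # space reserved for the response being generated

    def used(self) -> int:
        # Everything counts against the same limit, including the response.
        return (self.prompt_tokens + self.history_tokens
                + self.retrieved_tokens + self.response_tokens)

    def remaining(self) -> int:
        # A negative value means the request does not fit.
        return self.context_limit - self.used()


budget = RequestBudget(context_limit=8000, prompt_tokens=500,
                       history_tokens=3000, retrieved_tokens=2500,
                       response_tokens=1000)
print(budget.used(), budget.remaining())
```

The point of the sketch is only that the response is part of the same total: reserving more room for output leaves less for history and documents.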

Question 2

How big are context windows today?

Accepted Answer

They vary widely by model, from a few thousand tokens in older models to hundreds of thousands, or even over a million, in the largest current ones. A token is roughly three-quarters of an English word, so a 128,000-token window holds about 96,000 words (128,000 × 0.75), roughly the length of a typical novel.
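The conversion above can be sketched in a few lines. This assumes the three-quarters rule of thumb from the answer; the real ratio depends on the tokenizer and the language, so treat the results as estimates. The function names are illustrative, not part of any library.

```python
import math

# Rule of thumb from the answer above; an approximation, not a fixed constant.
WORDS_PER_TOKEN = 0.75


def estimate_words(context_tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Approximate how many English words fit in a window of this many tokens."""
    return round(context_tokens * words_per_token)


def estimate_tokens(word_count: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Approximate how many tokens a text of this many words would use."""
    return math.ceil(word_count / words_per_token)


print(estimate_words(128_000))  # 96000
print(estimate_tokens(3))       # 4
```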

Question 3

Does a bigger context window always mean better answers?

Accepted Answer

No. A model's nominal context (what it accepts) is not the same as its effective context (what it reasons over well). Models often accept far more text than they reliably use, and information buried in the middle of a long prompt is frequently retrieved less reliably than text at the start or end.

Question 4

What happens when I exceed the context window?

Accepted Answer

If your input plus the expected output would exceed the limit, the system either rejects the request or silently truncates earlier text — often the start of a long conversation. This is why long chats can seem to forget earlier details: those tokens fell outside the window and are no longer visible to the model.
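The "forgetting" behaviour can be sketched as follows. This is a simplified illustration, not how any particular system works: `count_tokens` is a hypothetical stand-in that counts whitespace-separated words, and `trim_history` shows just one possible policy (drop the oldest messages first).

```python
from typing import List


def count_tokens(text: str) -> int:
    """Hypothetical stand-in: counts words. A real system would use the model's tokenizer."""
    return len(text.split())


def fits(input_tokens: int, max_output_tokens: int, context_limit: int) -> bool:
    """True if the input plus the reserved output stays within the limit."""
    return input_tokens + max_output_tokens <= context_limit


def trim_history(messages: List[str], max_output_tokens: int, context_limit: int) -> List[str]:
    """Keep the most recent messages that fit; older ones fall outside the window."""
    budget = context_limit - max_output_tokens
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest so recent turns are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


chat = ["my name is Ada", "what is a token", "how big is the window"]
print(trim_history(chat, max_output_tokens=2, context_limit=12))
# ['what is a token', 'how big is the window']
```

With these toy numbers the first message no longer fits, so the model would not see the user's name, which is the effect described above.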

What Is a Context Window in an LLM?

Working memory, not long-term memory

Why context costs so much

Nominal vs effective context

Working within the limit