Question 1

What is a context window in simple terms?

Accepted Answer

A context window is the maximum amount of text — measured in tokens — that a model can consider at one time. It includes everything: your system prompt, the conversation history, any documents you paste in, and the model's own response. Think of it as the model's short-term working memory. Anything that does not fit in the window is simply not available to the model when it generates an answer.
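To make the "it includes everything" point concrete, here is a minimal sketch of the budget arithmetic. The `Request` class, its field names, and the example numbers are hypothetical illustrations, not part of any provider's API. The only idea taken from the answer above is that the prompt parts and the response share one limit.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical breakdown of one model call, all values in tokens."""
    system_prompt_tokens: int
    history_tokens: int
    document_tokens: int
    max_response_tokens: int  # room reserved for the model's own reply


def total_tokens(req: Request) -> int:
    # Every part counts against the same window, including the response.
    return (
        req.system_prompt_tokens
        + req.history_tokens
        + req.document_tokens
        + req.max_response_tokens
    )


def fits_in_window(req: Request, window_size: int) -> bool:
    # True when the whole request, reply included, stays within the window.
    return total_tokens(req) <= window_size


example = Request(
    system_prompt_tokens=500,
    history_tokens=3000,
    document_tokens=4000,
    max_response_tokens=1000,
)
print(total_tokens(example))           # 8500
print(fits_in_window(example, 8192))   # False
print(fits_in_window(example, 16000))  # True
```

In this example the prompt parts alone (7,500 tokens) would fit an 8,192-token window, but the reserved reply pushes the total over it, which is why the response has to be budgeted along with the input.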

Question 2

How big are the context windows of major models?

Accepted Answer

They vary widely and keep growing. The original GPT-3.5 models handled about 4K tokens; modern GPT-4-class and Claude models commonly offer 128K to 200K tokens, and Google's Gemini has offered windows of 1 million tokens or more. As a rule of thumb for English text, 100K tokens is about 75,000 words, the length of a long book, so today's large windows can hold substantial documents at once.
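The rule of thumb above works out to about 0.75 words per token. The sketch below applies that ratio to the window sizes mentioned in the answer. The constant and function names are illustrative assumptions. Real token counts depend on the tokenizer and the language of the text, so treat the results as estimates only.

```python
# Assumed ratio, derived from "100K tokens is about 75,000 words".
WORDS_PER_TOKEN = 0.75


def estimate_words(tokens: int) -> int:
    """Approximate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def estimate_tokens(words: int) -> int:
    """Approximate how many tokens a given number of English words needs."""
    return round(words / WORDS_PER_TOKEN)


for window in (4_000, 128_000, 200_000, 1_000_000):
    print(f"{window:>9,} tokens ~ {estimate_words(window):>7,} words")
```

By this estimate a 4K window holds about 3,000 words, while a 1-million-token window holds about 750,000.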

Question 3

What happens when I exceed the context window?

Accepted Answer

The text no longer fits, so something must be dropped. In a chat application, the oldest messages are usually removed first, which is why a model can seem to forget what you said earlier in a very long conversation. Via the API, a request with too many tokens is typically rejected with an error instead of being trimmed for you. Either way, content outside the window has no influence on the response: the model genuinely cannot see it.
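The "oldest messages go first" behaviour can be sketched as a simple trimming loop. This is an illustration under stated assumptions, not how any particular chat product works. `count_tokens` is a crude hypothetical stand-in based on word counts (real applications would use the model's own tokenizer), and `trim_history` is a name invented for this example.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in: assume about 0.75 words per token.
    # A real application would use the tokenizer of the target model.
    return max(1, round(len(text.split()) / 0.75))


def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that
    # would overflow, so everything older than it is dropped as well.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["one two three", "four five six", "seven eight nine"]
print(trim_history(history, budget=8))
# ['four five six', 'seven eight nine']
```

Each three-word message counts as 4 tokens here, so a budget of 8 keeps the last two and drops the first, mirroring the "forgetting" described above.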

Question 4

Does a bigger context window always mean better answers?

Accepted Answer

No. A large window lets you include more material, but models can still struggle to use information buried in the middle of a very long context — the so-called lost-in-the-middle effect. Stuffing the window with marginally relevant text can also raise cost and latency and dilute the model's focus. Curated, well-placed context usually beats a large but noisy one.
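One way to act on "curated, well-placed context" is sketched below. It keeps only the highest-scoring chunks that fit a token budget, then orders them so the strongest sit at the start and end of the prompt and the weakest land in the middle. Both the relevance scores and the edge-placement heuristic are assumptions made for illustration. The function name is hypothetical, and whether this ordering helps a given model is something to test, not assume.

```python
def select_and_order(
    chunks: list[tuple[str, float, int]], budget: int
) -> list[str]:
    """chunks holds (text, relevance_score, token_count) tuples."""
    # Curate: consider chunks from most to least relevant and keep each
    # one that still fits; chunks that would overflow are skipped.
    ranked = sorted(chunks, key=lambda c: c[1], reverse=True)
    chosen: list[str] = []
    used = 0
    for text, _score, tokens in ranked:
        if used + tokens <= budget:
            chosen.append(text)
            used += tokens

    # Place: alternate between the front and the back of the prompt so
    # the most relevant chunks end up at the edges.
    front: list[str] = []
    back: list[str] = []
    for i, text in enumerate(chosen):
        (front if i % 2 == 0 else back).append(text)
    return front + back[::-1]


chunks = [
    ("a", 0.9, 10),
    ("b", 0.8, 10),
    ("c", 0.5, 10),
    ("d", 0.2, 10),
    ("e", 0.1, 100),  # too large for the budget below, so it is left out
]
print(select_and_order(chunks, budget=40))
# ['a', 'c', 'd', 'b']
```

The most relevant chunk ("a") comes first, the second most relevant ("b") comes last, and the large, low-value chunk is left out entirely.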

What Is a Context Window? Size, Limits, and Why It Matters

What a context window is

How large are context windows today

Why bigger is not automatically better

Working within the limits