Context window size history timeline
In just a few years, the amount of text you can feed a language model jumped from a few pages to entire books. This interactive timeline plots the maximum context window of major models from 2020 onward, with the input price per million tokens overlaid so you can see that context got longer and cheaper at the same time.
How it works
Each model is positioned by release date and drawn with a bar scaled to its maximum context window in tokens. Because the range runs from about 2,000 tokens (GPT-3) to over 1,000,000 (Gemini 1.5), a linear scale would shrink the earliest bar to roughly 0.2% of the longest, so the bars use a logarithmic scale to keep the early models visible next to the giants. The launch-era input price sits beside each entry to show the cost trend. You can add your own entry to drop a new release or an internal model onto the same curve.
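As a rough illustration of the logarithmic bar scaling described above, here is a minimal Python sketch. It is not the timeline's actual code: the function names, the 1,000 to 2,000,000 token axis range, the 600-pixel maximum width, and the "my-internal-model" entry are all assumptions chosen for the example.

```python
import math


def bar_width(tokens: int, min_tokens: int = 1_000,
              max_tokens: int = 2_000_000, max_px: float = 600.0) -> float:
    """Map a context window size to a bar width in pixels on a log10 scale.

    The axis bounds and pixel width are assumed values, not the timeline's.
    """
    # Clamp so entries outside the axis range still get a valid width.
    clamped = min(max(tokens, min_tokens), max_tokens)
    span = math.log10(max_tokens) - math.log10(min_tokens)
    return max_px * (math.log10(clamped) - math.log10(min_tokens)) / span


def linear_width(tokens: int, max_tokens: int = 2_000_000,
                 max_px: float = 600.0) -> float:
    """Same mapping on a linear scale, for comparison only."""
    return max_px * min(tokens, max_tokens) / max_tokens


# Token counts for the first two entries come from the text above;
# the third is a hypothetical user-added entry.
entries = [
    ("GPT-3", 2_000),
    ("Gemini 1.5", 1_000_000),
    ("my-internal-model", 64_000),
]

for name, tokens in sorted(entries, key=lambda e: e[1]):
    print(f"{name:<18} log: {bar_width(tokens):6.1f}px  "
          f"linear: {linear_width(tokens):6.1f}px")
```

On the linear scale the smallest entry is under one pixel wide, while the log scale gives it a clearly visible bar. That is the reason for the design choice.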
What the trend shows
- Big steps, mostly 4× at a time. From 2k → 8k → 32k → 128k → 1M, the ceiling rose roughly 500× in about four years.
- Cheaper per token, not just bigger. Input prices fell even as windows grew, which is what made long-context use cases practical.
- Diminishing real-world gains. A bigger window helps only if the model can actually use what sits in the middle of it, so retrieval and good chunking still matter.
- Context is now rarely the bottleneck. For most tasks the limit is reasoning quality and cost, not how much you can paste.
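The step sizes and overall growth implied by the milestones in the first bullet can be worked out directly. The sketch below uses the rounded milestone values from the list. The four-year span is an assumption (roughly 2020 to 2024), so the per-year figures are illustrative rather than exact.

```python
# Rounded milestone values taken from the list above.
milestones = [2_000, 8_000, 32_000, 128_000, 1_000_000]

# Ratio between each consecutive pair of milestones.
ratios = [later / earlier for earlier, later in zip(milestones, milestones[1:])]
print(ratios)  # [4.0, 4.0, 4.0, 7.8125]

# Overall growth from the first milestone to the last.
total = milestones[-1] / milestones[0]
print(total)  # 500.0

# Assumption: the milestones span about four years.
years = 4
per_year = total ** (1 / years)
per_two_years = total ** (2 / years)
print(f"about {per_year:.1f}x per year, {per_two_years:.1f}x per two years")
```

Changing `years` shows how sensitive the annualized rate is to the dates you pick for the first and last milestone.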