Autoregressive — definition
Autoregressive describes a model that generates a sequence one element at a time, where each new element is predicted from all the elements produced so far. In AI language models, the “elements” are tokens: the model predicts the next token, appends it to the text, and repeats.
How it works
Given the text so far, an autoregressive LLM outputs a probability distribution over the next token, picks one (according to settings like temperature), and feeds the extended sequence back in to predict the following token. This loop — next-token prediction — is how GPT, Claude, Gemini and Llama write a sentence.
Why it matters
- Sequential by nature. Token N must exist before token N+1 can be predicted, so generation can’t be fully parallelised. This is a key reason long responses take time and why “streaming” shows text appearing token by token.
- Context-dependent. Because each token conditions on everything before it, the model’s earlier choices shape its later ones — which is why a small change early in a prompt can change the whole answer.
Contrast: non-autoregressive models
Not every generative model is autoregressive. Diffusion models (used for AI images, and increasingly explored for text) start from noise and refine the whole output in parallel over several steps, rather than left-to-right. Each approach has different trade-offs in speed, control and quality.