What is an encoder-decoder architecture?
The encoder-decoder architecture is a neural network design with two cooperating halves. The encoder reads an input sequence and turns it into a dense internal representation that captures its meaning; in a transformer this is one contextual vector per input token rather than a single summary vector. The decoder then turns that representation into an output sequence, one token at a time. The original transformer, introduced in the 2017 paper “Attention Is All You Need”, used this design and was built for machine translation, where the input (one language) and output (another language) are different sequences.
The mechanism that links the two halves is cross-attention: at each generation step the decoder attends back over the full encoded input, which keeps its output grounded in the source.
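Cross-attention can be illustrated with a minimal sketch in plain Python. It assumes a single attention head and no learned projection matrices, so the encoder outputs act directly as both keys and values. The function names and the toy vectors are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(decoder_state, encoder_outputs):
    """One decoder query attends over every encoder output vector."""
    d = len(decoder_state)
    # Scaled dot-product score between the query and each encoder vector.
    scores = [
        sum(q * k for q, k in zip(decoder_state, enc)) / math.sqrt(d)
        for enc in encoder_outputs
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the encoder vectors.
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
        for i in range(d)
    ]
    return weights, context

# Toy example: three encoded source tokens, one decoder step.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = cross_attention(decoder_state, encoder_outputs)
print(weights, context)
```

The weights sum to 1 across the source tokens, so the context vector is a blend of the encoded input that favours the tokens most similar to the current decoder state. A real transformer adds learned query, key, and value projections and runs several heads in parallel.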
Three transformer variants
Modern transformers come in three flavours, depending on which halves they use:
- Encoder-only (e.g. BERT) — keeps just the encoder. It reads the entire input bidirectionally, attending to tokens on both sides at once. This makes it excellent at understanding tasks: classification, named-entity recognition, sentence similarity, and retrieval embeddings. It does not generate free-form text.
- Decoder-only (e.g. GPT, Llama) — keeps just the decoder with causal (left-to-right) attention, so each token can only see earlier tokens. This makes it ideal for generation: chat, completion, and reasoning. Most large language models today are decoder-only.
- Encoder-decoder (e.g. T5, BART) — keeps both halves. The encoder builds a bidirectional representation of the input and the decoder generates output while cross-attending to it. This suits sequence-to-sequence tasks where the output is a transformed version of a specific input.
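The difference between bidirectional and causal attention comes down to which positions each token is allowed to look at. The sketch below builds both masks as nested lists of booleans, where `True` at row `i`, column `j` means position `i` may attend to position `j`. The function names are hypothetical, and the causal mask follows the common convention that a position may attend to itself as well as to earlier positions.

```python
def bidirectional_mask(n):
    # Encoder-style: every position may attend to every other position.
    return [[True] * n for _ in range(n)]

def causal_mask(n):
    # Decoder-style: position i may attend only to positions j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]

def show(mask):
    # Render the mask as rows of 1s (allowed) and 0s (blocked).
    return "\n".join(" ".join("1" if ok else "0" for ok in row) for row in mask)

print(show(bidirectional_mask(3)))
print()
print(show(causal_mask(3)))
```

In an encoder-decoder model, the encoder would use the first mask over the input, while the decoder would use the second mask over its own output so far and attend to the whole encoded input through cross-attention.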
Which design fits which task?
The right architecture follows the shape of the problem:
- Use encoder-only when you need a rich understanding of a fixed input — sentiment analysis, classification, or producing embeddings for search.
- Use decoder-only when you need open-ended generation that continues from a prompt — conversation, code completion, brainstorming.
- Use encoder-decoder when you map one full sequence to another — translation, summarisation, grammar correction, or structured data-to-text.
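The guidance above can be written down as a small lookup table. This is only a sketch of the rule of thumb, not a selection algorithm: the dictionary, the function name, and the task labels are hypothetical and simply restate the examples from the list.

```python
# Hypothetical lookup table; task labels restate the examples in the list above.
ARCHITECTURE_BY_TASK = {
    "sentiment analysis": "encoder-only",
    "classification": "encoder-only",
    "search embeddings": "encoder-only",
    "conversation": "decoder-only",
    "code completion": "decoder-only",
    "brainstorming": "decoder-only",
    "translation": "encoder-decoder",
    "summarisation": "encoder-decoder",
    "grammar correction": "encoder-decoder",
    "data-to-text": "encoder-decoder",
}

def suggest_architecture(task):
    # Normalise the label, then fall back to "unknown" for unlisted tasks.
    return ARCHITECTURE_BY_TASK.get(task.strip().lower(), "unknown")

print(suggest_architecture("Translation"))
print(suggest_architecture("code completion"))
```

In practice the choice also depends on available pretrained models, latency, and cost, so a table like this is a starting point at most.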
Why decoder-only models came to dominate
Although the original transformer was encoder-decoder, most leading chat models are decoder-only. The reason is generality: a decoder-only model can treat almost any task as text continuation by placing the “input” in the prompt and generating the “output” as the continuation. This unified framing, combined with straightforward and highly scalable pretraining on next-token prediction, made decoder-only models the workhorse of modern generative AI. Encoder-only and encoder-decoder designs remain strong for understanding tasks and explicit sequence-to-sequence work.
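Both ideas in this section, framing a task as text continuation and training on next-token prediction, can be sketched in a few lines. The prompt template and the function names below are hypothetical; real models use their own prompt formats and operate on subword token IDs rather than whole words.

```python
def frame_as_prompt(instruction, source_text):
    # Put the task and its input in the prompt; the model's continuation is the output.
    return f"{instruction}\n\nInput: {source_text}\nOutput:"

def next_token_examples(tokens):
    # Each prefix of the sequence becomes a context, and the token after it the target.
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

prompt = frame_as_prompt("Translate to French.", "Hello")
print(prompt)

for context, target in next_token_examples(["the", "cat", "sat"]):
    print(context, "->", target)
```

The same template could hold a summarisation or classification request just by changing the instruction, which is why one decoder-only model can cover many tasks. The training pairs show why pretraining scales easily: any stretch of raw text yields supervised examples without manual labelling.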