The core idea
T5, short for Text-to-Text Transfer Transformer, is a model Google introduced in 2019 with a deceptively simple unifying idea: treat every natural-language task as converting one piece of text into another. Whether the task is translation, summarisation, sentiment classification, or question answering, the input is text and the output is text. This framing means a single model, a single training objective, and a single decoding procedure can handle a huge range of problems — you just change the words you feed in and read out.
The text-to-text framing
In most NLP setups, different tasks need different output formats: a class label, a span, a number, a sentence. T5 abolishes that variety. To classify a review’s sentiment, you prepend a prefix such as “sst2 sentence:” and the model outputs the literal word “positive” or “negative”. To translate, you prepend “translate English to German:” and it outputs German text. To summarise, you prepend “summarize:”. Because the output is always text, the same cross-entropy loss trains the model on every task at once, and the task prefix tells the model what to do.
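The prefix convention can be made concrete with a small sketch. It only shows how three different tasks collapse into the same (source, target) string format; the `Example` type, the `to_text_to_text` helper, and the task keys are hypothetical names chosen for illustration, and the sample sentences are made up.

```python
from typing import NamedTuple


class Example(NamedTuple):
    source: str  # text fed to the model
    target: str  # text the model is trained to produce


# Task prefixes quoted in the text above; the dictionary keys are hypothetical.
PREFIXES = {
    "sentiment": "sst2 sentence: ",
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
}


def to_text_to_text(task: str, text: str, target: str) -> Example:
    """Render one task instance as a pair of plain strings."""
    return Example(source=PREFIXES[task] + text, target=target)


examples = [
    to_text_to_text("sentiment", "a charming and often affecting film", "positive"),
    to_text_to_text("translate_en_de", "That is good.", "Das ist gut."),
    to_text_to_text("summarize", "Heavy rain flooded several streets overnight.",
                    "Rain caused flooding."),
]

for ex in examples:
    print(f"{ex.source!r} -> {ex.target!r}")
```

Every example has the same shape, so a single loss over target tokens can be applied regardless of which task a pair came from.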
Encoder-decoder architecture
Unlike encoder-only BERT or decoder-only GPT, T5 uses the full encoder-decoder Transformer introduced in the 2017 paper “Attention Is All You Need”. The encoder reads the input text and builds a contextual representation of it; the decoder then generates the output one token at a time, attending to that representation through cross-attention at each step. This structure fits the text-to-text view well, since many tasks map a complete input sequence to a complete output sequence, which is exactly what an encoder-decoder is built to do.
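The control flow described here can be sketched without a real model: the encoder runs once over the whole input, and the decoder is then called repeatedly, each time seeing the encoder output plus the tokens generated so far. The `toy_encode` and `toy_next_token` functions below are hypothetical stand-ins, not a trained network, and greedy decoding is an assumption made for simplicity (other decoding strategies exist).

```python
from typing import Callable, List, Sequence


def greedy_decode(
    encode: Callable[[Sequence[str]], object],
    next_token: Callable[[object, List[str]], str],
    source_tokens: Sequence[str],
    eos: str = "</s>",
    max_len: int = 32,
) -> List[str]:
    """Generate output tokens one at a time from an encoded input."""
    memory = encode(source_tokens)  # encoder runs once over the full input
    output: List[str] = []
    for _ in range(max_len):
        # The decoder step sees the encoder memory and everything generated so far.
        token = next_token(memory, output)
        if token == eos:
            break
        output.append(token)
    return output


def toy_encode(tokens: Sequence[str]) -> List[str]:
    # Stand-in for the encoder: the "representation" is just the token list.
    return list(tokens)


def toy_next_token(memory: List[str], generated: List[str]) -> str:
    # Stand-in for the decoder: looks back at the memory and copies the next
    # token in upper case, then signals end-of-sequence.
    if len(generated) < len(memory):
        return memory[len(generated)].upper()
    return "</s>"


print(greedy_decode(toy_encode, toy_next_token, ["das", "ist", "gut"]))
# prints ['DAS', 'IST', 'GUT']
```

In a real encoder-decoder the two stand-ins would be neural networks, but the loop has the same shape: one encoding pass, then one decoder call per output token.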
Span-corruption pre-training and C4
T5 is pre-trained with a span-corruption objective: random contiguous spans of the input are each replaced with a unique sentinel token, and the model must produce the missing spans as its output, each introduced by its sentinel. This is a text-to-text version of masked language modelling that suits the encoder-decoder design. The training data is C4 (the Colossal Clean Crawled Corpus), a heavily filtered slice of Common Crawl with boilerplate and junk removed. Pre-training on this large, clean corpus gave T5 broad language competence before any task-specific fine-tuning.
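A minimal sketch of span corruption on a single sentence may help. It makes several assumptions: the spans are passed in explicitly rather than sampled at random, tokens are whitespace-separated words rather than subword pieces, and the sentinel names follow the `<extra_id_N>` convention used by the Hugging Face T5 tokenizer. The `corrupt_spans` helper is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def corrupt_spans(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[str, str]:
    """Replace each half-open (start, end) span with a sentinel token.

    `spans` must be sorted and non-overlapping. Returns (inputs, targets).
    """
    inputs: List[str] = []
    targets: List[str] = []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep text before the span
        inputs.append(sentinel)              # one sentinel stands in for the span
        targets.append(sentinel)             # target names the sentinel...
        targets.extend(tokens[start:end])    # ...then gives the dropped tokens
        cursor = end
    inputs.extend(tokens[cursor:])           # keep text after the last span
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel ends the target
    return " ".join(inputs), " ".join(targets)


tokens = "Thank you for inviting me to your party last week .".split()
inputs, targets = corrupt_spans(tokens, [(2, 4), (8, 9)])
print(inputs)   # Thank you <extra_id_0> me to your party <extra_id_1> week .
print(targets)  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

The target contains only the dropped spans, so it is much shorter than the input, which keeps the decoder's work per example small.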
Why T5 mattered
T5’s lasting contribution was conceptual as much as technical. By showing that one consistent text-to-text format could cover a very wide range of NLP tasks, it made multi-task learning and transfer clean and systematic. It also foreshadowed the instruction-following style of today’s chat models, where you describe a task in words and read the answer back as text. T5 and its instruction-tuned descendant FLAN-T5 remain practical, efficient choices for summarisation, translation, and structured generation, and the text-to-text framing is now standard across modern language models.