Question 1

What is a transformer in AI?

Accepted Answer

A transformer is a neural network architecture, introduced in the 2017 paper 'Attention Is All You Need', that processes a whole sequence in parallel using self-attention. It is the foundation of most modern large language models, including GPT, Claude and Gemini.

Question 2

What is self-attention?

Accepted Answer

Self-attention lets each token in a sequence look at every token in that sequence, itself included, and weigh how relevant each one is when building its own representation. This is how transformers capture long-range relationships between words without processing them one at a time.
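
To make the weighting concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. The toy token vectors and the identity projection matrices are made-up values chosen for illustration, and real models use learned weights and a tensor library rather than nested lists.

```python
import math

def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x holds one row per token; w_q, w_k, w_v are illustrative projection
    matrices. Returns (outputs, weights), where weights[i][j] is how much
    token i attends to token j.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    # Score every query against every key, then normalise each row.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v), weights

# Toy input: 3 tokens with 2 features each, identity projections (assumed values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = self_attention(tokens, identity, identity, identity)
```

Each row of `weights` sums to 1, so every output vector is a blend of all the value vectors, with the most relevant tokens contributing most.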

Question 3

Why did transformers replace RNNs?

Accepted Answer

Recurrent networks process tokens one at a time, which makes training slow and makes long-range dependencies hard to learn. Transformers process all tokens in parallel and attend directly across the whole sequence, making them far faster to train on modern GPUs and better at capturing long-range context.
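
The difference in data dependencies can be sketched in a few lines. This is a toy illustration only: the scalar "RNN" and its weights `w_in` and `w_rec` are assumptions chosen for readability, not a real recurrent layer.

```python
import math

def rnn_states(inputs, w_in=0.5, w_rec=0.9):
    """Scalar toy RNN: each hidden state depends on the previous one."""
    h, states = 0.0, []
    for x in inputs:  # must run in order: step t waits for step t-1
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def pairwise_scores(inputs):
    """Attention-style scores: each (i, j) entry is computed independently."""
    return [[xi * xj for xj in inputs] for xi in inputs]

seq = [1.0, 0.0, 0.0, 0.0]
states = rnn_states(seq)        # the first token's signal passes through every step
scores = pairwise_scores(seq)   # every pair is compared directly
```

In the recurrent loop, information from the first token reaches the last only by passing through each intermediate state, whereas every entry of `scores` could be computed at the same time.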

Question 4

What are the main components of a transformer block?

Accepted Answer

A transformer block contains a multi-head self-attention layer and a position-wise feed-forward network, each wrapped with a residual connection and layer normalisation. Because self-attention has no built-in notion of order, positional encodings are added at the input so the model knows where each token sits in the sequence.
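
The pieces listed above can be wired together in a simplified sketch. Several things here are assumptions made for brevity: attention uses a single head rather than multiple heads, layer normalisation has no learned scale or shift, the weights are random toy values, and the sub-layers follow the post-norm ordering of the 2017 paper (many later models normalise before each sub-layer instead).

```python
import math
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum, used for residual connections and positions."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(x, eps=1e-5):
    """Normalise each token vector to zero mean and (roughly) unit variance."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (simplification)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    weights = [softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
               for qi in q]
    return matmul(weights, v)

def feed_forward(x, w1, w2):
    """Position-wise FFN: the same two-layer network applied to every token."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]  # ReLU
    return matmul(hidden, w2)

def sinusoidal_positions(n_tokens, d_model):
    """Sinusoidal positional encodings: sine on even dims, cosine on odd dims."""
    return [[math.sin(pos / 10000 ** (i / d_model)) if i % 2 == 0
             else math.cos(pos / 10000 ** ((i - 1) / d_model))
             for i in range(d_model)]
            for pos in range(n_tokens)]

def transformer_block(x, p):
    # Sub-layer, then residual connection, then layer normalisation.
    x = layer_norm(add(x, attention(x, p["w_q"], p["w_k"], p["w_v"])))
    x = layer_norm(add(x, feed_forward(x, p["w1"], p["w2"])))
    return x

# Toy setup with made-up sizes and random weights (fixed seed for repeatability).
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

d_model, d_ff, n_tokens = 4, 8, 3
params = {
    "w_q": rand_matrix(d_model, d_model),
    "w_k": rand_matrix(d_model, d_model),
    "w_v": rand_matrix(d_model, d_model),
    "w1": rand_matrix(d_model, d_ff),
    "w2": rand_matrix(d_ff, d_model),
}
embeddings = rand_matrix(n_tokens, d_model)
x = add(embeddings, sinusoidal_positions(n_tokens, d_model))  # inject token order
out = transformer_block(x, params)
```

The output has the same shape as the input (one vector per token), which is what allows blocks like this to be stacked one after another.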

Transformer (AI Glossary)
