What Is a GAN? Generative Adversarial Networks Explained

A generator and a discriminator in an adversarial game — and why that works

The basic idea: two networks in competition

A Generative Adversarial Network (GAN) is a generative model built from two neural networks locked in competition. The generator tries to produce realistic fake data — most famously images — while the discriminator acts as a critic, trying to distinguish the generator’s fakes from genuine examples drawn from a real dataset. The two are trained together: every time the discriminator gets better at catching fakes, the generator is pushed to make more convincing ones, and vice versa. This adversarial dynamic, introduced by Ian Goodfellow and colleagues in 2014, turned out to be a remarkably effective way to learn to synthesise data, and it dominated generative imaging for years before diffusion models arrived.
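The alternating training described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not a practical GAN: the "real" data is a one-dimensional Gaussian centred at 4, the generator is a single shift parameter theta applied to random noise, and the discriminator is a logistic regression with parameters w and b. All of these names and values are hypothetical choices made for the sketch. Each loop iteration takes one discriminator step (get better at catching fakes) followed by one generator step (make the fakes harder to catch).

```python
import math
import random


def sigmoid(a: float) -> float:
    # Numerically safe logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 50, batch: int = 64, lr: float = 0.05, seed: int = 0):
    """Alternate discriminator and generator updates on a 1-D toy problem."""
    rng = random.Random(seed)
    theta = 0.0      # generator parameter: G(z) = z + theta
    w, b = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w * x + b)

    for _ in range(steps):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]          # genuine samples
        fake = [rng.gauss(0.0, 1.0) + theta for _ in range(batch)]  # generator output

        # Discriminator step: minimise -[log D(real) + log(1 - D(fake))].
        grad_w = grad_b = 0.0
        for x in real:
            err = sigmoid(w * x + b) - 1.0  # d(-log D)/d(logit)
            grad_w += err * x
            grad_b += err
        for x in fake:
            err = sigmoid(w * x + b)        # d(-log(1 - D))/d(logit)
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / batch
        b -= lr * grad_b / batch

        # Generator step: minimise log(1 - D(G(z))) with the discriminator held fixed.
        grad_theta = 0.0
        for x in fake:
            # d log(1 - D(x))/d(theta) = -D(x) * w, because dx/d(theta) = 1.
            grad_theta += -sigmoid(w * x + b) * w
        theta -= lr * grad_theta / batch

    return theta, w, b
```

Calling train_toy_gan() returns the final (theta, w, b). Over the first few dozen steps theta moves away from 0 in the direction of the real data. A setup this simple may oscillate rather than settle if it is run for much longer, so treat it as a picture of the update pattern, not as evidence of convergence.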

The min-max game

Formally, a GAN is a min-max game between the two networks. The discriminator tries to maximise a score that rewards it for labelling real data as real and generated data as fake. The generator tries to minimise that same score, that is, to fool the discriminator into labelling fakes as real. Because the two optimise one objective in opposite directions, the setup is adversarial. At the theoretical equilibrium, the generator produces samples indistinguishable from real data and the discriminator is reduced to guessing, assigning a probability of one half to every input. Reaching that equilibrium in practice, however, is far harder than the clean theory suggests.
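Written out, the game takes the following form in the original 2014 formulation. The symbols are notation introduced here and do not appear elsewhere in this article: D(x) is the discriminator's estimated probability that x is real, G(z) is the generator's output for a noise input z, p_data is the real data distribution, p_z is the noise distribution, and p_g is the distribution of generated samples.

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% For a fixed generator with sample distribution p_g, the best discriminator is
D^{*}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)}

% so when p_g = p_{\mathrm{data}}:
D^{*}(x) = \tfrac{1}{2}, \qquad V(D^{*}, G) = \log\tfrac{1}{2} + \log\tfrac{1}{2} = -\log 4
```

The last line is the "reduced to guessing" equilibrium: once generated and real samples follow the same distribution, the best the discriminator can do is output one half everywhere.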

Why training is unstable

The biggest practical headache with GANs is training instability. Because progress depends on keeping two competing networks in balance, the whole system is fragile. If the discriminator learns too fast, it confidently rejects everything the generator makes, and the generator receives almost no useful gradient to learn from and stalls. If the generator gets ahead, the discriminator can no longer tell real from fake well enough to offer meaningful guidance. This delicate balance makes GANs highly sensitive to architecture choices, learning rates, and other hyperparameters, and training runs can diverge or oscillate without converging. Much GAN research has been devoted to stabilisation techniques, and the difficulty is a key reason the field has shifted toward more stable approaches like diffusion.
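The "almost no useful gradient" problem can be seen numerically in a small sketch. It assumes the discriminator ends in a sigmoid, so its verdict on a generated sample is D = sigmoid(logit), and it compares two generator losses: the min-max loss log(1 - D), and the alternative -log(D) suggested in the original GAN paper as a practical fix. The function names are hypothetical and chosen only for this example.

```python
import math


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def saturating_grad(logit: float) -> float:
    # Gradient of log(1 - sigmoid(logit)) with respect to the logit: -sigmoid(logit).
    return -sigmoid(logit)


def non_saturating_grad(logit: float) -> float:
    # Gradient of -log(sigmoid(logit)) with respect to the logit: -(1 - sigmoid(logit)).
    return -(1.0 - sigmoid(logit))


# A very negative logit means the discriminator is confident the sample is fake.
for logit in (0.0, -5.0, -10.0):
    print(logit, saturating_grad(logit), non_saturating_grad(logit))
```

At a logit of -10 the gradient of the min-max loss is about -0.00005, so the generator barely moves, while the alternative loss still gives a gradient close to -1. This is one of the simpler stabilisation techniques; it eases the stalled-generator case but does not remove the balancing problem as a whole.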

Mode collapse

A specific and notorious GAN failure is mode collapse. Instead of capturing the full variety of the training data, the generator discovers a small set of outputs that reliably fool the discriminator and then produces little else. A face generator might output convincing but nearly identical faces; a digit generator might only ever draw a few of the ten digits. The generator is exploiting a weakness rather than truly modelling the distribution. Mode collapse limits diversity and is one of the clearest contrasts with diffusion models, which tend to cover the full range of their training data far more reliably.
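One rough way to detect mode collapse on a labelled problem such as digits is to classify a batch of generated samples and count how many classes actually show up. The sketch below assumes the class labels have already been produced by some separate classifier, which is not shown; the function name and the min_share threshold are hypothetical choices for illustration.

```python
from collections import Counter


def mode_coverage(labels, num_modes, min_share=0.01):
    """Fraction of the num_modes classes that make up at least min_share of the labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    covered = sum(1 for m in range(num_modes) if counts[m] / total >= min_share)
    return covered / num_modes


# A generator that covers all ten digits evenly.
healthy = [i % 10 for i in range(1000)]
# A collapsed generator that only ever draws 1s and 7s.
collapsed = [1] * 600 + [7] * 400

print(mode_coverage(healthy, 10))    # 1.0
print(mode_coverage(collapsed, 10))  # 0.2
```

A score well below 1.0 suggests the generator is ignoring part of the data. The check only sees collapse across classes: a generator that draws every digit but always in the same handwriting would still score 1.0.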

What GANs made possible

Despite their quirks, GANs produced landmark results. StyleGAN generated photorealistic human faces so convincing they powered viral demos like “This Person Does Not Exist.” GANs drove advances in super-resolution (sharpening low-quality images), image-to-image translation (turning sketches into photos or day scenes into night), style transfer, and early deepfakes. For the better part of a decade they were the default architecture for generative imaging. Today, diffusion models have largely overtaken them for general-purpose, text-driven generation, but GANs remain valuable where their single-pass speed and strength on narrow domains still give them an edge — and understanding them is essential to understanding how modern image generation evolved.
