The core idea
An autoencoder is a neural network that learns to reproduce its own input. That sounds pointless until you add one constraint: the data must flow through a narrow bottleneck layer with far fewer dimensions than the input. To rebuild the input accurately after squeezing it through that bottleneck, the network is forced to discover an efficient, compressed representation of the data. It does this without labels, learning purely from the structure of the inputs — a form of self-supervised learning.
Encoder, bottleneck, decoder
Every autoencoder has three parts. The encoder compresses the input into a small vector called the latent code or latent space. The bottleneck is that small middle layer — its limited size is what forces compression. The decoder then expands the latent code back into something as close as possible to the original input. Training minimises a reconstruction loss that measures how different the output is from the input. Over many examples, the encoder learns what to keep and the decoder learns how to rebuild from those essentials.
What the latent space captures
Because the bottleneck is so small, the latent space ends up encoding the data’s most meaningful factors of variation. For face images, those factors might correspond loosely to pose, lighting, or expression; for handwritten digits, to slant or thickness. This makes autoencoders useful for dimensionality reduction (a nonlinear cousin of PCA), anomaly detection (unusual inputs reconstruct poorly, producing high error), and denoising (train the network to reconstruct clean data from corrupted inputs).
Variational autoencoders and generation
A plain autoencoder is great at compression but poor at generation: pick a random point in its latent space and the decoder usually produces garbage, because the space has gaps and is not smoothly organised. A variational autoencoder (VAE) fixes this by having the encoder output a probability distribution rather than a single point, and by adding a regularisation term (the KL divergence) that pushes the latent space toward a smooth, well-behaved shape. Now you can sample a random latent vector and decode it into a plausible new output — turning the autoencoder into a generative model.
Why this matters for modern image AI
VAEs are a quiet workhorse behind today’s image generators. Latent diffusion models such as Stable Diffusion use a VAE to compress images into a much smaller latent space, run the expensive diffusion process in that compact space, and then decode the result back to a full-resolution image. This is what makes high-quality image generation affordable on consumer hardware. So while autoencoders look like a humble “copy the input” trick, the compressed representations they learn are foundational to compression, anomaly detection, and the generative systems people use every day.