What is latent space?
Latent space is the compressed, lower-dimensional space in which a model represents the essential features of its data. Instead of working directly with raw pixels, characters, or audio samples, a model maps each input to a vector — a list of numbers — that captures its meaning. The word latent means “hidden”: these dimensions are not directly observed, but learned by the model as the most useful way to encode the data.
The key property is structure. In a well-trained latent space, similar inputs land near each other and unrelated inputs land far apart, so geometric distance becomes a proxy for semantic similarity.
How models build a latent space
Several architectures explicitly create a latent space:
- Autoencoders squeeze input through a narrow “bottleneck” layer, forcing the network to keep only the most important features. That bottleneck is the latent space.
- Variational autoencoders (VAEs) add structure so the space is smooth and continuous, making it suitable for sampling new data.
- GANs learn a latent space from which a generator produces realistic outputs.
- Latent diffusion models (such as Stable Diffusion) run their denoising process inside a compressed latent space rather than at full pixel resolution, which is far cheaper than working with raw images.
Latent space vs embeddings
The two terms overlap. An embedding is a single vector — a point — while latent space usually refers to the entire learned representation space those points live in. People say “embedding” when they care about one item’s vector (for search or clustering) and “latent space” when they care about the structure of the whole space, especially the internal representation of a generative model.
Why interpolation matters
Because nearby points represent similar concepts, you can interpolate: pick two points and move smoothly between them. The decoded outputs blend smoothly too. This is the engine behind many creative effects:
- Morphing one face gradually into another,
- Style transfer by mixing the latent codes of content and style,
- Attribute editing, where moving along a particular direction in the space adds or removes a feature such as a smile, glasses, or a season.
Understanding latent space explains why generative models can do more than copy their training data: they have learned a continuous, navigable map of concepts, and generating new content is just choosing new coordinates on that map.