What Is Generative AI? Text, Images, Code, and Beyond

The class of AI models that create new content, not just classify existing data

What “generative” means

Generative AI is the family of models that produce new content rather than simply analysing or labelling what already exists. Give a generative model a prompt (a sentence, an image, a few notes of melody) and it returns something that fits the patterns it learned during training: a paragraph, a picture, a snippet of code, a voice clip. The key idea is that the model has learned the statistical structure of its training data well enough to sample plausible new examples from that learned distribution. What comes out usually did not exist before, even though it looks like the kind of thing the model was trained on. Models can also reproduce fragments of their training data, so novelty is typical rather than guaranteed.

Generative versus discriminative AI

The clearest way to understand generative AI is by contrast. A discriminative model draws boundaries: it decides whether an email is spam, whether an image contains a tumour, or which of ten digits a handwritten number is. It maps an input to a label. A generative model instead learns the data well enough to create more of it — to write the email, draw the image, or produce a new digit. Discriminative models answer “which category?”; generative models answer “what would a realistic new example look like?” Most of today’s headline AI tools are generative.
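The contrast can be made concrete with a toy sketch. It assumes one-dimensional data from two made-up classes, "a" and "b"; the function names and numbers are illustrative and not taken from any library. The discriminative side keeps only a decision boundary, while the generative side keeps a description of the data (a mean and a spread) that it can sample from.

```python
import random
import statistics

def fit_threshold(samples_a, samples_b):
    """Discriminative toy: learn only a boundary between the two classes."""
    return (statistics.mean(samples_a) + statistics.mean(samples_b)) / 2

def classify(x, threshold):
    """Answer "which category?" by mapping an input to a label."""
    return "a" if x < threshold else "b"

def fit_gaussian(samples):
    """Generative toy: learn the shape of the data as (mean, std dev)."""
    return statistics.mean(samples), statistics.stdev(samples)

def generate(params, rng):
    """Answer "what would a new example look like?" by sampling."""
    mean, std = params
    return rng.gauss(mean, std)

rng = random.Random(0)  # fixed seed so the run is repeatable
class_a = [rng.gauss(2.0, 0.5) for _ in range(500)]  # hypothetical data
class_b = [rng.gauss(8.0, 0.5) for _ in range(500)]  # hypothetical data

threshold = fit_threshold(class_a, class_b)
params_b = fit_gaussian(class_b)

label = classify(2.3, threshold)        # a label for an existing input
new_example = generate(params_b, rng)   # a fresh value that resembles class "b"
print(label, round(new_example, 2))
```

The boundary alone cannot produce a new data point, and the fitted distribution is not needed to label one. Real systems replace the single Gaussian with a deep network, but the division of labour is the same.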

The output modalities

Generative AI now spans many kinds of content. Text and code come from large language models like GPT and Claude. Images are produced by systems such as Stable Diffusion, DALL-E, and Midjourney, most of which are diffusion-based. Audio models generate speech, sound effects, and music. Video models like Sora and Runway create short clips from text prompts. Beyond media, generative methods produce 3D shapes, protein and molecular structures, and synthetic data for training other models. The frontier is multimodal systems that take in and generate several types at once: reading an image and answering in text, or turning a description into a narrated video.

The model families underneath

Different modalities favour different architectures. Autoregressive transformers generate output one token at a time, each token conditioned on everything before it; this powers virtually all modern text and code generation. Diffusion models take a different route: rather than building the output piece by piece, they start from pure noise and iteratively remove it, step by step, until a coherent image, audio clip, or video emerges, guided by the prompt. Earlier approaches still worth knowing include GANs (generative adversarial networks), where a generator and a discriminator compete, and variational autoencoders, which learn compact representations they can sample from. All of them rest on deep learning trained over massive datasets.
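The autoregressive loop is simple enough to sketch with a toy model. The example below is a deliberate simplification: it uses a character-level bigram table, which conditions each new token only on the single previous one, whereas a transformer conditions on the whole preceding sequence. The corpus and function names are made up for illustration.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each character, which characters follow it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, rng):
    """Autoregressive loop: sample one token, append it, repeat."""
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:  # no known continuation, so stop early
            break
        tokens = list(followers.keys())
        weights = list(followers.values())
        # Sample the next token in proportion to how often it followed.
        out.append(rng.choices(tokens, weights=weights, k=1)[0])
    return "".join(out)

corpus = "the cat sat on the mat. the dog sat on the log."  # toy training data
model = train_bigram(corpus)
sample = generate(model, "t", 40, random.Random(0))
print(sample)
```

Every adjacent pair of characters in the output also appears in the corpus, yet the string as a whole is usually not a copy of it. A large language model follows the same sample-and-append loop, with the lookup table replaced by a neural network that scores every possible next token given the full context.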

Why it matters

Generative AI matters because creating content was long the part of work that computers handled poorly. By turning generation into a learned, promptable capability, these models sharply reduce the cost of drafting, designing, prototyping, and exploring options. That power also brings real risks: plausible-sounding misinformation, deepfakes, copyright and attribution questions, and the energy cost of training and running the models. Understanding generative AI starts with one simple distinction: it does not just recognise the world, it manufactures new pieces of it on demand.
