RAG — Retrieval-Augmented Generation (AI Glossary)

Combining LLM generation with real-time retrieval for grounded, up-to-date answers

Definition

RAG (Retrieval-Augmented Generation) is an architecture that combines an LLM with an external information-retrieval step. Instead of answering purely from what it memorised during training, the system first retrieves relevant documents from a knowledge source — your company wiki, a product manual, a database — and then generates an answer using that retrieved text as context. The result is an answer that is grounded in specific, up-to-date, verifiable sources rather than the model’s frozen and sometimes outdated internal knowledge.

Why RAG exists

LLMs have two well-known weaknesses: their knowledge is frozen at training time, and they confidently invent facts when they do not know something (hallucination). RAG mitigates both. By feeding the model the actual source text at query time, you can answer questions about events after the training cutoff and about private documents the model never saw, and you can show where the answer came from. This makes RAG a widely used approach for enterprise search, customer support bots, and any application where accuracy and citations matter.

How it works: retriever and generator

A RAG system has two core components:

  • The retriever — usually an embedding model that converts both documents and the user’s question into vectors, plus a vector database that finds the document chunks whose vectors are most similar to the question. Documents are split into “chunks” ahead of time and indexed.
  • The generator — the LLM, which receives the user’s question together with the top retrieved chunks and writes a grounded answer, ideally quoting or citing the supplied material.
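To make the "split into chunks ahead of time" step concrete, here is a minimal sketch of word-based chunking with overlap. The function name chunk_text and the chunk_size and overlap parameters are illustrative assumptions, not part of any particular library; production systems often split on tokens, sentences, or document structure instead.

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks; neighbours share `overlap` words.

    Hypothetical helper for illustration: sizes are counted in
    whitespace-separated words, not model tokens.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=1):
        print(chunk)
```

The overlap is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of indexing some words twice.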

The flow is simply: embed the query, search the index, inject the best matches into the prompt, generate.
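The four steps can be sketched end to end in a few lines. This is a toy illustration under stated assumptions: embed is a bag-of-words counter standing in for a real embedding model, the in-memory INDEX list stands in for a vector database, and generate is a caller-supplied function standing in for the LLM call. All names and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Embed the query and return the k chunks most similar to it."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Inject the retrieved chunks into the prompt as numbered context."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the numbered context below and cite it.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Made-up sample chunks, indexed ahead of time.
CHUNKS = [
    "The warranty covers battery defects for two years.",
    "Returns are accepted within 30 days of purchase.",
    "The office cafeteria opens at 8 am.",
]
INDEX = [(chunk, embed(chunk)) for chunk in CHUNKS]


def answer(query: str, generate) -> str:
    """Embed, search, inject, generate. `generate` is any prompt -> text callable."""
    return generate(build_prompt(query, retrieve(query, INDEX)))


if __name__ == "__main__":
    # Echo the prompt instead of calling a model, to show what the LLM would see.
    print(answer("How long is the battery warranty?", generate=lambda p: p))
```

Passing the generator in as a function keeps the sketch independent of any specific model provider; in a real system that argument would wrap an LLM API call.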

Naive RAG vs advanced RAG

The simplest implementation — naive RAG — embeds the query, retrieves the top-k chunks by similarity, and pastes them into the prompt. It works, but it struggles with vague queries, irrelevant retrievals, and contradictory chunks.

Advanced RAG layers in improvements: query rewriting to clarify ambiguous questions, hybrid search that blends keyword and vector matching, reranking that re-scores retrieved chunks with a more accurate model, and filtering to drop low-quality context before generation. These steps raise the relevance of what reaches the LLM, which is often the biggest driver of answer quality.
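As one illustration of hybrid search, the sketch below merges a keyword ranking and a vector ranking with reciprocal rank fusion (RRF). RRF is only one common way to blend result lists and is an assumption here, not something this glossary entry prescribes; the constant k = 60 is a conventional default, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken alphabetically by ID.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]  # e.g. from a keyword (BM25-style) search
    vector_hits = ["b", "d", "a"]   # e.g. from a vector similarity search
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
    # ['b', 'a', 'd', 'c']
```

Documents that appear near the top of both lists rise to the front, while documents found by only one method are kept but ranked lower. Because RRF uses ranks rather than raw scores, the two searches do not need comparable scoring scales.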

Why it matters

RAG is the pragmatic answer to “how do I make an LLM use my data?” It is cheaper and faster to update than fine-tuning, it supports citations and auditability, and it scales to large, frequently changing knowledge bases. Understanding RAG — and where its retrieval step can fail — is essential for anyone building reliable AI products on top of proprietary information.
